IOMMU Archive on lore.kernel.org
* Crash kernel with 256 MB reserved memory runs into OOM condition
@ 2019-08-12  9:42 Paul Menzel
  2019-08-12  9:50 ` Michal Hocko
  0 siblings, 1 reply; 8+ messages in thread
From: Paul Menzel @ 2019-08-12  9:42 UTC (permalink / raw)
  To: Jörg Rödel
  Cc: linux-pci, x86, kexec, Linux Kernel Mailing List, iommu, Donald Buczek

Dear Linux folks,


On a Dell PowerEdge R7425 with two AMD EPYC 7601 (total 128 threads) and
1 TB RAM, the crash kernel with 256 MB of space reserved crashes.

Please find the messages of the normal and the crash kernel attached.

```
[…]
[    4.319253] iommu: Adding device 0000:06:00.2 to group 5
[    4.325869] iommu: Adding device 0000:20:01.0 to group 15
[    4.332648] iommu: Adding device 0000:20:02.0 to group 16
[    4.338946] swapper/0 invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, oom_score_adj=0
[    4.350251] swapper/0 cpuset=/ mems_allowed=0
[    4.354618] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.57.mx64.282 #1
[    4.355612] Hardware name: Dell Inc. PowerEdge R7425/08V001, BIOS 1.9.3 06/25/2019
[    4.355612] Call Trace:
[    4.355612]  dump_stack+0x46/0x5b
[    4.355612]  dump_header+0x6b/0x289
[    4.355612]  ? try_to_free_pages+0xcf/0x1c0
[    4.355612]  out_of_memory+0x470/0x4c0
[    4.355612]  __alloc_pages_nodemask+0x970/0x1030
[    4.355612]  cache_grow_begin+0x7d/0x520
[    4.355612]  fallback_alloc+0x148/0x200
[    4.355612]  kmem_cache_alloc_trace+0xac/0x1f0
[    4.355612]  init_iova_domain+0x112/0x170
[    4.355612]  amd_iommu_domain_alloc+0x138/0x1a0
[    4.355612]  iommu_group_get_for_dev+0xc4/0x1a0
[    4.355612]  amd_iommu_add_device+0x13a/0x610
[    4.355612]  ? iommu_group_alloc+0x180/0x180
[    4.355612]  ? set_debug_rodata+0x11/0x11
[    4.355612]  add_iommu_group+0x20/0x30
[    4.355612]  bus_for_each_dev+0x76/0xc0
[    4.355612]  ? down_write+0xe/0x40
[    4.355612]  bus_set_iommu+0xb6/0xf0
[    4.355612]  amd_iommu_init_api+0x112/0x132
[    4.355612]  state_next+0xfb1/0x1165
[    4.355612]  ? set_debug_rodata+0x11/0x11
[    4.355612]  amd_iommu_init+0x1f/0x67
[    4.355612]  ? e820__memblock_setup+0x60/0x60
[    4.355612]  pci_iommu_init+0x16/0x3f
[    4.355612]  do_one_initcall+0x4f/0x1d0
[    4.355612]  ? set_debug_rodata+0x11/0x11
[    4.355612]  kernel_init_freeable+0x1ee/0x27f
[    4.355612]  ? rest_init+0xb0/0xb0
[    4.355612]  kernel_init+0xa/0x110
[    4.355612]  ret_from_fork+0x22/0x40
[    4.489484] Mem-Info:
[    4.491778] active_anon:0 inactive_anon:0 isolated_anon:0
[    4.491778]  active_file:0 inactive_file:0 isolated_file:0
[    4.491778]  unevictable:3930 dirty:0 writeback:0 unstable:0
[    4.491778]  slab_reclaimable:2367 slab_unreclaimable:17814
[    4.491778]  mapped:0 shmem:0 pagetables:0 bounce:0
[    4.491778]  free:472 free_pcp:53 free_cma:0
[    4.522929] Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:15720kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[    4.548703] Node 0 DMA free:484kB min:4kB low:4kB high:4kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:568kB managed:484kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[    4.573612] lowmem_reserve[]: 0 125 125 125
[    4.577799] Node 0 DMA32 free:1404kB min:1428kB low:1784kB high:2140kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:15720kB writepending:0kB present:261560kB managed:133752kB mlocked:0kB kernel_stack:2496kB pagetables:0kB bounce:0kB free_pcp:212kB local_pcp:212kB free_cma:0kB
[    4.605221] lowmem_reserve[]: 0 0 0 0
[    4.608887] Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 484kB
[    4.621056] Node 0 DMA32: 9*4kB (UM) 1*8kB (U) 15*16kB (UM) 1*32kB (M) 17*64kB (UM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1404kB
[    4.633918] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[    4.642350] 3943 total pagecache pages
[    4.646106] 0 pages in swap cache
[    4.649424] Swap cache stats: add 0, delete 0, find 0/0
[    4.654651] Free swap  = 0kB
[    4.657536] Total swap = 0kB
[    4.660422] 65532 pages RAM
[    4.663219] 0 pages HighMem/MovableOnly
[    4.667061] 31973 pages reserved
[    4.670295] Unreclaimable slab info:
[    4.673874] Name                      Used          Total
[    4.679277] tcp_bind_bucket           29KB         32KB
[    4.684514] RAW                      240KB        240KB
[    4.689752] hugetlbfs_inode_cache          0KB          3KB
[    4.695333] biovec-max                32KB         32KB
[    4.700565] uid_cache                  0KB          3KB
[    4.705799] skbuff_head_cache          3KB          4KB
[    4.711033] shmem_inode_cache         56KB         59KB
[    4.716267] proc_dir_entry            40KB         43KB
[    4.721502] kernfs_node_cache       2420KB       2424KB
[    4.726737] mnt_cache                  4KB          7KB
[    4.731970] filp                       1KB          4KB
[    4.737197] names_cache              420KB        440KB
[    4.742425] fs_cache                   8KB         11KB
[    4.747656] files_cache               88KB         90KB
[    4.752887] signal_cache             166KB        171KB
[    4.758118] sighand_cache            321KB        321KB
[    4.763346] task_struct              516KB        516KB
[    4.768571] cred_jar                  29KB         31KB
[    4.773796] anon_vma_chain             9KB         12KB
[    4.779026] pid                        3KB          4KB
[    4.784261] Acpi-Operand             527KB        531KB
[    4.789494] Acpi-Parse                26KB         31KB
[    4.794721] Acpi-State                37KB         43KB
[    4.799946] Acpi-Namespace            98KB        100KB
[    4.805173] numa_policy                3KB          3KB
[    4.810399] trace_event_file         145KB        146KB
[    4.815626] ftrace_event_field        151KB        151KB
[    4.820948] pool_workqueue             4KB          4KB
[    4.826202] kmalloc-262144           256KB        256KB
[    4.831433] kmalloc-65536            128KB        128KB
[    4.836659] kmalloc-32768             64KB         64KB
[    4.841885] kmalloc-16384             48KB         48KB
[    4.847112] kmalloc-8192              80KB         80KB
[    4.852339] kmalloc-4096            2700KB       2700KB
[    4.857565] kmalloc-2048           59164KB      59164KB
[    4.862793] kmalloc-1024             705KB        708KB
[    4.868026] kmalloc-512              185KB        188KB
[    4.873251] kmalloc-256               84KB         88KB
[    4.878479] kmalloc-192              255KB        255KB
[    4.883706] kmalloc-96               177KB        180KB
[    4.888939] kmalloc-64               519KB        520KB
[    4.894165] kmalloc-32               230KB        232KB
[    4.899391] kmalloc-128              871KB        872KB
[    4.904617] kmem_cache                32KB         33KB
[    4.909842] Tasks state (memory values in pages):
[    4.914547] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[    4.923156] Out of memory and no killable processes...
[…]
```
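The figures in the OOM report above can be cross-checked with a little arithmetic. The following is only a sketch, assuming 4 KiB pages; the function and key names are made up for illustration, and all numbers are copied from the log.

```python
PAGE_KB = 4  # assumption: 4 KiB pages, as is usual on x86-64

# Values copied from the OOM report above.
PAGES_RAM = 65532
PAGES_RESERVED = 31973
FREE_PAGES = 472
MANAGED_KB = {"DMA": 484, "DMA32": 133752}
FREE_KB = {"DMA": 484, "DMA32": 1404}
MIN_KB = {"DMA": 4, "DMA32": 1428}
KMALLOC_2048_KB = 59164


def account():
    """Derive a few totals from the raw numbers in the report."""
    total_kb = PAGES_RAM * PAGE_KB
    usable_kb = (PAGES_RAM - PAGES_RESERVED) * PAGE_KB
    managed_kb = sum(MANAGED_KB.values())
    # Zones whose free memory is below their "min" watermark.
    below_min = [zone for zone in FREE_KB if FREE_KB[zone] < MIN_KB[zone]]
    return {
        "total_kb": total_kb,
        "usable_kb": usable_kb,
        "managed_kb": managed_kb,
        "free_kb": FREE_PAGES * PAGE_KB,
        "below_min": below_min,
        "kmalloc_2048_share": KMALLOC_2048_KB / managed_kb,
    }


if __name__ == "__main__":
    for key, value in account().items():
        print(f"{key}: {value}")
```

With these numbers, "pages RAM" minus "pages reserved" matches the sum of the per-zone "managed" values (134236 kB, about 131 MiB of the 256 MB reservation), the DMA32 zone is below its min watermark, and kmalloc-2048 alone accounts for roughly 44 % of the managed memory.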

Is more reserved memory really needed on big server systems, or is this
maybe something that can be fixed in the Linux kernel?


Kind regards,

Paul


PS: I am not sure whether the output below is helpful.

```
$ lspci -nn
00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Root Complex [1022:1450]
00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) I/O Memory Management Unit [1022:1451]
00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
00:01.4 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 59)
00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0 [1022:1460]
00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1 [1022:1461]
00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2 [1022:1462]
00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3 [1022:1463]
00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4 [1022:1464]
00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5 [1022:1465]
00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric Device 18h Function 6 [1022:1466]
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 [1022:1467]
00:19.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0 [1022:1460]
00:19.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1 [1022:1461]
00:19.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2 [1022:1462]
00:19.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3 [1022:1463]
00:19.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4 [1022:1464]
00:19.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5 [1022:1465]
00:19.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric Device 18h Function 6 [1022:1466]
00:19.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 [1022:1467]
00:1a.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0 [1022:1460]
00:1a.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1 [1022:1461]
00:1a.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2 [1022:1462]
00:1a.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3 [1022:1463]
00:1a.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4 [1022:1464]
00:1a.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5 [1022:1465]
00:1a.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric Device 18h Function 6 [1022:1466]
00:1a.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 [1022:1467]
00:1b.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0 [1022:1460]
00:1b.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1 [1022:1461]
00:1b.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2 [1022:1462]
00:1b.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3 [1022:1463]
00:1b.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4 [1022:1464]
00:1b.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5 [1022:1465]
00:1b.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric Device 18h Function 6 [1022:1466]
00:1b.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 [1022:1467]
00:1c.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0 [1022:1460]
00:1c.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1 [1022:1461]
00:1c.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2 [1022:1462]
00:1c.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3 [1022:1463]
00:1c.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4 [1022:1464]
00:1c.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5 [1022:1465]
00:1c.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric Device 18h Function 6 [1022:1466]
00:1c.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 [1022:1467]
00:1d.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0 [1022:1460]
00:1d.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1 [1022:1461]
00:1d.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2 [1022:1462]
00:1d.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3 [1022:1463]
00:1d.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4 [1022:1464]
00:1d.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5 [1022:1465]
00:1d.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric Device 18h Function 6 [1022:1466]
00:1d.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 [1022:1467]
00:1e.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0 [1022:1460]
00:1e.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1 [1022:1461]
00:1e.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2 [1022:1462]
00:1e.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3 [1022:1463]
00:1e.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4 [1022:1464]
00:1e.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5 [1022:1465]
00:1e.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric Device 18h Function 6 [1022:1466]
00:1e.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 [1022:1467]
00:1f.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0 [1022:1460]
00:1f.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1 [1022:1461]
00:1f.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2 [1022:1462]
00:1f.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3 [1022:1463]
00:1f.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4 [1022:1464]
00:1f.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5 [1022:1465]
00:1f.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric Device 18h Function 6 [1022:1466]
00:1f.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 [1022:1467]
01:00.0 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
01:00.1 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
02:00.0 PCI bridge [0604]: PLDA Device [1556:be00] (rev 02)
03:00.0 VGA compatible controller [0300]: Matrox Electronics Systems Ltd. Integrated Matrox G200eW3 Graphics Controller [102b:0536] (rev 04)
04:00.0 Ethernet controller [0200]: Intel Corporation I350 Gigabit Network Connection [8086:1521] (rev 01)
04:00.1 Ethernet controller [0200]: Intel Corporation I350 Gigabit Network Connection [8086:1521] (rev 01)
05:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device [1022:145a]
05:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor [1022:1456]
05:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] USB 3.0 Host controller [1022:145f]
06:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device [1022:1455]
06:00.1 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Device [1022:1468]
06:00.2 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
20:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Root Complex [1022:1450]
20:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) I/O Memory Management Unit [1022:1451]
20:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
20:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
20:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
20:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
20:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
20:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
20:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
20:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
21:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device [1022:145a]
21:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor [1022:1456]
21:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] USB 3.0 Host controller [1022:145f]
22:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device [1022:1455]
22:00.1 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Device [1022:1468]
40:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Root Complex [1022:1450]
40:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) I/O Memory Management Unit [1022:1451]
40:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
40:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
40:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
40:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
40:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
40:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
40:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
40:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
41:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device [1022:145a]
41:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor [1022:1456]
42:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device [1022:1455]
42:00.1 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Device [1022:1468]
60:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Root Complex [1022:1450]
60:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) I/O Memory Management Unit [1022:1451]
60:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
60:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
60:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
60:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
60:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
60:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
60:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
60:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
60:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
61:00.0 Serial Attached SCSI controller [0107]: LSI Logic / Symbios Logic MegaRAID Tri-Mode SAS3508 [1000:0016] (rev 01)
62:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device [1022:145a]
62:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor [1022:1456]
63:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device [1022:1455]
63:00.1 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Device [1022:1468]
80:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Root Complex [1022:1450]
80:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) I/O Memory Management Unit [1022:1451]
80:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
80:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
80:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
80:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
80:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
80:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
80:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
80:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
81:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device [1022:145a]
81:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor [1022:1456]
82:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device [1022:1455]
82:00.1 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Device [1022:1468]
a0:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Root Complex [1022:1450]
a0:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) I/O Memory Management Unit [1022:1451]
a0:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
a0:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
a0:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
a0:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
a0:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
a0:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
a0:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
a0:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
a1:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device [1022:145a]
a1:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor [1022:1456]
a2:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device [1022:1455]
a2:00.1 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Device [1022:1468]
c0:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Root Complex [1022:1450]
c0:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) I/O Memory Management Unit [1022:1451]
c0:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
c0:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
c0:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
c0:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
c0:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
c0:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
c0:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
c0:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
c1:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device [1022:145a]
c1:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor [1022:1456]
c2:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device [1022:1455]
c2:00.1 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Device [1022:1468]
e0:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Root Complex [1022:1450]
e0:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) I/O Memory Management Unit [1022:1451]
e0:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
e0:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
e0:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
e0:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
e0:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
e0:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
e0:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
e0:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
e1:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device [1022:145a]
e1:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor [1022:1456]
e2:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device [1022:1455]
e2:00.1 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Device [1022:1468]
```

[-- Attachment #1.1.2: ttyS0.log --]
[-- Type: text/x-log, Size: 155310 bytes --]

[-- Attachment #1.2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 5174 bytes --]

[-- Attachment #2: Type: text/plain, Size: 156 bytes --]

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


* Re: Crash kernel with 256 MB reserved memory runs into OOM condition
  2019-08-12  9:42 Crash kernel with 256 MB reserved memory runs into OOM condition Paul Menzel
@ 2019-08-12  9:50 ` Michal Hocko
  2019-08-12  9:59   ` Paul Menzel
  2019-08-13  2:43   ` Dave Young
  0 siblings, 2 replies; 8+ messages in thread
From: Michal Hocko @ 2019-08-12  9:50 UTC (permalink / raw)
  To: Paul Menzel
  Cc: linux-pci, x86, kexec, Linux Kernel Mailing List, iommu, Donald Buczek

On Mon 12-08-19 11:42:33, Paul Menzel wrote:
> Dear Linux folks,
> 
> 
> On a Dell PowerEdge R7425 with two AMD EPYC 7601 (total 128 threads) and
> 1 TB RAM, the crash kernel with 256 MB of space reserved crashes.
> 
> Please find the messages of the normal and the crash kernel attached.

You will need more memory to reserve for the crash kernel because ...

> [    4.548703] Node 0 DMA free:484kB min:4kB low:4kB high:4kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:568kB managed:484kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> [    4.573612] lowmem_reserve[]: 0 125 125 125
> [    4.577799] Node 0 DMA32 free:1404kB min:1428kB low:1784kB high:2140kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:15720kB writepending:0kB present:261560kB managed:133752kB mlocked:0kB kernel_stack:2496kB pagetables:0kB bounce:0kB free_pcp:212kB local_pcp:212kB free_cma:0kB

... the memory is really depleted and there is nothing to reclaim (no anon or
file pages). Look how the free memory is below the min watermark (the DMA zone
of node 0 has lowmem protection for GFP_KERNEL allocations).
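
To make the comparison concrete, here is a minimal sketch in Python that pulls the
`free` and `min` fields out of a zone line like the DMA32 one quoted above and
checks whether free memory has dropped below the min watermark. The helper name
`parse_zone_line` is hypothetical, the sample line is shortened to a few fields,
and the regular expression only assumes the `key:NNNkB` layout visible in this
log; it is not kernel code.

```python
import re

def parse_zone_line(line):
    """Return (zone_name, {field: kB}) for a 'Node N <zone> key:NkB ...' line."""
    zone = re.search(r"Node \d+ (\S+)", line).group(1)
    fields = {key: int(value) for key, value in re.findall(r"(\w+):(\d+)kB", line)}
    return zone, fields

# Shortened copy of the DMA32 line from the OOM report (assumption: field
# order and the kB suffix are as shown in this thread).
line = ("[    4.577799] Node 0 DMA32 free:1404kB min:1428kB low:1784kB "
        "high:2140kB present:261560kB managed:133752kB")

zone, fields = parse_zone_line(line)
below_min = fields["free"] < fields["min"]
print(zone, "free", fields["free"], "kB, min", fields["min"], "kB, below min:", below_min)
```

For the DMA zone the same check would come out false (484 kB free against a 4 kB
min), which is why the lowmem_reserve[] line matters there.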

[...]
> [    4.923156] Out of memory and no killable processes...

and there is no existing task to kill, so we panic.
-- 
Michal Hocko
SUSE Labs

* Re: Crash kernel with 256 MB reserved memory runs into OOM condition
  2019-08-12  9:50 ` Michal Hocko
@ 2019-08-12  9:59   ` Paul Menzel
  2019-08-13  2:43   ` Dave Young
  1 sibling, 0 replies; 8+ messages in thread
From: Paul Menzel @ 2019-08-12  9:59 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-pci, x86, kexec, Linux Kernel Mailing List, iommu, Donald Buczek

[-- Attachment #1.1: Type: text/plain, Size: 1715 bytes --]

Dear Michal,


On 12.08.19 11:50, Michal Hocko wrote:
> On Mon 12-08-19 11:42:33, Paul Menzel wrote:

>> On a Dell PowerEdge R7425 with two AMD EPYC 7601 (total 128 threads) and
>> 1 TB RAM, the crash kernel with 256 MB of space reserved crashes.
>>
>> Please find the messages of the normal and the crash kernel attached.
> 
> You will need more memory to reserve for the crash kernel because ...
> 
>> [    4.548703] Node 0 DMA free:484kB min:4kB low:4kB high:4kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:568kB managed:484kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>> [    4.573612] lowmem_reserve[]: 0 125 125 125
>> [    4.577799] Node 0 DMA32 free:1404kB min:1428kB low:1784kB high:2140kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:15720kB writepending:0kB present:261560kB managed:133752kB mlocked:0kB kernel_stack:2496kB pagetables:0kB bounce:0kB free_pcp:212kB local_pcp:212kB free_cma:0kB
> 
> ... the memory is really depleted and there is nothing to reclaim (no anon or
> file pages). Look how the free memory is below the min watermark (the DMA zone
> of node 0 has lowmem protection for GFP_KERNEL allocations).
> 
> [...]
>> [    4.923156] Out of memory and no killable processes...
> 
> and there is no existing task to kill, so we panic.

Yeah, we figured that.

What we wonder is why 256 MB is not enough for booting, and which
hardware properties make it too small. In the overview I just
see a 60 MB allocation.

    [    4.857565] kmalloc-2048           59164KB      59164KB
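
For scale, a rough back-of-the-envelope sketch in Python, using only figures
quoted in this thread (59164 kB of kmalloc-2048 and 133752 kB managed in DMA32).
It assumes every object takes exactly 2048 bytes and ignores slab overhead, so
the object count is only an estimate.

```python
KMALLOC_2048_KB = 59164     # from the slab overview line above
MANAGED_DMA32_KB = 133752   # 'managed' value of the DMA32 zone in the OOM report
OBJECT_SIZE_KB = 2          # assumption: one kmalloc-2048 object = 2 kB, no overhead

approx_objects = KMALLOC_2048_KB // OBJECT_SIZE_KB
share = KMALLOC_2048_KB / MANAGED_DMA32_KB
print(f"~{approx_objects} objects, {share:.1%} of managed DMA32 memory")
```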


Kind regards,

Paul


[-- Attachment #1.2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 5174 bytes --]

[-- Attachment #2: Type: text/plain, Size: 156 bytes --]


* Re: Crash kernel with 256 MB reserved memory runs into OOM condition
  2019-08-12  9:50 ` Michal Hocko
  2019-08-12  9:59   ` Paul Menzel
@ 2019-08-13  2:43   ` Dave Young
  2019-08-13  2:46     ` Dave Young
  1 sibling, 1 reply; 8+ messages in thread
From: Dave Young @ 2019-08-13  2:43 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Paul Menzel, kasong, linux-pci, x86, kexec,
	Linux Kernel Mailing List, iommu, Donald Buczek, lijiang

Hi,

On 08/12/19 at 11:50am, Michal Hocko wrote:
> On Mon 12-08-19 11:42:33, Paul Menzel wrote:
> > Dear Linux folks,
> > 
> > 
> > On a Dell PowerEdge R7425 with two AMD EPYC 7601 (total 128 threads) and
> > 1 TB RAM, the crash kernel with 256 MB of space reserved crashes.
> > 
> > Please find the messages of the normal and the crash kernel attached.
> 
> You will need more memory to reserve for the crash kernel because ...
> 
> > [    4.548703] Node 0 DMA free:484kB min:4kB low:4kB high:4kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:568kB managed:484kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> > [    4.573612] lowmem_reserve[]: 0 125 125 125
> > [    4.577799] Node 0 DMA32 free:1404kB min:1428kB low:1784kB high:2140kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:15720kB writepending:0kB present:261560kB managed:133752kB mlocked:0kB kernel_stack:2496kB pagetables:0kB bounce:0kB free_pcp:212kB local_pcp:212kB free_cma:0kB
> 
> ... the memory is really depleted and there is nothing to reclaim (no anon or
> file pages). Look how the free memory is below the min watermark (the DMA zone
> of node 0 has lowmem protection for GFP_KERNEL allocations).

We found a similar issue on our side while working on kdump on SME-enabled
systems.  Kairui is working on some patches.

Actually, on those SME/SEV-enabled machines swiotlb is enabled
automatically, so we need at least an extra 64M+ of memory for kdump on top
of the normal expectation.

Can you check if this is also your case?

Thanks
Dave

* Re: Crash kernel with 256 MB reserved memory runs into OOM condition
  2019-08-13  2:43   ` Dave Young
@ 2019-08-13  2:46     ` Dave Young
  2019-08-13  2:54       ` Dave Young
  2019-08-15 17:00       ` Messages to kexec@ get moderated (was: Crash kernel with 256 MB reserved memory runs into OOM condition) Paul Menzel
  0 siblings, 2 replies; 8+ messages in thread
From: Dave Young @ 2019-08-13  2:46 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Paul Menzel, kasong, linux-pci, x86, kexec,
	Linux Kernel Mailing List, iommu, Donald Buczek, lijiang

Add more cc.
On 08/13/19 at 10:43am, Dave Young wrote:
> Hi,
> 
> On 08/12/19 at 11:50am, Michal Hocko wrote:
> > On Mon 12-08-19 11:42:33, Paul Menzel wrote:
> > > Dear Linux folks,
> > > 
> > > 
> > > On a Dell PowerEdge R7425 with two AMD EPYC 7601 (total 128 threads) and
> > > 1 TB RAM, the crash kernel with 256 MB of space reserved crashes.
> > > 
> > > Please find the messages of the normal and the crash kernel attached.
> > 
> > You will need more memory to reserve for the crash kernel because ...
> > 
> > > [    4.548703] Node 0 DMA free:484kB min:4kB low:4kB high:4kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:568kB managed:484kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> > > [    4.573612] lowmem_reserve[]: 0 125 125 125
> > > [    4.577799] Node 0 DMA32 free:1404kB min:1428kB low:1784kB high:2140kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:15720kB writepending:0kB present:261560kB managed:133752kB mlocked:0kB kernel_stack:2496kB pagetables:0kB bounce:0kB free_pcp:212kB local_pcp:212kB free_cma:0kB
> > 
> > ... the memory is really depleted and there is nothing to reclaim (no anon or
> > file pages). Look how the free memory is below the min watermark (the DMA zone
> > of node 0 has lowmem protection for GFP_KERNEL allocations).
> 
> We found a similar issue on our side while working on kdump on SME-enabled
> systems.  Kairui is working on some patches.
> 
> Actually, on those SME/SEV-enabled machines swiotlb is enabled
> automatically, so we need at least an extra 64M+ of memory for kdump on top
> of the normal expectation.
> 
> Can you check if this is also your case?

The question is for Paul.  Also, it is always good to cc the kexec mailing
list for kexec and kdump issues.

* Re: Crash kernel with 256 MB reserved memory runs into OOM condition
  2019-08-13  2:46     ` Dave Young
@ 2019-08-13  2:54       ` Dave Young
  2019-09-04 10:10         ` Paul Menzel
  2019-08-15 17:00       ` Messages to kexec@ get moderated (was: Crash kernel with 256 MB reserved memory runs into OOM condition) Paul Menzel
  1 sibling, 1 reply; 8+ messages in thread
From: Dave Young @ 2019-08-13  2:54 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Paul Menzel, kasong, linux-pci, x86, kexec,
	Linux Kernel Mailing List, iommu, Donald Buczek, lijiang

On 08/13/19 at 10:46am, Dave Young wrote:
> Add more cc.
> On 08/13/19 at 10:43am, Dave Young wrote:
> > Hi,
> > 
> > On 08/12/19 at 11:50am, Michal Hocko wrote:
> > > On Mon 12-08-19 11:42:33, Paul Menzel wrote:
> > > > Dear Linux folks,
> > > > 
> > > > 
> > > > On a Dell PowerEdge R7425 with two AMD EPYC 7601 (total 128 threads) and
> > > > 1 TB RAM, the crash kernel with 256 MB of space reserved crashes.
> > > > 
> > > > Please find the messages of the normal and the crash kernel attached.
> > > 
> > > You will need more memory to reserve for the crash kernel because ...
> > > 
> > > > [    4.548703] Node 0 DMA free:484kB min:4kB low:4kB high:4kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:568kB managed:484kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> > > > [    4.573612] lowmem_reserve[]: 0 125 125 125
> > > > [    4.577799] Node 0 DMA32 free:1404kB min:1428kB low:1784kB high:2140kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:15720kB writepending:0kB present:261560kB managed:133752kB mlocked:0kB kernel_stack:2496kB pagetables:0kB bounce:0kB free_pcp:212kB local_pcp:212kB free_cma:0kB
> > > 
> > > ... the memory is really depleted and there is nothing to reclaim (no anon or
> > > file pages). Look how the free memory is below the min watermark (the DMA zone
> > > of node 0 has lowmem protection for GFP_KERNEL allocations).
> > 
> > We found a similar issue on our side while working on kdump on SME-enabled
> > systems.  Kairui is working on some patches.
> > 
> > Actually, on those SME/SEV-enabled machines swiotlb is enabled
> > automatically, so we need at least an extra 64M+ of memory for kdump on top
> > of the normal expectation.
> > 
> > Can you check if this is also your case?
> 
> The question is for Paul.  Also, it is always good to cc the kexec mailing
> list for kexec and kdump issues.

It looks like the hardware IOMMU is used, so maybe you have not enabled SME?

Also, replacing maxcpus=1 with nr_cpus=1 can save some memory; you could give
it a try.


* Messages to kexec@ get moderated (was: Crash kernel with 256 MB reserved memory runs into OOM condition)
  2019-08-13  2:46     ` Dave Young
  2019-08-13  2:54       ` Dave Young
@ 2019-08-15 17:00       ` Paul Menzel
  1 sibling, 0 replies; 8+ messages in thread
From: Paul Menzel @ 2019-08-15 17:00 UTC (permalink / raw)
  To: Dave Young, Michal Hocko
  Cc: kasong, linux-pci, x86, kexec, Linux Kernel Mailing List, iommu,
	Donald Buczek, lijiang

[-- Attachment #1.1: Type: text/plain, Size: 958 bytes --]

Dear Dave,


On 13.08.19 04:46, Dave Young wrote:

> On 08/13/19 at 10:43am, Dave Young wrote:

[…]

> The question is for Paul.  Also, it is always good to cc the kexec mailing
> list for kexec and kdump issues.

kexec@ was CCed in my original mail, but my messages got moderated. It’d be
great if you could check that with the list administrators.

> Your mail to 'kexec' with the subject
> 
>     Crash kernel with 256 MB reserved memory runs into OOM condition
> 
> Is being held until the list moderator can review it for approval.
> 
> The reason it is being held:
> 
>     Message has a suspicious header
> 
> Either the message will get posted to the list, or you will receive
> notification of the moderator's decision.  If you would like to cancel
> this posting, please visit the following URL:
> 
>     http://lists.infradead.org/mailman/confirm/kexec/a23ab6162ef34d099af5dd86c46113def5152bb1


Kind regards,

Paul


[-- Attachment #1.2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 5174 bytes --]

[-- Attachment #2: Type: text/plain, Size: 156 bytes --]


* Re: Crash kernel with 256 MB reserved memory runs into OOM condition
  2019-08-13  2:54       ` Dave Young
@ 2019-09-04 10:10         ` Paul Menzel
  0 siblings, 0 replies; 8+ messages in thread
From: Paul Menzel @ 2019-09-04 10:10 UTC (permalink / raw)
  To: Dave Young, Michal Hocko
  Cc: kasong, linux-pci, x86, kexec, Linux Kernel Mailing List, iommu,
	Donald Buczek, lijiang

[-- Attachment #1.1: Type: text/plain, Size: 2632 bytes --]

Dear Dave,


Thank you for your replies.


On 2019-08-13 04:54, Dave Young wrote:
> On 08/13/19 at 10:46am, Dave Young wrote:

>> On 08/13/19 at 10:43am, Dave Young wrote:

>>> On 08/12/19 at 11:50am, Michal Hocko wrote:
>>>> On Mon 12-08-19 11:42:33, Paul Menzel wrote:

>>>>> On a Dell PowerEdge R7425 with two AMD EPYC 7601 (total 128 threads) and
>>>>> 1 TB RAM, the crash kernel with 256 MB of space reserved crashes.
>>>>>
>>>>> Please find the messages of the normal and the crash kernel attached.
>>>>
>>>> You will need more memory to reserve for the crash kernel because ...
>>>>
>>>>> [    4.548703] Node 0 DMA free:484kB min:4kB low:4kB high:4kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:568kB managed:484kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>>>>> [    4.573612] lowmem_reserve[]: 0 125 125 125
>>>>> [    4.577799] Node 0 DMA32 free:1404kB min:1428kB low:1784kB high:2140kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:15720kB writepending:0kB present:261560kB managed:133752kB mlocked:0kB kernel_stack:2496kB pagetables:0kB bounce:0kB free_pcp:212kB local_pcp:212kB free_cma:0kB
>>>>
>>>> ... the memory is really depleted and there is nothing to reclaim (no anon or
>>>> file pages). Look how the free memory is below the min watermark (the DMA zone
>>>> of node 0 has lowmem protection for GFP_KERNEL allocations).
>>>
>>> We found a similar issue on our side while working on kdump on SME-enabled
>>> systems.  Kairui is working on some patches.
>>>
>>> Actually, on those SME/SEV-enabled machines swiotlb is enabled
>>> automatically, so we need at least an extra 64M+ of memory for kdump on top
>>> of the normal expectation.
>>>
>>> Can you check if this is also your case?
>>
>> The question is for Paul.  Also, it is always good to cc the kexec mailing
>> list for kexec and kdump issues.

As already replied, <kexec@lists.infradead.org> was CCed in my original
message, but the list put it under moderation.

> It looks like the hardware IOMMU is used, so maybe you have not enabled SME?

Do you mean AMD Secure Memory Encryption? I do not think we use that.

> Also, replacing maxcpus=1 with nr_cpus=1 can save some memory; you could give
> it a try.

Thank you for this suggestion. That fixed it indeed, and the reserved
memory can stay at 256 MB. (The parameter names are a little unintuitive,
I guess due to historical reasons [1].)
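
For anyone scripting the same change, a minimal sketch in Python that swaps
`maxcpus=1` for `nr_cpus=1` in a kernel command-line string. The function name
`use_nr_cpus` and the example command line are made up for illustration; how the
crash kernel's command line is actually set depends on the kexec/kdump tooling
in use.

```python
def use_nr_cpus(cmdline, count=1):
    """Drop any maxcpus=/nr_cpus= token and append nr_cpus=<count>."""
    tokens = [
        token for token in cmdline.split()
        if not token.startswith(("maxcpus=", "nr_cpus="))
    ]
    tokens.append(f"nr_cpus={count}")
    return " ".join(tokens)

# Hypothetical crash-kernel command line, not the one used on this machine.
example = "console=ttyS0,115200n8 maxcpus=1 irqpoll reset_devices"
new_cmdline = use_nr_cpus(example)
print(new_cmdline)
```

As far as the kernel parameter documentation describes it, maxcpus= limits how
many processors are brought up during boot, while nr_cpus= limits how many the
kernel could support at all, which is presumably why the latter shrinks
per-CPU allocations.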


Kind regards,

Paul


[1]: https://www.kernel.org/doc/Documentation/admin-guide/kernel-parameters.txt


[-- Attachment #1.2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 5174 bytes --]

[-- Attachment #2: Type: text/plain, Size: 156 bytes --]

