* iommu_iova slab eats too much memory
@ 2020-04-23 9:11 Bin
2020-04-23 9:14 ` Bin
0 siblings, 1 reply; 16+ messages in thread
From: Bin @ 2020-04-23 9:11 UTC (permalink / raw)
To: iommu
Hey, guys:
I'm running a batch of CoreOS boxes; the lsb-release is:
```
# cat /etc/lsb-release
DISTRIB_ID="Container Linux by CoreOS"
DISTRIB_RELEASE=2303.3.0
DISTRIB_CODENAME="Rhyolite"
DISTRIB_DESCRIPTION="Container Linux by CoreOS 2303.3.0 (Rhyolite)"
```
```
# uname -a
Linux cloud-worker-25 4.19.86-coreos #1 SMP Mon Dec 2 20:13:38 -00 2019
x86_64 Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz GenuineIntel GNU/Linux
```
Recently, I found my VMs constantly being killed due to OOM, and after
digging into the problem, I finally realized that the kernel is leaking
memory.
Here's my slabinfo:
Active / Total Objects (% used) : 83818306 / 84191607 (99.6%)
Active / Total Slabs (% used) : 1336293 / 1336293 (100.0%)
Active / Total Caches (% used) : 152 / 217 (70.0%)
Active / Total Size (% used) : 5828768.08K / 5996848.72K (97.2%)
Minimum / Average / Maximum Object : 0.01K / 0.07K / 23.25K
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
80253888 80253888 100% 0.06K 1253967 64 5015868K iommu_iova
489472 489123 99% 0.03K 3824 128 15296K kmalloc-32
297444 271112 91% 0.19K 7082 42 56656K dentry
254400 252784 99% 0.06K 3975 64 15900K anon_vma_chain
222528 39255 17% 0.50K 6954 32 111264K kmalloc-512
202482 201814 99% 0.19K 4821 42 38568K vm_area_struct
200192 200192 100% 0.01K 391 512 1564K kmalloc-8
170528 169359 99% 0.25K 5329 32 42632K filp
158144 153508 97% 0.06K 2471 64 9884K kmalloc-64
149914 149365 99% 0.09K 3259 46 13036K anon_vma
146640 143123 97% 0.10K 3760 39 15040K buffer_head
130368 32791 25% 0.09K 3104 42 12416K kmalloc-96
129752 129752 100% 0.07K 2317 56 9268K Acpi-Operand
105468 105106 99% 0.04K 1034 102 4136K selinux_inode_security
73080 73080 100% 0.13K 2436 30 9744K kernfs_node_cache
72360 70261 97% 0.59K 1340 54 42880K inode_cache
71040 71040 100% 0.12K 2220 32 8880K eventpoll_epi
68096 59262 87% 0.02K 266 256 1064K kmalloc-16
53652 53652 100% 0.04K 526 102 2104K pde_opener
50496 31654 62% 2.00K 3156 16 100992K kmalloc-2048
46242 46242 100% 0.19K 1101 42 8808K cred_jar
44496 43013 96% 0.66K 927 48 29664K proc_inode_cache
44352 44352 100% 0.06K 693 64 2772K task_delay_info
43516 43471 99% 0.69K 946 46 30272K sock_inode_cache
37856 27626 72% 1.00K 1183 32 37856K kmalloc-1024
36736 36736 100% 0.07K 656 56 2624K eventpoll_pwq
34076 31282 91% 0.57K 1217 28 19472K radix_tree_node
33660 30528 90% 1.05K 1122 30 35904K ext4_inode_cache
32760 30959 94% 0.19K 780 42 6240K kmalloc-192
32028 32028 100% 0.04K 314 102 1256K ext4_extent_status
30048 30048 100% 0.25K 939 32 7512K skbuff_head_cache
28736 28736 100% 0.06K 449 64 1796K fs_cache
24702 24702 100% 0.69K 537 46 17184K files_cache
23808 23808 100% 0.66K 496 48 15872K ovl_inode
23104 22945 99% 0.12K 722 32 2888K kmalloc-128
22724 21307 93% 0.69K 494 46 15808K shmem_inode_cache
21472 21472 100% 0.12K 671 32 2684K seq_file
19904 19904 100% 1.00K 622 32 19904K UNIX
17340 17340 100% 1.06K 578 30 18496K mm_struct
15980 15980 100% 0.02K 94 170 376K avtab_node
14070 14070 100% 1.06K 469 30 15008K signal_cache
13248 13248 100% 0.12K 414 32 1656K pid
12128 11777 97% 0.25K 379 32 3032K kmalloc-256
11008 11008 100% 0.02K 43 256 172K selinux_file_security
10812 10812 100% 0.04K 106 102 424K Acpi-Namespace
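(The listing above is slabtop output; assuming procps' slabtop is
available, something like this should reproduce it:)
```
# slabtop -o --sort=c | head -50
```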
This information shows that 'iommu_iova' is the top memory consumer:
roughly 80 million objects in 64-byte slots, or about 4.8 GiB of slab memory.
In order to optimize the network performance of OpenStack virtual machines,
I enabled the VT-d feature in the BIOS and the SR-IOV feature of the Intel
82599 10G NIC. I'm assuming this is the root cause of this issue.
Is there anything I can do to fix it?
* Re: iommu_iova slab eats too much memory
2020-04-23 9:11 iommu_iova slab eats too much memory Bin
@ 2020-04-23 9:14 ` Bin
2020-04-24 0:40 ` Bin
0 siblings, 1 reply; 16+ messages in thread
From: Bin @ 2020-04-23 9:14 UTC (permalink / raw)
To: iommu
Forgot to mention: I've already disabled slab merging, so this is what it
is.
Bin <anole1949@gmail.com> wrote on Thu, Apr 23, 2020 at 5:11 PM:
> Hey, guys:
>
> I'm running a batch of CoreOS boxes; the lsb-release is:
> [...]
* Re: iommu_iova slab eats too much memory
2020-04-23 9:14 ` Bin
@ 2020-04-24 0:40 ` Bin
2020-04-24 11:20 ` Robin Murphy
0 siblings, 1 reply; 16+ messages in thread
From: Bin @ 2020-04-24 0:40 UTC (permalink / raw)
To: iommu
Hello? Anyone there?
Bin <anole1949@gmail.com> wrote on Thu, Apr 23, 2020 at 5:14 PM:
> Forgot to mention: I've already disabled slab merging, so this is what
> it is.
>
> [...]
* Re: iommu_iova slab eats too much memory
2020-04-24 0:40 ` Bin
@ 2020-04-24 11:20 ` Robin Murphy
2020-04-24 12:00 ` Bin
0 siblings, 1 reply; 16+ messages in thread
From: Robin Murphy @ 2020-04-24 11:20 UTC (permalink / raw)
To: Bin, iommu
On 2020-04-24 1:40 am, Bin wrote:
> Hello? Anyone there?
>
> [...]
>
>>> Here's my slabinfo:
>>>
>>> OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
>>>
>>> 80253888 80253888 100% 0.06K 1253967 64 5015868K iommu_iova
Do you really have a peak demand of ~80 million simultaneous DMA
buffers, or is some driver leaking DMA mappings?
Robin.
>>> [...]
* Re: iommu_iova slab eats too much memory
2020-04-24 11:20 ` Robin Murphy
@ 2020-04-24 12:00 ` Bin
2020-04-24 12:06 ` Bin
0 siblings, 1 reply; 16+ messages in thread
From: Bin @ 2020-04-24 12:00 UTC (permalink / raw)
To: Robin Murphy; +Cc: iommu
Well, that's the problem! I'm assuming the iommu kernel module is leaking
memory, but I don't know why or how.
Do you have any idea about it? Is any further information needed?
Robin Murphy <robin.murphy@arm.com> wrote on Fri, Apr 24, 2020 at 19:20:
> On 2020-04-24 1:40 am, Bin wrote:
> > [...]
>
> Do you really have a peak demand of ~80 million simultaneous DMA
> buffers, or is some driver leaking DMA mappings?
>
> Robin.
>
* Re: iommu_iova slab eats too much memory
2020-04-24 12:00 ` Bin
@ 2020-04-24 12:06 ` Bin
2020-04-24 12:15 ` Robin Murphy
0 siblings, 1 reply; 16+ messages in thread
From: Bin @ 2020-04-24 12:06 UTC (permalink / raw)
To: Robin Murphy; +Cc: iommu
I'm not familiar with the MMU stuff, so when you say "some driver
leaking DMA mappings", is it possible that some other kernel module, like
KVM or a NIC driver, is causing the leak rather than the iommu module
itself?
Bin <anole1949@gmail.com> wrote on Fri, Apr 24, 2020 at 20:00:
> Well, that's the problem! I'm assuming the iommu kernel module is leaking
> memory, but I don't know why or how.
>
> [...]
* Re: iommu_iova slab eats too much memory
2020-04-24 12:06 ` Bin
@ 2020-04-24 12:15 ` Robin Murphy
2020-04-24 13:20 ` Bin
0 siblings, 1 reply; 16+ messages in thread
From: Robin Murphy @ 2020-04-24 12:15 UTC (permalink / raw)
To: Bin; +Cc: iommu
On 2020-04-24 1:06 pm, Bin wrote:
> I'm not familiar with the MMU stuff, so when you say "some driver
> leaking DMA mappings", is it possible that some other kernel module, like
> KVM or a NIC driver, is causing the leak rather than the iommu module
> itself?
Yes - I doubt that intel-iommu itself is failing to free IOVAs when it
should, since I'd expect a lot of people to have noticed that. It's far
more likely that some driver is failing to call dma_unmap_* when it's
finished with a buffer - with the IOMMU disabled that would be a no-op
on x86 with a modern 64-bit-capable device, so such a latent bug could
have been easily overlooked.
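For illustration, the bug class in question looks roughly like this (a
minimal sketch with made-up names, not taken from any real driver):
```
#include <linux/dma-mapping.h>

/* Every dma_map_single() must be paired with a dma_unmap_single()
 * once the device has finished with the buffer.
 */
static int example_xmit(struct device *dev, void *buf, size_t len)
{
	dma_addr_t dma = dma_map_single(dev, buf, len, DMA_TO_DEVICE);

	if (dma_mapping_error(dev, dma))
		return -ENOMEM;

	/* ... hand 'dma' to the hardware and wait for completion ... */

	/*
	 * If this unmap is skipped (e.g. on an error path, or forgotten
	 * in the completion handler), the IOVA backing the mapping is
	 * never freed; with the IOMMU enabled, each leaked mapping keeps
	 * an iommu_iova object alive - the growth pattern seen above.
	 */
	dma_unmap_single(dev, dma, len, DMA_TO_DEVICE);
	return 0;
}
```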
Robin.
> [...]
* Re: iommu_iova slab eats too much memory
2020-04-24 12:15 ` Robin Murphy
@ 2020-04-24 13:20 ` Bin
2020-04-24 16:30 ` Robin Murphy
0 siblings, 1 reply; 16+ messages in thread
From: Bin @ 2020-04-24 13:20 UTC (permalink / raw)
To: Robin Murphy; +Cc: iommu
Dear Robin:
Thank you for your explanation. Now I understand that this could be the
NIC driver's fault, but how can I confirm it? Do I have to debug the
driver myself?
Robin Murphy <robin.murphy@arm.com> wrote on Fri, Apr 24, 2020 at 8:15 PM:
> On 2020-04-24 1:06 pm, Bin wrote:
> > [...]
>
> Yes - I doubt that intel-iommu itself is failing to free IOVAs when it
> should, since I'd expect a lot of people to have noticed that. It's far
> more likely that some driver is failing to call dma_unmap_* when it's
> finished with a buffer - with the IOMMU disabled that would be a no-op
> on x86 with a modern 64-bit-capable device, so such a latent bug could
> have been easily overlooked.
>
> Robin.
>
> [...]
* Re: iommu_iova slab eats too much memory
2020-04-24 13:20 ` Bin
@ 2020-04-24 16:30 ` Robin Murphy
2020-04-24 17:49 ` John Garry
2020-04-28 9:17 ` Salil Mehta
0 siblings, 2 replies; 16+ messages in thread
From: Robin Murphy @ 2020-04-24 16:30 UTC (permalink / raw)
To: Bin; +Cc: iommu
On 2020-04-24 2:20 pm, Bin wrote:
> Dear Robin:
> Thank you for your explanation. Now I understand that this could be the
> NIC driver's fault, but how can I confirm it? Do I have to debug the
> driver myself?
I'd start with CONFIG_DMA_API_DEBUG - of course it will chew through
memory about an order of magnitude faster than the IOVAs alone, but it
should shed some light on whether DMA API usage looks suspicious, and
dumping the mappings should help track down the responsible driver(s).
Although the debugfs code doesn't show the stacktrace of where each
mapping was made, I guess it would be fairly simple to tweak that for a
quick way to narrow down where to start looking in an offending driver.
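For example, with a kernel built with CONFIG_DMA_API_DEBUG=y, something
along these lines should give a per-device count of outstanding mappings
(note the dma-api/dump file only exists on fairly recent kernels, and the
preallocated entry pool may need enlarging via the dma_debug_entries=
boot parameter to track this many mappings):
```
# mount -t debugfs none /sys/kernel/debug    # if not already mounted
# cat /sys/kernel/debug/dma-api/num_free_entries
# awk '{print $1, $2}' /sys/kernel/debug/dma-api/dump | sort | uniq -c | sort -rn | head
```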
Robin.
> [...]
* Re: iommu_iova slab eats too much memory
2020-04-24 16:30 ` Robin Murphy
@ 2020-04-24 17:49 ` John Garry
2020-04-25 13:38 ` Bin
2020-04-28 9:17 ` Salil Mehta
1 sibling, 1 reply; 16+ messages in thread
From: John Garry @ 2020-04-24 17:49 UTC (permalink / raw)
To: Robin Murphy, Bin; +Cc: iommu
On 24/04/2020 17:30, Robin Murphy wrote:
> On 2020-04-24 2:20 pm, Bin wrote:
>> Dear Robin:
>> Thank you for your explanation. Now I understand that this could be the
>> NIC driver's fault, but how can I confirm it? Do I have to debug the
>> driver myself?
>
> I'd start with CONFIG_DMA_API_DEBUG - of course it will chew through
> memory about an order of magnitude faster than the IOVAs alone, but it
> should shed some light on whether DMA API usage looks suspicious, and
> dumping the mappings should help track down the responsible driver(s).
> Although the debugfs code doesn't show the stacktrace of where each
> mapping was made, I guess it would be fairly simple to tweak that for a
> quick way to narrow down where to start looking in an offending driver.
>
> Robin.
Just mentioning this in case it's relevant - we found that a long-term
aging throughput test causes the IOVA RB tree to grow very large (and, I
assume, to eat lots of memory):
https://lore.kernel.org/linux-iommu/20190815121104.29140-3-thunder.leizhen@huawei.com/
John
> [...]
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
>
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: iommu_iova slab eats too much memory
2020-04-24 17:49 ` John Garry
@ 2020-04-25 13:38 ` Bin
0 siblings, 0 replies; 16+ messages in thread
From: Bin @ 2020-04-25 13:38 UTC (permalink / raw)
To: John Garry; +Cc: iommu, Robin Murphy
[-- Attachment #1.1: Type: text/plain, Size: 10374 bytes --]
Dear John:
Thank you for your reply. The case you mentioned is a typical
performance regression issue; there would be no need for the kernel to OOM-kill
any random process even in the worst case. But in my observations, the
iommu_iova slab could consume up to 40G of memory, and the kernel had to kill
my VM process to free memory (64G installed). So I don't think it's
relevant.
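For reference, that figure can be read straight out of /proc/slabinfo
(num_objs * objsize), assuming the standard slabinfo 2.x column layout:
```
# Print the current iommu_iova object count and approximate total size:
awk '$1 == "iommu_iova" {printf "%d objects, ~%.1f MB\n", $3, $3 * $4 / 1048576}' /proc/slabinfo
```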
John Garry <john.garry@huawei.com> wrote on Sat, Apr 25, 2020 at 1:50 AM:
> On 24/04/2020 17:30, Robin Murphy wrote:
> > [... quoted text trimmed ...]
> Just mentioning this in case it's relevant - we found long term aging
> throughput test causes the RB tree to grow very large (and would, I assume,
> eat lots of memory):
>
>
> https://lore.kernel.org/linux-iommu/20190815121104.29140-3-thunder.leizhen@huawei.com/
>
> John
> [... rest of quoted thread trimmed ...]
[-- Attachment #1.2: Type: text/html, Size: 17449 bytes --]
[-- Attachment #2: Type: text/plain, Size: 156 bytes --]
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: iommu_iova slab eats too much memory
2020-04-24 16:30 ` Robin Murphy
2020-04-24 17:49 ` John Garry
@ 2020-04-28 9:17 ` Salil Mehta
2020-04-29 4:13 ` Bin
2020-04-29 13:37 ` Salil Mehta
1 sibling, 2 replies; 16+ messages in thread
From: Salil Mehta @ 2020-04-28 9:17 UTC (permalink / raw)
To: Robin Murphy, Bin; +Cc: iommu
Hi Bin,
Few questions:
1. If there is a leak of IOVAs due to dma_unmap_* not being called somewhere, then
at a certain point the throughput will drastically fall and will almost drop to
zero, because the mappings are no longer available. But in your
case the VM is getting killed, so this could be an actual DMA buffer leak, not a DMA
mapping leak. I doubt the VM will get killed due to exhaustion of the DMA mappings
in the IOMMU layer for a transient reason, or even due to a mapping/unmapping leak.
2. Could you check whether you have TSO offload enabled on the Intel 82599? It helps
reduce the number of mappings and takes IOVA mapping pressure off the
IOMMU/VT-d, though I am not sure it will reduce the amount of memory
required for the buffers. (A quick ethtool check is sketched below.)
3. Also, have you checked the CPU usage while your experiment is going on?
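For question 2, a quick way to check and flip TSO - assuming the
interface is named eth0 (substitute your actual PF/VF netdev name):
```
# Show the current offload state of the NIC:
ethtool -k eth0 | grep -E 'tcp-segmentation-offload|scatter-gather'
# Turn TSO on if it is off:
ethtool -K eth0 tso on
```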
Thanks
Salil.
> [... quoted thread trimmed ...]
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: iommu_iova slab eats too much memory
2020-04-28 9:17 ` Salil Mehta
@ 2020-04-29 4:13 ` Bin
2020-04-29 13:37 ` Salil Mehta
1 sibling, 0 replies; 16+ messages in thread
From: Bin @ 2020-04-29 4:13 UTC (permalink / raw)
To: Salil Mehta; +Cc: iommu, Robin Murphy
[-- Attachment #1.1: Type: text/plain, Size: 11769 bytes --]
Hi Salil:
Thank you for your attention, and these are my answers:
1. I don't really understand what you're saying. What's the difference
between a DMA buffer and a DMA mapping?
Is it like a memory block pool and a memory block, or something like that?
2. Yes, TSO has been enabled the whole time, but it does not seem to help.
3. The CPU usage is pretty normal. What's the point of this question -
is it relevant to the leaking problem?
FYI:
I found an interesting phenomenon: only a small fraction of the running
hosts have this issue, even though they all
have the same kernel, configuration and hardware. I don't know if this
really means something.
Salil Mehta <salil.mehta@huawei.com> wrote on Tue, Apr 28, 2020 at 5:17 PM:
> [... quoted thread trimmed ...]
[-- Attachment #1.2: Type: text/html, Size: 19281 bytes --]
[-- Attachment #2: Type: text/plain, Size: 156 bytes --]
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: iommu_iova slab eats too much memory
2020-04-28 9:17 ` Salil Mehta
2020-04-29 4:13 ` Bin
@ 2020-04-29 13:37 ` Salil Mehta
2020-04-29 13:59 ` Robin Murphy
1 sibling, 1 reply; 16+ messages in thread
From: Salil Mehta @ 2020-04-29 13:37 UTC (permalink / raw)
To: Salil Mehta, Robin Murphy, Bin; +Cc: iommu
Hi Bin,
> From: Bin [mailto:anole1949@gmail.com]
> Sent: Wednesday, April 29, 2020 5:14 AM
> To: Salil Mehta <salil.mehta@huawei.com>
> Hi Salil:
>
> Thank you for your attention, and these are my answers:
>
> 1. I don't really understand what you're saying. What's the difference between a DMA buffer and a DMA mapping?
> Is it like a memory block pool and a memory block, or something like that?
DMA mappings: translations/associations [IOVA<->HPA, or IOVA<->GPA (further translated
to HPA by Stage-2)] which are created by the NIC driver. The IOMMU hardware responsible
for the NIC's IOVA translations is populated with the mappings by the driver before the
DMA buffer is submitted to the hardware for TX/RX.
DMA buffers: the actual memory allocated by the driver that data can be DMA'd into or
out of (RX'd or TX'd).
I think you have missed the important point I mentioned earlier:
If there is a leak of IOVA mappings due to dma_unmap_* not being called somewhere, then at
a certain point the throughput will drastically fall and will almost drop to zero.
This is due to the exhaustion of the available IOVA mapping space in the IOMMU hardware.
The above condition is very different from a *memory leak* of the DMA buffers themselves,
which will eventually lead to OOM.
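A rough way to tell the two conditions apart from userspace - just a
sketch, assuming root and that a 60-second sampling interval suits your
workload:
```
# Sample the IOVA slab and overall available memory side by side. If
# iommu_iova grows without bound while MemAvailable shrinks under a
# steady workload, the mapping-side structures themselves are eating
# the memory:
while sleep 60; do
    printf '%s ' "$(date +%T)"
    awk '$1 == "iommu_iova" {printf "iova_objs=%s ", $2}' /proc/slabinfo
    awk '/^MemAvailable/ {print "avail_kB=" $2}' /proc/meminfo
done
```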
Salil.
> [... rest of quoted thread trimmed ...]
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: iommu_iova slab eats too much memory
2020-04-29 13:37 ` Salil Mehta
@ 2020-04-29 13:59 ` Robin Murphy
2020-04-29 15:00 ` Salil Mehta
0 siblings, 1 reply; 16+ messages in thread
From: Robin Murphy @ 2020-04-29 13:59 UTC (permalink / raw)
To: Salil Mehta, Bin; +Cc: iommu
On 2020-04-29 2:37 pm, Salil Mehta wrote:
> [... quoted text trimmed ...]
> > I think you have missed the important point I mentioned earlier:
> > If there is a leak of IOVA mappings due to dma_unmap_* not being called somewhere, then at
> > a certain point the throughput will drastically fall and will almost drop to zero.
> > This is due to the exhaustion of the available IOVA mapping space in the IOMMU hardware.
With 64-bit address spaces, you're still likely to run out of memory for
the IOVA structures and pagetables before you run out of the actual
address space that they represent. The slowdown comes from having to
walk the whole rbtree to search for free space or free a PFN, but
depending on how the allocation pattern interacts with the caching
mechanism, that may never happen to a significant degree.
> > The above condition is very different from a *memory leak* of the DMA buffers
> > themselves, which will eventually lead to OOM.
>
> Salil.
>
> >> FYI:
> >> I found an interesting phenomenon: only a small fraction of the running hosts have this issue, even though they all
> >> have the same kernel, configuration and hardware. I don't know if this really means something.
Another thought for a debugging sanity check is to look at the
intel-iommu tracepoints on a misbehaving system and see whether maps vs.
unmaps look significantly out of balance. You could probably do
something clever with ftrace to look for that kind of pattern in the DMA
API calls, too.
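For instance - a sketch only; note the intel_iommu trace events were
added after 4.19, so the event group and names may differ on your
kernel:
```
# Count map vs. unmap events over a one-minute window; a large,
# persistent imbalance points at the leaking path:
cd /sys/kernel/debug/tracing
echo 1 > events/intel_iommu/enable
sleep 60
echo 0 > events/intel_iommu/enable
grep -c ' map_single:' trace
grep -c ' unmap_single:' trace
```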
Robin.
>> [... rest of quoted thread trimmed ...]
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: iommu_iova slab eats too much memory
2020-04-29 13:59 ` Robin Murphy
@ 2020-04-29 15:00 ` Salil Mehta
0 siblings, 0 replies; 16+ messages in thread
From: Salil Mehta @ 2020-04-29 15:00 UTC (permalink / raw)
To: Robin Murphy, Bin; +Cc: iommu
> From: Robin Murphy [mailto:robin.murphy@arm.com]
> Sent: Wednesday, April 29, 2020 3:00 PM
> To: Salil Mehta <salil.mehta@huawei.com>; Bin <anole1949@gmail.com>
>
> On 2020-04-29 2:37 pm, Salil Mehta wrote:
> > Hi Bin,
> >
> >> From: Bin [mailto:anole1949@gmail.com]
> >> Sent: Wednesday, April 29, 2020 5:14 AM
> >> To: Salil Mehta <salil.mehta@huawei.com>
> >> Hi Salil:
> >>
> >> Thank you for your attention. Here are my answers:
> >>
> >> 1. I don't really understand what you're saying. What's the difference
> >> between a DMA buffer and a DMA mapping? Is it like a memory block pool
> >> and a memory block, or something like that?
> >
> >
> > DMA mappings: translations/associations [IOVA<->HPA, or IOVA<->GPA (further
> > translated to HPA by Stage-2)] which are created by the NIC driver. The IOMMU
> > hardware responsible for the NIC's IOVA translations is populated with these
> > mappings by the driver before it submits a DMA buffer to the hardware for TX/RX.
> >
> > DMA buffers: the actual memory allocated by the driver, into which data can be
> > DMA'ed (RX'ed or TX'ed).
> >
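To make the distinction concrete, here is a minimal sketch of the map/unmap
pairing being described, assuming a typical NIC TX path (the function and
variable names are hypothetical, not taken from any real driver):

```c
#include <linux/dma-mapping.h>

/* Hypothetical TX path illustrating the mapping/buffer distinction: the
 * buffer is the memory itself; the mapping is the IOVA the IOMMU hands
 * back for it. Each live mapping costs one iommu_iova object. */
static int nic_xmit_frame(struct device *dev, void *buf, size_t len)
{
	dma_addr_t dma;

	/* Create the IOVA<->PA mapping for an already-allocated buffer. */
	dma = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
	if (dma_mapping_error(dev, dma))
		return -ENOMEM;

	/* ... hand 'dma' to the NIC and wait for TX completion ... */

	/* Release the mapping; the buffer itself is freed separately. */
	dma_unmap_single(dev, dma, len, DMA_TO_DEVICE);
	return 0;
}
```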
> >
> > I think you have missed the important point I mentioned earlier:
> > If there is a leak of IOVA mappings due to dma_unmap_* not being called somewhere, then at
> > a certain point the throughput will drastically fall and will almost become equal to zero.
> > This is due to the exhaustion of available IOVA mapping space in the IOMMU hardware.
>
> With 64-bit address spaces, you're still likely to run out of memory for
> the IOVA structures and pagetables before you run out of the actual
> address space that they represent.
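For scale, the figures from the original slabinfo bear this out: the iommu_iova
descriptors alone already cost

$$80\,253\,888 \times 64\ \mathrm{B} = 5\,015\,868\ \mathrm{KiB} \approx 4.8\ \mathrm{GiB},$$

which matches the reported cache size, while the IOVA ranges those objects
describe cover only a vanishing fraction of the DMA address space available
to the device.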
I see. Good point and it was non-obvious.
> The slowdown comes from having to
> walk the whole rbtree to search for free space or free a PFN, but
> depending on how the allocation pattern interacts with the caching
> mechanism that may never happen to a significant degree.
So assuming that, due to the above limitation of the algorithm, the allocation
of free mapping space gets delayed, this should only mean more system memory
stays available in general - unless it also affects the release of the
mappings. Perhaps I am missing something here?
> > The above condition is very much different from a *memory leak* of the DMA
> > buffer itself, which will eventually lead to OOM.
> >
> >
> > Salil.
> >
> >> FYI:
> >> I found an interesting phenomenon: only a small fraction of the running
> >> hosts have this issue, even though they all have the same kernel,
> >> configuration and hardware. I don't know if this really means something.
>
> Another thought for a debugging sanity check is to look at the
> intel-iommu tracepoints on a misbehaving system and see whether maps vs.
> unmaps look significantly out of balance. You could probably do
> something clever with ftrace to look for that kind of pattern in the DMA
> API calls, too.
>
> Robin.
>
> >>
> >>
> >> Salil Mehta <salil.mehta@huawei.com> wrote on Tue, Apr 28, 2020 at 5:17 PM:
> >> Hi Bin,
> >>
> >> Few questions:
> >>
> >> 1. If there is a leak of IOVA due to dma_unmap_* not being called somewhere,
> >> then at a certain point the throughput will drastically fall and will almost
> >> become equal to zero. This should be due to the unavailability of mappings.
> >> But in your case the VM is getting killed, so this could be an actual DMA
> >> buffer leak, not a DMA mapping leak. I doubt the VM would get killed due to
> >> exhaustion of the DMA mappings in the IOMMU layer for a transient reason,
> >> or even due to a mapping/unmapping leak.
> >>
> >> 2. Could you check if you have TSO offload enabled on the Intel 82599? It
> >> will help in reducing the number of mappings and will take IOVA mapping
> >> pressure off the IOMMU/VT-d, though I am not sure it will help in reducing
> >> the amount of memory required for the buffers.
> >>
> >> 3. Also, have you checked the CPU usage while your experiment is going on?
> >>
> >> Thanks
> >> Salil.
> >>
> >>> -----Original Message-----
> >>> From: iommu [mailto:iommu-bounces@lists.linux-foundation.org] On Behalf Of
> >>> Robin Murphy
> >>> Sent: Friday, April 24, 2020 5:31 PM
> >>> To: Bin <anole1949@gmail.com>
> >>> Cc: iommu@lists.linux-foundation.org
> >>> Subject: Re: iommu_iova slab eats too much memory
> >>>
> >>> On 2020-04-24 2:20 pm, Bin wrote:
> >>>> Dear Robin:
> >>>> Thank you for your explanation. Now I understand that this could be the
> >>>> NIC driver's fault, but how can I confirm it? Do I have to debug the
> >>>> driver myself?
> >>>
> >>> I'd start with CONFIG_DMA_API_DEBUG - of course it will chew through
> >>> memory about an order of magnitude faster than the IOVAs alone, but it
> >>> should shed some light on whether DMA API usage looks suspicious, and
> >>> dumping the mappings should help track down the responsible driver(s).
> >>> Although the debugfs code doesn't show the stacktrace of where each
> >>> mapping was made, I guess it would be fairly simple to tweak that for a
> >>> quick way to narrow down where to start looking in an offending driver.
> >>>
> >>> Robin.
> >>>
> >>>> Robin Murphy <robin.murphy@arm.com> wrote on Fri, Apr 24, 2020 at 8:15 PM:
> >>>>
> >>>>> On 2020-04-24 1:06 pm, Bin wrote:
> >>>>>> I'm not familiar with the mmu stuff, so about what you mean by "some
> >>>>>> driver leaking DMA mappings": is it possible that some other kernel
> >>>>>> module, like KVM or a NIC driver, leads to the leaking problem instead
> >>>>>> of the iommu module itself?
> >>>>>
> >>>>> Yes - I doubt that intel-iommu itself is failing to free IOVAs when it
> >>>>> should, since I'd expect a lot of people to have noticed that. It's far
> >>>>> more likely that some driver is failing to call dma_unmap_* when it's
> >>>>> finished with a buffer - with the IOMMU disabled that would be a no-op
> >>>>> on x86 with a modern 64-bit-capable device, so such a latent bug could
> >>>>> have been easily overlooked.
> >>>>>
> >>>>> Robin.
> >>>>>
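To illustrate the class of latent bug Robin describes, here is a minimal
sketch (hypothetical driver names, assuming a typical RX drop path) of the
unmap call whose omission produces exactly this kind of IOVA leak:

```c
#include <linux/dma-mapping.h>
#include <linux/skbuff.h>

/* Hypothetical RX drop path. The unmap below is the call that latent
 * bugs omit: without an IOMMU it is effectively a no-op on x86, so the
 * omission goes unnoticed, but with VT-d enabled each omitted unmap
 * strands one entry in the iommu_iova slab. */
static void nic_rx_drop(struct device *dev, struct sk_buff *skb,
			dma_addr_t dma, unsigned int len)
{
	dma_unmap_single(dev, dma, len, DMA_FROM_DEVICE);
	dev_kfree_skb_any(skb);
}
```

CONFIG_DMA_API_DEBUG, suggested earlier in the thread, is designed to flag
precisely this kind of unbalanced map/unmap usage.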
> >>>>>> Bin <anole1949@gmail.com> wrote on Fri, Apr 24, 2020 at 20:00:
> >>>>>>
> >>>>>>> Well, that's the problem! I'm assuming the iommu kernel module is
> >>>>>>> leaking memory, but I don't know why and how.
> >>>>>>>
> >>>>>>> Do you have any idea about it? Or is any further information needed?
> >>>>>>>
> >>>>>>> Robin Murphy <robin.murphy@arm.com> wrote on Fri, Apr 24, 2020 at 19:20:
> >>>>>>>
> >>>>>>>> On 2020-04-24 1:40 am, Bin wrote:
> >>>>>>>>> Hello? anyone there?
> >>>>>>>>>
> >>>>>>>>> Bin <anole1949@gmail.com> wrote on Thu, Apr 23, 2020 at 5:14 PM:
> >>>>>>>>>
> >>>>>>>>>> Forgot to mention: I've already disabled slab merging, so this is
> >>>>>>>>>> what it is.
> >>>>>>>>>>
> >>>>>>>>>> Bin <anole1949@gmail.com> wrote on Thu, Apr 23, 2020 at 5:11 PM:
> >>>>>>>>>>
> >>>>>>>>>>> [quoted system details and slabinfo summary snipped - see the
> >>>>>>>>>>> original report at the top of this thread]
> >>>>>>>>>>>
> >>>>>>>>>>> OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
> >>>>>>>>>>>
> >>>>>>>>>>> 80253888 80253888 100% 0.06K 1253967 64 5015868K iommu_iova
> >>>>>>>>
> >>>>>>>> Do you really have a peak demand of ~80 million simultaneous DMA
> >>>>>>>> buffers, or is some driver leaking DMA mappings?
> >>>>>>>>
> >>>>>>>> Robin.
> >>>>>>>>
> >>>>>>>>>>> [remainder of quoted slabinfo and original report snipped]
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2020-04-29 15:00 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-23 9:11 iommu_iova slab eats too much memory Bin
2020-04-23 9:14 ` Bin
2020-04-24 0:40 ` Bin
2020-04-24 11:20 ` Robin Murphy
2020-04-24 12:00 ` Bin
2020-04-24 12:06 ` Bin
2020-04-24 12:15 ` Robin Murphy
2020-04-24 13:20 ` Bin
2020-04-24 16:30 ` Robin Murphy
2020-04-24 17:49 ` John Garry
2020-04-25 13:38 ` Bin
2020-04-28 9:17 ` Salil Mehta
2020-04-29 4:13 ` Bin
2020-04-29 13:37 ` Salil Mehta
2020-04-29 13:59 ` Robin Murphy
2020-04-29 15:00 ` Salil Mehta
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).