From: Konstantin Khlebnikov <koct9i@gmail.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>,
	Michel Lespinasse <walken@google.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Tim Hartrick <tim@edgecast.com>
Subject: Re: [PATCH] Repeated fork() causes SLAB to grow without bound
Date: Tue, 25 Nov 2014 16:13:16 +0400
Message-ID: <CALYGNiPZmf4Y1_vX_FaiALKp-BPvct7fAiaPEjnDGnVx9paS9w@mail.gmail.com>
In-Reply-To: <20141125105953.GC4607@dhcp22.suse.cz>

[-- Attachment #1: Type: text/plain, Size: 16416 bytes --]

On Tue, Nov 25, 2014 at 1:59 PM, Michal Hocko <mhocko@suse.cz> wrote:
> On Mon 24-11-14 11:09:40, Konstantin Khlebnikov wrote:
>> On Thu, Nov 20, 2014 at 6:03 PM, Konstantin Khlebnikov <koct9i@gmail.com> wrote:
>> > On Thu, Nov 20, 2014 at 5:50 PM, Rik van Riel <riel@redhat.com> wrote:
>> >> -----BEGIN PGP SIGNED MESSAGE-----
>> >> Hash: SHA1
>> >>
>> >> On 11/20/2014 09:42 AM, Konstantin Khlebnikov wrote:
>> >>
>> >>> I'm thinking about limitation for reusing anon_vmas which might
>> >>> increase performance without breaking asymptotic estimation of
>> >>> count anon_vma in the worst case. For example this heuristic: allow
>> >>> to reuse only anon_vma with single direct descendant. It seems
>> >>> there will be up to around two times more anon_vmas but
>> >>> false-aliasing must be much lower.
>>
>> Done. RFC patch in attachment.
>
> This is triggering BUG_ON(anon_vma->degree); in unlink_anon_vmas. I have
> applied the patch on top of 3.18.0-rc6.

It seems I got the degree counter wrong for the case where the anon_vma
is merged in anon_vma_prepare(). The increment must move into the next
if block:

--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -182,8 +182,6 @@ int anon_vma_prepare(struct vm_area_struct *vma)
                        if (unlikely(!anon_vma))
                                goto out_enomem_free_avc;
                        allocated = anon_vma;
-                       /* Bump degree, root anon_vma is its own parent. */
-                       anon_vma->degree++;
                }

                anon_vma_lock_write(anon_vma);
@@ -192,6 +190,7 @@ int anon_vma_prepare(struct vm_area_struct *vma)
                if (likely(!vma->anon_vma)) {
                        vma->anon_vma = anon_vma;
                        anon_vma_chain_link(vma, avc, anon_vma);
+                       anon_vma->degree++;
                        allocated = NULL;
                        avc = NULL;
                }

I've tested it with trinity, but probably not for long enough.
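
For illustration, the accounting problem can be modelled in userspace.
This is only a sketch under assumptions: final_degree() and its
bump_at_alloc flag are made-up names, the rb tree is reduced to a
counter, and v1 is assumed to initialise degree to 1 in anon_vma_alloc()
the same way v2 does. It attaches nvmas vmas to one root anon_vma (the
first one allocates it, the rest merge into it) and then unlinks them all:

```c
#include <assert.h>

/*
 * Toy model, not kernel code: returns the root anon_vma's degree after
 * nvmas vmas were attached to it and then all unlinked again.
 * bump_at_alloc != 0 models the v1 placement (increment only when the
 * anon_vma is freshly allocated); 0 models the v2 placement (increment
 * every time a vma is linked in anon_vma_prepare()).
 */
static unsigned final_degree(int nvmas, int bump_at_alloc)
{
	unsigned degree = 0;
	int linked = 0;		/* stands in for the rb tree population */

	assert(nvmas >= 1);

	for (int i = 0; i < nvmas; i++) {
		if (i == 0) {
			degree = 1;		/* assumed anon_vma_alloc() init */
			if (bump_at_alloc)
				degree++;	/* v1: only on allocation */
		}
		linked++;			/* anon_vma_chain_link() */
		if (!bump_at_alloc)
			degree++;		/* v2: on every link */
	}

	for (int i = 0; i < nvmas; i++) {
		if (--linked == 0)
			degree--;	/* parent->degree--, root is its own parent */
		degree--;		/* vma->anon_vma->degree-- */
	}
	return degree;
}
```

With bump_at_alloc set and two or more vmas the unsigned counter wraps
below zero, which is the situation BUG_ON(anon_vma->degree) catches. With
the increment in the link block it returns to zero for any number of vmas.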

>
> [   12.380189] ------------[ cut here ]------------
> [   12.380221] kernel BUG at mm/rmap.c:385!
> [   12.380239] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [   12.380272] Modules linked in: i915 cfbfillrect cfbimgblt i2c_algo_bit fbcon bitblit softcursor cfbcopyarea font drm_kms_helper drm fb fbdev binfmt_misc fuse uvcvideo videobuf2_vmalloc videobuf2_memops arc4 videobuf2_core v4l2_common sdhci_pci iwldvm videodev media mac80211 i2c_i801 i2c_core sdhci mmc_core iwlwifi cfg80211 snd_hda_codec_hdmi snd_hda_codec_idt snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm video backlight snd_timer snd
> [   12.380518] CPU: 1 PID: 3704 Comm: kdm_greet Not tainted 3.18.0-rc6-test-00001-gf5bc00c103ff #409
> [   12.380554] Hardware name: Dell Inc. Latitude E6320/09PHH9, BIOS A08 10/18/2011
> [   12.380584] task: ffff8801272bc2c0 ti: ffff8800bcaf0000 task.ti: ffff8800bcaf0000
> [   12.380614] RIP: 0010:[<ffffffff81125f09>]  [<ffffffff81125f09>] unlink_anon_vmas+0x12b/0x169
> [   12.380653] RSP: 0018:ffff8800bcaf3d28  EFLAGS: 00010286
> [   12.380676] RAX: ffff8800bcb3e690 RBX: ffff8800bcb35e28 RCX: ffff8801272bcb60
> [   12.380706] RDX: ffff8800bcb38e70 RSI: 0000000000000001 RDI: ffff8800bcb38e70
> [   12.380734] RBP: ffff8800bcaf3d78 R08: 0000000000000000 R09: 0000000000000000
> [   12.380764] R10: 0000000000000000 R11: ffff8800bcb3e6a0 R12: ffff8800bcb3e680
> [   12.380793] R13: ffff8800bcb3e690 R14: ffff8800bcb38e70 R15: ffff8800bcb38e70
> [   12.380822] FS:  0000000000000000(0000) GS:ffff88012d440000(0000) knlGS:0000000000000000
> [   12.380855] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   12.380880] CR2: 00007fcd2603b0e8 CR3: 0000000001a11000 CR4: 00000000000407e0
> [   12.380908] Stack:
> [   12.380918]  ffff8801272e9dc0 ffff8800bcb35e38 ffff8800bcb35e38 ffff8800bcb3e680
> [   12.380953]  ffff8800bcaf3d78 ffff8800bcb35dc0 ffff8800bcaf3dd8 0000000000000000
> [   12.380989]  0000000000000000 ffff8800bcb35dc0 ffff8800bcaf3dc8 ffffffff81119e26
> [   12.381024] Call Trace:
> [   12.381038]  [<ffffffff81119e26>] free_pgtables+0x8e/0xcc
> [   12.381062]  [<ffffffff81121ac1>] exit_mmap+0x84/0x123
> [   12.381086]  [<ffffffff8103ff09>] mmput+0x5e/0xbb
> [   12.381107]  [<ffffffff81044d8c>] do_exit+0x39c/0x97e
> [   12.381131]  [<ffffffff810f49b4>] ? context_tracking_user_exit+0x79/0x116
> [   12.381160]  [<ffffffff8127f43a>] ? __this_cpu_preempt_check+0x13/0x15
> [   12.381188]  [<ffffffff810453f1>] do_group_exit+0x4c/0xc9
> [   12.381212]  [<ffffffff81045482>] SyS_exit_group+0x14/0x14
> [   12.381238]  [<ffffffff81524f52>] system_call_fastpath+0x12/0x17
> [   12.381262] Code: 32 f5 ff 49 8b 45 78 48 8b 18 4c 8d 60 f0 48 83 eb 10 4d 8d 6c 24 10 4c 3b 6d b8 74 3d 49 8b 7c 24 08 83 bf 98 00 00 00 00 74 02 <0f> 0b f0 ff 8f 88 00 00 00 74 1d 4c 89 ef e8 61 96 15 00 4c 89
> [   12.381445] RIP  [<ffffffff81125f09>] unlink_anon_vmas+0x12b/0x169
> [   12.381473]  RSP <ffff8800bcaf3d28>
> [   12.386659] ---[ end trace 5761ee18fca12427 ]---
> [   12.386662] Fixing recursive fault but reboot is needed!
> [   13.158240] e1000e 0000:00:19.0: irq 25 for MSI/MSI-X
> [   13.259294] e1000e 0000:00:19.0: irq 25 for MSI/MSI-X
> [   13.259468] IPv6: ADDRCONF(NETDEV_UP): lan0: link is not ready
> [   16.790917] e1000e: lan0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> [   16.790957] IPv6: ADDRCONF(NETDEV_CHANGE): lan0: link becomes ready
> [   18.846524] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled
> [   18.846742] iwlwifi 0000:02:00.0: Radio type=0x0-0x3-0x1
> [   18.941594] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
> [   19.145595] e1000e: lan0 NIC Link is Down
> [   19.287399] e1000e 0000:00:19.0: irq 25 for MSI/MSI-X
> [   19.391325] e1000e 0000:00:19.0: irq 25 for MSI/MSI-X
> [   19.391475] IPv6: ADDRCONF(NETDEV_UP): lan0: link is not ready
> [   19.573640] e1000e: lan0 NIC Link is Down
> [   19.717813] e1000e 0000:00:19.0: irq 25 for MSI/MSI-X
> [   19.819729] e1000e 0000:00:19.0: irq 25 for MSI/MSI-X
> [   19.819883] IPv6: ADDRCONF(NETDEV_UP): lan0: link is not ready
> [   22.938849] e1000e: lan0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> [   22.938889] IPv6: ADDRCONF(NETDEV_CHANGE): lan0: link becomes ready
> [   23.404027] ------------[ cut here ]------------
> [   23.404056] kernel BUG at mm/rmap.c:385!
> [   23.404074] invalid opcode: 0000 [#2] PREEMPT SMP DEBUG_PAGEALLOC
> [   23.404107] Modules linked in: i915 cfbfillrect cfbimgblt i2c_algo_bit fbcon bitblit softcursor cfbcopyarea font drm_kms_helper drm fb fbdev binfmt_misc fuse uvcvideo videobuf2_vmalloc videobuf2_memops arc4 videobuf2_core v4l2_common sdhci_pci iwldvm videodev media mac80211 i2c_i801 i2c_core sdhci mmc_core iwlwifi cfg80211 snd_hda_codec_hdmi snd_hda_codec_idt snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm video backlight snd_timer snd
> [   23.404353] CPU: 1 PID: 4506 Comm: synaptikscfg Tainted: G      D        3.18.0-rc6-test-00001-gf5bc00c103ff #409
> [   23.404395] Hardware name: Dell Inc. Latitude E6320/09PHH9, BIOS A08 10/18/2011
> [   23.404425] task: ffff8800a337c2c0 ti: ffff88009f4ec000 task.ti: ffff88009f4ec000
> [   23.404455] RIP: 0010:[<ffffffff81125f09>]  [<ffffffff81125f09>] unlink_anon_vmas+0x12b/0x169
> [   23.404494] RSP: 0018:ffff88009f4efd28  EFLAGS: 00010282
> [   23.405766] RAX: ffff88009f54d010 RBX: ffff88009f54c488 RCX: 0000000000000000
> [   23.407062] RDX: ffff88009f5a3a50 RSI: 0000000000000001 RDI: ffff88009f5a3a50
> [   23.408352] RBP: ffff88009f4efd78 R08: 0000000000000000 R09: 0000000000000000
> [   23.409597] R10: 0000000000000000 R11: ffff88009f54d020 R12: ffff88009f54d000
> [   23.410816] R13: ffff88009f54d010 R14: ffff88009f5a3a50 R15: ffff88009f5a3a50
> [   23.411998] FS:  0000000000000000(0000) GS:ffff88012d440000(0000) knlGS:0000000000000000
> [   23.413167] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   23.414320] CR2: 00007f7a855608f0 CR3: 00000000a328c000 CR4: 00000000000407e0
> [   23.415471] Stack:
> [   23.416603]  ffff8800a3390e00 ffff88009f54c498 ffff88009f54c498 ffff88009f54d000
> [   23.417747]  ffff88009f4efd78 ffff88009f54c420 ffff88009f4efdd8 0000000000000000
> [   23.418892]  0000000000000000 ffff88009f54c420 ffff88009f4efdc8 ffffffff81119e26
> [   23.420027] Call Trace:
> [   23.421153]  [<ffffffff81119e26>] free_pgtables+0x8e/0xcc
> [   23.422273]  [<ffffffff81121ac1>] exit_mmap+0x84/0x123
> [   23.423411]  [<ffffffff81044d48>] ? do_exit+0x358/0x97e
> [   23.424537]  [<ffffffff8103ff09>] mmput+0x5e/0xbb
> [   23.425665]  [<ffffffff81044d8c>] do_exit+0x39c/0x97e
> [   23.426766]  [<ffffffff810f49b4>] ? context_tracking_user_exit+0x79/0x116
> [   23.427866]  [<ffffffff8127f43a>] ? __this_cpu_preempt_check+0x13/0x15
> [   23.428962]  [<ffffffff810453f1>] do_group_exit+0x4c/0xc9
> [   23.430064]  [<ffffffff81045482>] SyS_exit_group+0x14/0x14
> [   23.431162]  [<ffffffff81524f52>] system_call_fastpath+0x12/0x17
> [   23.432262] Code: 32 f5 ff 49 8b 45 78 48 8b 18 4c 8d 60 f0 48 83 eb 10 4d 8d 6c 24 10 4c 3b 6d b8 74 3d 49 8b 7c 24 08 83 bf 98 00 00 00 00 74 02 <0f> 0b f0 ff 8f 88 00 00 00 74 1d 4c 89 ef e8 61 96 15 00 4c 89
> [   23.434722] RIP  [<ffffffff81125f09>] unlink_anon_vmas+0x12b/0x169
> [   23.435924]  RSP <ffff88009f4efd28>
> [   23.441996] ---[ end trace 5761ee18fca12428 ]---
> [   23.442001] Fixing recursive fault but reboot is needed!
> [  838.179454] ------------[ cut here ]------------
> [  838.180658] kernel BUG at mm/rmap.c:385!
> [  838.181843] invalid opcode: 0000 [#3] PREEMPT SMP DEBUG_PAGEALLOC
> [  838.183046] Modules linked in: i915 cfbfillrect cfbimgblt i2c_algo_bit fbcon bitblit softcursor cfbcopyarea font drm_kms_helper drm fb fbdev binfmt_misc fuse uvcvideo videobuf2_vmalloc videobuf2_memops arc4 videobuf2_core v4l2_common sdhci_pci iwldvm videodev media mac80211 i2c_i801 i2c_core sdhci mmc_core iwlwifi cfg80211 snd_hda_codec_hdmi snd_hda_codec_idt snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm video backlight snd_timer snd
> [  838.186983] CPU: 1 PID: 6643 Comm: colord-sane Tainted: G      D        3.18.0-rc6-test-00001-gf5bc00c103ff #409
> [  838.188240] Hardware name: Dell Inc. Latitude E6320/09PHH9, BIOS A08 10/18/2011
> [  838.189503] task: ffff8800c4fd8000 ti: ffff880079c6c000 task.ti: ffff880079c6c000
> [  838.190765] RIP: 0010:[<ffffffff81125f09>]  [<ffffffff81125f09>] unlink_anon_vmas+0x12b/0x169
> [  838.192045] RSP: 0018:ffff880079c6fb68  EFLAGS: 00010286
> [  838.193324] RAX: ffff8800c5a70150 RBX: ffff8800a6fd5748 RCX: 0000000000000000
> [  838.194616] RDX: ffff8800a5379840 RSI: 0000000000000001 RDI: ffff8800a5379840
> [  838.195879] RBP: ffff880079c6fbb8 R08: 0000000000000000 R09: 0000000000000000
> [  838.197100] R10: 0000000000000000 R11: ffff8800c5a70160 R12: ffff8800c5a70140
> [  838.198289] R13: ffff8800c5a70150 R14: ffff8800a5379840 R15: ffff8800a5379840
> [  838.199448] FS:  0000000000000000(0000) GS:ffff88012d440000(0000) knlGS:0000000000000000
> [  838.200604] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  838.201753] CR2: 00007fdfd692cde8 CR3: 0000000079d0d000 CR4: 00000000000407e0
> [  838.202902] Stack:
> [  838.204029]  ffff88011e6fc540 ffff8800a6fd5758 ffff8800a6fd5758 ffff8800c5a70140
> [  838.205180]  ffff880079c6fbb8 ffff8800a6fd56e0 ffff880079c6fc18 0000000000000000
> [  838.206328]  0000000000000000 ffff8800a6fd56e0 ffff880079c6fc08 ffffffff81119e26
> [  838.207477] Call Trace:
> [  838.208614]  [<ffffffff81119e26>] free_pgtables+0x8e/0xcc
> [  838.209762]  [<ffffffff81121ac1>] exit_mmap+0x84/0x123
> [  838.210897]  [<ffffffff81044d48>] ? do_exit+0x358/0x97e
> [  838.212020]  [<ffffffff8103ff09>] mmput+0x5e/0xbb
> [  838.213132]  [<ffffffff81044d8c>] do_exit+0x39c/0x97e
> [  838.214232]  [<ffffffff8104ea16>] ? get_signal+0xdb/0x68a
> [  838.215324]  [<ffffffff8115de6d>] ? poll_select_copy_remaining+0xfe/0xfe
> [  838.216420]  [<ffffffff810453f1>] do_group_exit+0x4c/0xc9
> [  838.217521]  [<ffffffff8104ef82>] get_signal+0x647/0x68a
> [  838.218612]  [<ffffffff810f48bd>] ? context_tracking_user_enter+0xdb/0x159
> [  838.219705]  [<ffffffff8100228f>] do_signal+0x28/0x657
> [  838.220796]  [<ffffffff810c1e10>] ? __acct_update_integrals+0xbf/0xd4
> [  838.221894]  [<ffffffff81063e43>] ? preempt_count_sub+0xcd/0xdb
> [  838.222998]  [<ffffffff8106972e>] ? vtime_account_user+0x88/0x95
> [  838.224105]  [<ffffffff815243a3>] ? _raw_spin_unlock+0x32/0x47
> [  838.225205]  [<ffffffff810f49b4>] ? context_tracking_user_exit+0x79/0x116
> [  838.226308]  [<ffffffff810f49b4>] ? context_tracking_user_exit+0x79/0x116
> [  838.227401]  [<ffffffff810028fd>] do_notify_resume+0x3f/0x94
> [  838.228495]  [<ffffffff81525218>] int_signal+0x12/0x17
> [  838.229581] Code: 32 f5 ff 49 8b 45 78 48 8b 18 4c 8d 60 f0 48 83 eb 10 4d 8d 6c 24 10 4c 3b 6d b8 74 3d 49 8b 7c 24 08 83 bf 98 00 00 00 00 74 02 <0f> 0b f0 ff 8f 88 00 00 00 74 1d 4c 89 ef e8 61 96 15 00 4c 89
> [  838.231909] RIP  [<ffffffff81125f09>] unlink_anon_vmas+0x12b/0x169
> [  838.233003]  RSP <ffff880079c6fb68>
> [  838.234248] ---[ end trace 5761ee18fca12429 ]---
> [  838.234251] Fixing recursive fault but reboot is needed!
> [ 1806.784267] ------------[ cut here ]------------
> [ 1806.785322] kernel BUG at mm/rmap.c:385!
> [ 1806.786361] invalid opcode: 0000 [#4] PREEMPT SMP DEBUG_PAGEALLOC
> [ 1806.787397] Modules linked in: i915 cfbfillrect cfbimgblt i2c_algo_bit fbcon bitblit softcursor cfbcopyarea font drm_kms_helper drm fb fbdev binfmt_misc fuse uvcvideo videobuf2_vmalloc videobuf2_memops arc4 videobuf2_core v4l2_common sdhci_pci iwldvm videodev media mac80211 i2c_i801 i2c_core sdhci mmc_core iwlwifi cfg80211 snd_hda_codec_hdmi snd_hda_codec_idt snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm video backlight snd_timer snd
> [ 1806.790682] CPU: 1 PID: 8135 Comm: DNS Resolver #7 Tainted: G      D        3.18.0-rc6-test-00001-gf5bc00c103ff #409
> [ 1806.791728] Hardware name: Dell Inc. Latitude E6320/09PHH9, BIOS A08 10/18/2011
> [ 1806.792779] task: ffff8800b3d40000 ti: ffff880079e34000 task.ti: ffff880079e34000
> [ 1806.793816] RIP: 0010:[<ffffffff81125f09>]  [<ffffffff81125f09>] unlink_anon_vmas+0x12b/0x169
> [ 1806.794863] RSP: 0018:ffff880079e37d38  EFLAGS: 00010282
> [ 1806.795894] RAX: ffff8800b508d790 RBX: ffff8800bcaa4e28 RCX: 0000000000000000
> [ 1806.796948] RDX: ffff880124ce0f20 RSI: 0000000000000001 RDI: ffff880124ce0f20
> [ 1806.798011] RBP: ffff880079e37d88 R08: 0000000000000000 R09: 0000000000000000
> [ 1806.799048] R10: 00007fc2827f9db0 R11: ffff8800b508d7a0 R12: ffff8800b508d780
> [ 1806.800105] R13: ffff8800b508d790 R14: ffff880124ce0f20 R15: ffff880124ce0f20
> [ 1806.801143] FS:  00007fc2827fa700(0000) GS:ffff88012d440000(0000) knlGS:0000000000000000
> [ 1806.802206] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1806.803244] CR2: 00007fc2c6b87000 CR3: 00000000a3063000 CR4: 00000000000407e0
> [ 1806.804305] Stack:
> [ 1806.805329]  00007fc280754000 ffff8800bcaa4e38 ffff8800bcaa4e38 ffff8800b508d780
> [ 1806.806382]  0000000081098bfb ffff8800bcaa4dc0 ffff880079e37df8 00007fc27ff00000
> [ 1806.807467]  00007fc280a00000 ffff8800bcaa4dc0 ffff880079e37dd8 ffffffff81119e26
> [ 1806.808536] Call Trace:
> [ 1806.809570]  [<ffffffff81119e26>] free_pgtables+0x8e/0xcc
> [ 1806.810617]  [<ffffffff8111fe4c>] unmap_region+0xc8/0xec
> [ 1806.811658]  [<ffffffff81270329>] ? __rb_erase_color+0x122/0x1f9
> [ 1806.812724]  [<ffffffff8112192b>] do_munmap+0x275/0x2f7
> [ 1806.813792]  [<ffffffff811219f5>] vm_munmap+0x48/0x61
> [ 1806.814841]  [<ffffffff81121a34>] SyS_munmap+0x26/0x2f
> [ 1806.815884]  [<ffffffff81524f52>] system_call_fastpath+0x12/0x17
> [ 1806.816951] Code: 32 f5 ff 49 8b 45 78 48 8b 18 4c 8d 60 f0 48 83 eb 10 4d 8d 6c 24 10 4c 3b 6d b8 74 3d 49 8b 7c 24 08 83 bf 98 00 00 00 00 74 02 <0f> 0b f0 ff 8f 88 00 00 00 74 1d 4c 89 ef e8 61 96 15 00 4c 89
> [ 1806.819300] RIP  [<ffffffff81125f09>] unlink_anon_vmas+0x12b/0x169
> [ 1806.820457]  RSP <ffff880079e37d38>
> [ 1806.822068] ---[ end trace 5761ee18fca1242a ]---
> --
> Michal Hocko
> SUSE Labs

[-- Attachment #2: mm-prevent-endless-growth-of-anon_vma-hierarchy-v2 --]
[-- Type: application/octet-stream, Size: 5520 bytes --]

mm: prevent endless growth of anon_vma hierarchy

From: Konstantin Khlebnikov <koct9i@gmail.com>

A constantly forking task causes unlimited growth of the anon_vma chain.
Each new child allocates a new level of anon_vmas and links its vmas to all
previous levels because it inherits pages from them. None of these anon_vmas
can be freed because there might be pages which point to them.

This patch adds a heuristic which decides whether to reuse an existing
anon_vma instead of forking a new one. It counts vmas and direct descendants
for each anon_vma. An anon_vma with degree lower than two will be reused at
the next fork. As a result each anon_vma has either a live vma or at least
two descendants, endless chains are no longer possible, and the count of
anon_vmas is no more than twice the count of vmas.

v2: update degree in anon_vma_prepare for merged anon_vma

Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Link: http://lkml.kernel.org/r/20120816024610.GA5350@evergreen.ssec.wisc.edu
---
 include/linux/rmap.h |   16 ++++++++++++++++
 mm/rmap.c            |   30 +++++++++++++++++++++++++++++-
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index c0c2bce..b1d140c 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -45,6 +45,22 @@ struct anon_vma {
 	 * mm_take_all_locks() (mm_all_locks_mutex).
 	 */
 	struct rb_root rb_root;	/* Interval tree of private "related" vmas */
+
+	/*
+	 * Count of child anon_vmas and VMAs which point to this anon_vma.
+	 *
+	 * This counter is used for deciding whether to reuse an old anon_vma
+	 * instead of forking a new one. It allows detecting anon_vmas which
+	 * have just one direct descendant and no vmas. Reusing such an anon_vma
+	 * does not lead to a significant performance regression but prevents
+	 * degradation of the anon_vma hierarchy to an endless linear chain.
+	 *
+	 * Root anon_vma is never reused because it is its own parent and it has
+	 * at least one vma or child, thus at fork its degree is at least 2.
+	 */
+	unsigned degree;
+
+	struct anon_vma *parent;	/* Parent of this anon_vma */
 };
 
 /*
diff --git a/mm/rmap.c b/mm/rmap.c
index 19886fb..df5c44e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -72,6 +72,8 @@ static inline struct anon_vma *anon_vma_alloc(void)
 	anon_vma = kmem_cache_alloc(anon_vma_cachep, GFP_KERNEL);
 	if (anon_vma) {
 		atomic_set(&anon_vma->refcount, 1);
+		anon_vma->degree = 1;	/* Reference for first vma */
+		anon_vma->parent = anon_vma;
 		/*
 		 * Initialise the anon_vma root to point to itself. If called
 		 * from fork, the root will be reset to the parents anon_vma.
@@ -188,6 +190,8 @@ int anon_vma_prepare(struct vm_area_struct *vma)
 		if (likely(!vma->anon_vma)) {
 			vma->anon_vma = anon_vma;
 			anon_vma_chain_link(vma, avc, anon_vma);
+			/* vma link if merged or child link for new root */
+			anon_vma->degree++;
 			allocated = NULL;
 			avc = NULL;
 		}
@@ -256,7 +260,17 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
 		anon_vma = pavc->anon_vma;
 		root = lock_anon_vma_root(root, anon_vma);
 		anon_vma_chain_link(dst, avc, anon_vma);
+
+		/*
+		 * Reuse existing anon_vma if its degree is lower than two,
+		 * which means it has no vma and just one anon_vma child.
+		 */
+		if (!dst->anon_vma && anon_vma != src->anon_vma &&
+				anon_vma->degree < 2)
+			dst->anon_vma = anon_vma;
 	}
+	if (dst->anon_vma)
+		dst->anon_vma->degree++;
 	unlock_anon_vma_root(root);
 	return 0;
 
@@ -279,6 +293,9 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
 	if (!pvma->anon_vma)
 		return 0;
 
+	/* Drop inherited anon_vma, we'll reuse old one or allocate new. */
+	vma->anon_vma = NULL;
+
 	/*
 	 * First, attach the new VMA to the parent VMA's anon_vmas,
 	 * so rmap can find non-COWed pages in child processes.
@@ -286,6 +303,10 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
 	if (anon_vma_clone(vma, pvma))
 		return -ENOMEM;
 
+	/* An old anon_vma has been reused. */
+	if (vma->anon_vma)
+		return 0;
+
 	/* Then add our own anon_vma. */
 	anon_vma = anon_vma_alloc();
 	if (!anon_vma)
@@ -299,6 +320,7 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
 	 * lock any of the anon_vmas in this anon_vma tree.
 	 */
 	anon_vma->root = pvma->anon_vma->root;
+	anon_vma->parent = pvma->anon_vma;
 	/*
 	 * With refcounts, an anon_vma can stay around longer than the
 	 * process it belongs to. The root anon_vma needs to be pinned until
@@ -309,6 +331,7 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
 	vma->anon_vma = anon_vma;
 	anon_vma_lock_write(anon_vma);
 	anon_vma_chain_link(vma, avc, anon_vma);
+	anon_vma->parent->degree++;
 	anon_vma_unlock_write(anon_vma);
 
 	return 0;
@@ -339,12 +362,16 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
 		 * Leave empty anon_vmas on the list - we'll need
 		 * to free them outside the lock.
 		 */
-		if (RB_EMPTY_ROOT(&anon_vma->rb_root))
+		if (RB_EMPTY_ROOT(&anon_vma->rb_root)) {
+			anon_vma->parent->degree--;
 			continue;
+		}
 
 		list_del(&avc->same_vma);
 		anon_vma_chain_free(avc);
 	}
+	if (vma->anon_vma)
+		vma->anon_vma->degree--;
 	unlock_anon_vma_root(root);
 
 	/*
@@ -355,6 +382,7 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
 	list_for_each_entry_safe(avc, next, &vma->anon_vma_chain, same_vma) {
 		struct anon_vma *anon_vma = avc->anon_vma;
 
+		BUG_ON(anon_vma->degree);
 		put_anon_vma(anon_vma);
 
 		list_del(&avc->same_vma);
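
The effect of the heuristic can be checked with a small userspace model.
This is a rough sketch, not kernel code: av_model, vma_model, model_fork(),
model_exit() and live_after_forks() are hypothetical names, each process has
a single vma, the interval tree is reduced to a counter, and locking and
refcounts are ignored. It replays the pattern where a task forks and the
parent then exits, and counts the anon_vmas that stay alive:

```c
#include <assert.h>
#include <string.h>

#define MAX_AV 2048		/* capacity of this toy model */

struct av_model {		/* stand-in for struct anon_vma */
	unsigned degree;	/* child anon_vmas + vmas pointing here */
	int parent;		/* index of parent; root is its own parent */
	int linked;		/* vmas linked in (replaces the rb tree) */
	int live;		/* 0 once freed */
};

struct vma_model {		/* stand-in for struct vm_area_struct */
	int chain[MAX_AV];	/* anon_vmas this vma is linked to, root first */
	int nchain;
	int av;			/* vma->anon_vma, -1 if unset */
};

static struct av_model avs[MAX_AV];
static int nr_avs;

static int av_alloc(int parent)
{
	int id = nr_avs++;

	assert(id < MAX_AV);
	avs[id].degree = 1;	/* reference for first vma */
	avs[id].parent = parent < 0 ? id : parent;
	avs[id].linked = 0;
	avs[id].live = 1;
	return id;
}

static void chain_link(struct vma_model *v, int id)
{
	v->chain[v->nchain++] = id;
	avs[id].linked++;
}

/* Mirrors anon_vma_fork() plus the reuse check added to anon_vma_clone(). */
static void model_fork(struct vma_model *child,
		       const struct vma_model *parent, int reuse)
{
	child->nchain = 0;
	child->av = -1;
	for (int i = 0; i < parent->nchain; i++) {
		int id = parent->chain[i];

		chain_link(child, id);
		if (reuse && child->av < 0 && id != parent->av &&
		    avs[id].degree < 2)
			child->av = id;
	}
	if (child->av >= 0) {	/* an old anon_vma has been reused */
		avs[child->av].degree++;
		return;
	}
	child->av = av_alloc(parent->av);
	chain_link(child, child->av);
	avs[parent->av].degree++;
}

/* Mirrors unlink_anon_vmas(): empty anon_vmas drop out of their parent. */
static void model_exit(struct vma_model *v)
{
	for (int i = 0; i < v->nchain; i++)
		if (--avs[v->chain[i]].linked == 0)
			avs[avs[v->chain[i]].parent].degree--;
	avs[v->av].degree--;
	for (int i = 0; i < v->nchain; i++)
		if (avs[v->chain[i]].linked == 0) {
			assert(avs[v->chain[i]].degree == 0);	/* the BUG_ON */
			avs[v->chain[i]].live = 0;
		}
}

/* Fork `generations` times, each parent exiting after its child exists. */
static int live_after_forks(int generations, int reuse)
{
	static struct vma_model a, b;
	struct vma_model *cur = &a, *next = &b, *tmp;
	int live = 0;

	memset(avs, 0, sizeof(avs));
	nr_avs = 0;
	cur->nchain = 0;
	cur->av = av_alloc(-1);	/* anon_vma_prepare() creating the root */
	chain_link(cur, cur->av);
	avs[cur->av].degree++;	/* root is its own parent */
	for (int g = 0; g < generations; g++) {
		model_fork(next, cur, reuse);
		model_exit(cur);
		tmp = cur; cur = next; next = tmp;
	}
	for (int i = 0; i < nr_avs; i++)
		live += avs[i].live;
	model_exit(cur);	/* final teardown, checks degrees reach 0 */
	return live;
}
```

Under these assumptions live_after_forks(1000, 0) leaves one anon_vma per
generation plus the root, while live_after_forks(1000, 1) stays at three.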
