Subject: Re: [PATCH v3 1/1] mm: introduce put_user_page*(), placeholder versions
From: John Hubbard
To: Ira Weiny
CC: Andrew Morton, Al Viro, Christian Benvenuti, Christoph Hellwig,
 Christopher Lameter, Dan Williams, Dave Chinner, Dennis Dalessandro,
 Doug Ledford, Jan Kara, Jason Gunthorpe, Jerome Glisse, Matthew Wilcox,
 Michal Hocko, Mike Rapoport, Mike Marciniszyn, Ralph Campbell,
 Tom Talpey, LKML
Date: Tue, 12 Mar 2019 17:38:55 -0700
References: <20190306235455.26348-1-jhubbard@nvidia.com>
 <20190306235455.26348-2-jhubbard@nvidia.com>
 <20190312153033.GG1119@iweiny-DESK2.sc.intel.com>
In-Reply-To: <20190312153033.GG1119@iweiny-DESK2.sc.intel.com>

On 3/12/19 8:30 AM, Ira Weiny wrote:
> On Wed, Mar 06, 2019 at 03:54:55PM -0800, john.hubbard@gmail.com wrote:
>> From: John Hubbard
>>
>> Introduces put_user_page(), which simply calls put_page().
>> This provides a way to update all get_user_pages*() callers,
>> so that they call put_user_page(), instead of put_page().
>
> So I've been running with these patches for a while but today while ramping up
> my testing I hit the following:
>
> [ 1355.557819] ------------[ cut here ]------------
> [ 1355.563436] get_user_pages pin count overflowed

Hi Ira,

Thanks for reporting this. That overflow, at face value, means that we've
used more than the 22 bits worth of gup pin counts, so about 4 million
pins of the same page...
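
To make that arithmetic concrete, here is a minimal userspace sketch of how a
bias-style pin count runs out of room. The bias value and the 32-bit counter
width are assumptions picked so the numbers line up with the 22 bits mentioned
above; they are not necessarily what the patch actually uses:

/*
 * Illustrative userspace model only (not kernel code): each gup pin adds a
 * large bias to a 32-bit counter, so the counter can only distinguish a
 * limited number of pins before it would wrap.
 */
#include <stdint.h>
#include <stdio.h>

#define GUP_PIN_COUNTING_BIAS	(1u << 10)	/* assumed: each pin adds 2^10 */

int main(void)
{
	uint32_t refcount = 1;		/* one ordinary reference held */
	uint64_t pins = 0;

	/* Keep "pinning" until the next bias increment would overflow. */
	while ((uint64_t)refcount + GUP_PIN_COUNTING_BIAS <= UINT32_MAX) {
		refcount += GUP_PIN_COUNTING_BIAS;
		pins++;
	}

	/* Prints roughly 2^22, i.e. about 4 million pins of one page. */
	printf("pin count overflows after %llu pins\n",
	       (unsigned long long)pins);
	return 0;
}

With a 2^10 bias in a 32-bit counter, that is about 2^22 (~4.2 million) pins,
which matches the figure above; a larger bias overflows correspondingly
sooner.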

> [ 1355.563446] WARNING: CPU: 1 PID: 1740 at mm/gup.c:73 get_gup_pin_page+0xa5/0xb0
> [ 1355.577391] Modules linked in: ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp
>  scsi_transport_srp ext4 mbcache jbd2 mlx4_ib opa_vnic rpcrdma sunrpc rdma_ucm ib_iser rdma_cm
>  ib_umad iw_cm libiscsi ib_ipoib scsi_transport_iscsi ib_cm sb_edac x86_pkg_temp_thermal
>  intel_powerclamp coretemp kvm irqbypass snd_hda_codec_realtek ib_uverbs snd_hda_codec_generic
>  crct10dif_pclmul ledtrig_audio snd_hda_intel crc32_pclmul snd_hda_codec snd_hda_core
>  ghash_clmulni_intel snd_hwdep snd_pcm aesni_intel crypto_simd snd_timer ib_core cryptd snd
>  glue_helper dax_pmem soundcore nd_pmem ipmi_si device_dax nd_btt ioatdma nd_e820 ipmi_devintf
>  ipmi_msghandler iTCO_wdt i2c_i801 iTCO_vendor_support libnvdimm pcspkr lpc_ich mei_me mei
>  mfd_core wmi pcc_cpufreq acpi_cpufreq sch_fq_codel xfs libcrc32c mlx4_en sr_mod cdrom sd_mod
>  mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops mlx4_core ttm crc32c_intel
>  igb isci ahci dca libsas firewire_ohci drm i2c_algo_bit libahci scsi_transport_sas
> [ 1355.577429] firewire_core crc_itu_t i2c_core libata dm_mod [last unloaded: rdmavt]
> [ 1355.686703] CPU: 1 PID: 1740 Comm: reg-mr Not tainted 5.0.0+ #10
> [ 1355.693851] Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.02.04.0003.102320141138 10/23/2014
> [ 1355.705750] RIP: 0010:get_gup_pin_page+0xa5/0xb0
> [ 1355.711348] Code: e8 40 02 ff ff 80 3d ba a2 fb 00 00 b8 b5 ff ff ff 75 bb 48 c7 c7 48 0a e9 81 89 44 24 04 c6 05 a1 a2 fb 00 01 e8 35 63 e8 ff <0f> 0b 8b 44 24 04 eb 9c 0f 1f 00 66 66 66 66 90 41 57 49 bf 00 00
> [ 1355.733244] RSP: 0018:ffffc90005a23b30 EFLAGS: 00010286
> [ 1355.739536] RAX: 0000000000000000 RBX: ffffea0014220000 RCX: 0000000000000000
> [ 1355.748005] RDX: 0000000000000003 RSI: ffffffff827d94a3 RDI: 0000000000000246
> [ 1355.756453] RBP: ffffea0014220000 R08: 0000000000000002 R09: 0000000000022400
> [ 1355.764907] R10: 0009ccf0ad0c4203 R11: 0000000000000001 R12: 0000000000010207
> [ 1355.773369] R13: ffff8884130b7040 R14: fff0000000000fff R15: 000fffffffe00000
> [ 1355.781836] FS:  00007f2680d0d740(0000) GS:ffff88842e840000(0000) knlGS:0000000000000000
> [ 1355.791384] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1355.798319] CR2: 0000000000589000 CR3: 000000040b05e004 CR4: 00000000000606e0
> [ 1355.806809] Call Trace:
> [ 1355.810078]  follow_page_pte+0x4f3/0x5c0
> [ 1355.814987]  __get_user_pages+0x1eb/0x730
> [ 1355.820020]  get_user_pages+0x3e/0x50
> [ 1355.824657]  ib_umem_get+0x283/0x500 [ib_uverbs]
> [ 1355.830340]  ? _cond_resched+0x15/0x30
> [ 1355.835065]  mlx4_ib_reg_user_mr+0x75/0x1e0 [mlx4_ib]
> [ 1355.841235]  ib_uverbs_reg_mr+0x10c/0x220 [ib_uverbs]
> [ 1355.847400]  ib_uverbs_write+0x2f9/0x4d0 [ib_uverbs]
> [ 1355.853473]  __vfs_write+0x36/0x1b0
> [ 1355.857904]  ? selinux_file_permission+0xf0/0x130
> [ 1355.863702]  ? security_file_permission+0x2e/0xe0
> [ 1355.869503]  vfs_write+0xa5/0x1a0
> [ 1355.873751]  ksys_write+0x4f/0xb0
> [ 1355.878009]  do_syscall_64+0x5b/0x180
> [ 1355.882656]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 1355.888862] RIP: 0033:0x7f2680ec3ed8
> [ 1355.893420] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 45 78 0d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
> [ 1355.915573] RSP: 002b:00007ffe65d50bc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [ 1355.924621] RAX: ffffffffffffffda RBX: 00007ffe65d50c74 RCX: 00007f2680ec3ed8
> [ 1355.933195] RDX: 0000000000000030 RSI: 00007ffe65d50c80 RDI: 0000000000000003
> [ 1355.941760] RBP: 0000000000000030 R08: 0000000000000007 R09: 0000000000581260
> [ 1355.950326] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000581930
> [ 1355.958885] R13: 000000000000000c R14: 0000000000581260 R15: 0000000000000000
> [ 1355.967430] ---[ end trace bc771ac6189977a2 ]---
>
> I'm not sure what I did to do this and I'm going to work on a reproducer. At
> the time of the Warning I only had 1 GUP user?!?!?!?!

If there is a get_user_pages() call that lacks a corresponding
put_user_pages() call, then the count could start working its way up, and
up. Either that, or a bug in my patches here, could cause this.

The basic counting works correctly in fio runs on an NVMe driver with
Direct IO: when I dump out `cat /proc/vmstat | grep gup`, the counts match
up. But that is a simple test.

One way to force a faster repro is to increase GUP_PIN_COUNTING_BIAS, so
that the gup pin count runs into the max much sooner.

I'd really love to create a test setup that would generate this failure,
so anything you discover on how to repro (including what hardware is
required--I'm sure I can scrounge up some IB gear in a pinch) is of great
interest.

Also, I'm just now starting on the DEBUG_USER_PAGE_REFERENCES idea that
Jerome, Jan, and Dan floated some months ago. It's clearly a prerequisite
to converting the call sites properly--just our relatively small IB driver
is showing that. This feature will provide a different mapping of the
struct pages when they are obtained via get_user_pages(). That will make
it easy to assert that put_user_page() and put_page() are not swapped, in
either direction. (A toy userspace model of the idea is appended below,
after the signature.)

> I'm not using ODP, so I don't think the changes we have discussed there are a
> problem.
>
> Ira

thanks,
-- 
John Hubbard
NVIDIA
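
Appendix (referenced above): a standalone userspace model of the
DEBUG_USER_PAGE_REFERENCES idea. The tag bit, helper names, and refcount
handling below are illustrative assumptions only, not the real kernel
implementation; the point is just that handing out gup-pinned pages through a
different ("tagged") alias lets an assertion catch a put_page() /
put_user_page() mix-up in either direction.

/*
 * Userspace model only: get_user_pages() would hand back a tagged alias of
 * each struct page, so the release paths can assert they were given the
 * right kind of pointer.
 */
#include <assert.h>
#include <stdint.h>

struct page { int refcount; };

#define GUP_TAG 0x1UL	/* struct page pointers are aligned, so bit 0 is free */

/* Model of what get_user_pages() would return: a tagged alias of the page. */
static struct page *gup_alias(struct page *page)
{
	page->refcount++;
	return (struct page *)((uintptr_t)page | GUP_TAG);
}

static void put_user_page(struct page *page)
{
	/* Must only ever see pointers that came from get_user_pages(). */
	assert((uintptr_t)page & GUP_TAG);
	page = (struct page *)((uintptr_t)page & ~GUP_TAG);
	page->refcount--;
}

static void put_page(struct page *page)
{
	/* Must never see a gup-pinned alias. */
	assert(!((uintptr_t)page & GUP_TAG));
	page->refcount--;
}

int main(void)
{
	struct page p = { .refcount = 1 };
	struct page *pinned = gup_alias(&p);

	put_user_page(pinned);	/* correct pairing: fine */
	put_page(&p);		/* correct pairing: fine */
	/* put_page(pinned) or put_user_page(&p) would trip the assertions. */
	return 0;
}

In the kernel the alias would presumably be produced some other way than a
pointer tag, but the checking pattern is what the debug feature is after.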