From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S976419AbdDXSCM (ORCPT );
        Mon, 24 Apr 2017 14:02:12 -0400
Received: from mail-wm0-f47.google.com ([74.125.82.47]:37678 "EHLO
        mail-wm0-f47.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S972927AbdDXSCB (ORCPT );
        Mon, 24 Apr 2017 14:02:01 -0400
Date: Mon, 24 Apr 2017 21:01:58 +0300
From: "Kirill A. Shutemov"
To: Dan Williams
Cc: Linux MM, Catalin Marinas, "Aneesh Kumar K.V", Steve Capper,
        Thomas Gleixner, Peter Zijlstra, Linux Kernel Mailing List,
        Ingo Molnar, Andrew Morton, "Kirill A. Shutemov", "H. Peter Anvin",
        Dave Hansen, Borislav Petkov, Rik van Riel, Dann Frazier,
        Linus Torvalds, Michal Hocko, linux-tip-commits@vger.kernel.org
Subject: Re: get_zone_device_page() in get_page() and page_cache_get_speculative()
Message-ID: <20170424180158.y26m3kgzhpmawbhg@node.shutemov.name>
References: <20170423233125.nehmgtzldgi25niy@node.shutemov.name>
        <20170424173021.ayj3hslvfrrgrie7@node.shutemov.name>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: NeoMutt/20170306 (1.8.0)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Apr 24, 2017 at 10:47:43AM -0700, Dan Williams wrote:
> On Mon, Apr 24, 2017 at 10:30 AM, Kirill A. Shutemov
> >> >> [ 35.423841] WARNING: CPU: 8 PID: 245 at lib/percpu-refcount.c:155
> >> >> percpu_ref_switch_to_atomic_rcu+0x1f5/0x200
> >> >
> >> > Okay, I've tracked it down. The issue is triggered by the replacement
> >> > of get_page() with page_cache_get_speculative().
> >> >
> >> > page_cache_get_speculative() doesn't have get_zone_device_page(). :-|
> >> >
> >> > And I think it's your bug, Dan: it's wrong to have
> >> > get_/put_zone_device_page() in get_/put_page().
> >> > It must be handled by the page_ref_* machinery to catch all cases
> >> > where we manipulate the page refcount.
> >>
> >> The page_ref conversion landed in 4.6 *after* the ZONE_DEVICE
> >> implementation that landed in 4.5, so there was a missed conversion of
> >> the zone-device reference counting to page_ref.
> >
> > Fair enough.
> >
> > But get_page_unless_zero() definitely predates ZONE_DEVICE. :)
> >
>
> It does, but that's deliberate. A ZONE_DEVICE page never has a zero
> reference count, it's always owned by the device, never by the page
> allocator. ZONE_DEVICE overrides the ->lru list_head to store private
> device information and we rely on the behavior that a non-zero
> reference means the page is not added to any lru or page cache list.

So, what do you propose? Use get_page() instead of
page_cache_get_speculative() in GUP_fast() if the page belongs to a zone
device?

I don't like it. Being able to use only a subset of the helpers to
manipulate the page refcount creates a situation that is waiting to
explode.

I think it's still better to do it at the page_ref_* level.

BTW, why do we need to pin pgmap from get_page() in the first place?
I don't have enough background in ZONE_DEVICE.

-- 
 Kirill A. Shutemov
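
For illustration only, here is a minimal userspace sketch of the imbalance
discussed above. It is not kernel code: every model_* name is hypothetical,
and the "pins" counter is only an assumed stand-in for the pgmap reference.
It models a get_page()-style helper that also pins the pgmap, a
speculative-style helper that does not, and a variant where the pin lives
at the lowest refcount layer, as suggested for page_ref_*.

```c
#include <assert.h>

/* Userspace model only; these are NOT the kernel's definitions. */
struct model_pgmap { long pins; };
struct model_page {
	int refcount;
	struct model_pgmap *pgmap;	/* null pointer for ordinary pages */
};

/* get_page()-style helper: takes a page ref and pins the pgmap. */
static void model_get_page(struct model_page *p)
{
	assert(p->refcount > 0);
	p->refcount++;
	if (p->pgmap)
		p->pgmap->pins++;
}

/* put_page()-style helper: drops the page ref and unpins the pgmap. */
static void model_put_page(struct model_page *p)
{
	p->refcount--;
	if (p->pgmap)
		p->pgmap->pins--;
}

/* Speculative-style helper: takes a ref if non-zero, but never pins. */
static int model_get_speculative(struct model_page *p)
{
	if (p->refcount == 0)
		return 0;
	p->refcount++;
	return 1;
}

/* Alternative: do the pin in the lowest layer so all helpers share it. */
static void model_ref_inc_pinned(struct model_page *p)
{
	p->refcount++;
	if (p->pgmap)
		p->pgmap->pins++;
}

static void model_ref_dec_pinned(struct model_page *p)
{
	p->refcount--;
	if (p->pgmap)
		p->pgmap->pins--;
}

/* Mixed helpers: a speculative get followed by a put_page()-style put. */
long model_unbalanced_pins(void)
{
	struct model_pgmap pgmap = { 0 };
	struct model_page page = { 1, &pgmap };

	model_get_page(&page);		/* balanced pair: pins go 0 -> 1 -> 0 */
	model_put_page(&page);

	if (model_get_speculative(&page))	/* no pin taken here */
		model_put_page(&page);		/* but a pin is dropped here */
	return pgmap.pins;
}

/* Same sequence, but every ref change goes through the pinned layer. */
long model_balanced_pins(void)
{
	struct model_pgmap pgmap = { 0 };
	struct model_page page = { 1, &pgmap };

	if (page.refcount != 0) {
		model_ref_inc_pinned(&page);
		model_ref_dec_pinned(&page);
	}
	return pgmap.pins;
}
```

In this model, model_unbalanced_pins() ends with a pin count of -1, one
more unpin than pin, while model_balanced_pins() ends at 0. Whether the
real fix belongs at that layer is exactly the open question in this thread.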