From: "Huang, Ying"
To: Alistair Popple
Cc: Yang Shi, Andrew Morton, Zi Yan, Baolin Wang, Oscar Salvador, Matthew Wilcox
Subject: Re: [RFC 2/6] mm/migrate_pages: split unmap_and_move() to _unmap() and _move()
References: <20220921060616.73086-1-ying.huang@intel.com> <20220921060616.73086-3-ying.huang@intel.com> <87o7v2lbn4.fsf@nvdebian.thelocal> <87fsgdllmb.fsf@nvdebian.thelocal>
Date: Tue, 27 Sep 2022 09:51:21 +0800
In-Reply-To: <87fsgdllmb.fsf@nvdebian.thelocal> (Alistair Popple's message of "Tue, 27 Sep 2022 10:02:33 +1000")
Message-ID: <87ill937qe.fsf@yhuang6-desk2.ccr.corp.intel.com>

Alistair Popple writes:

> Yang Shi writes:
>
>> On Mon, Sep 26, 2022 at 2:37 AM Alistair Popple wrote:
>>>
>>>
>>> Huang Ying writes:
>>>
>>> > This is a preparation patch to batch the page unmapping and moving
>>> > for the normal pages and THP.
>>> >
>>> > In this patch, unmap_and_move() is split to migrate_page_unmap() and
>>> > migrate_page_move(). So, we can batch _unmap() and _move() in
>>> > different loops later. To pass some information between unmap and
>>> > move, the original unused newpage->mapping and newpage->private are
>>> > used.
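Just to make the intended structure concrete, here is a minimal sketch of the
two-pass flow the description above implies.  The migrate_page_unmap() /
migrate_page_move() prototypes, the loop structure, and the (omitted) error
handling are assumptions for illustration only, not the actual patch code:

#include <linux/migrate.h>	/* new_page_t, free_page_t, enum migrate_mode */
#include <linux/mm.h>

/* Sketch only: batched structure implied by the patch description. */
static int migrate_pages_batched_sketch(struct list_head *from,
					new_page_t get_new_page,
					free_page_t put_new_page,
					unsigned long private,
					enum migrate_mode mode)
{
	struct page *page, *page2;
	int rc = 0;

	/*
	 * Pass 1: unmap every page on the list.  Per the description,
	 * intermediate state is parked in the otherwise unused
	 * newpage->mapping and newpage->private so pass 2 can pick it up.
	 */
	list_for_each_entry_safe(page, page2, from, lru)
		rc = migrate_page_unmap(get_new_page, put_new_page, private,
					page, mode);

	/*
	 * Pass 2: copy contents and restore mappings, consuming the state
	 * saved in pass 1.  Retry and error handling omitted.
	 */
	list_for_each_entry_safe(page, page2, from, lru)
		rc = migrate_page_move(put_new_page, private, page, mode);

	return rc;
}

Splitting the two phases like this is what makes it possible to batch the
_unmap() step across the whole list in a later patch, as the description says.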
>>>
>>> This looks like it could cause a deadlock between two threads migrating
>>> the same pages if force == true && mode != MIGRATE_ASYNC as
>>> migrate_page_unmap() will call lock_page() while holding the lock on
>>> other pages in the list. Therefore the two threads could deadlock if the
>>> pages are in a different order.
>>
>> It seems unlikely to me since the page has to be isolated from lru
>> before migration. The isolating from lru is atomic, so the two threads
>> unlikely see the same pages on both lists.
>
> Oh thanks! That is a good point and I agree since lru isolation is
> atomic the two threads won't see the same pages. migrate_vma_setup()
> does LRU isolation after locking the page which is why the potential
> exists there. We could potentially switch that around but given
> ZONE_DEVICE pages aren't on an lru it wouldn't help much.
>
>> But there might be other cases which may incur deadlock, for example,
>> filesystem writeback IIUC. Some filesystems may lock a bunch of pages
>> then write them back in a batch. The same pages may be on the
>> migration list and they are also dirty and seen by writeback. I'm not
>> sure whether I miss something that could prevent such a deadlock from
>> happening.
>
> I'm not overly familiar with that area but I would assume any filesystem
> code doing this would already have to deal with deadlock potential.

Thank you very much for pointing this out.  I think the deadlock is a real
issue.  In any case, we shouldn't forbid other places in the kernel from
locking two pages at the same time.

The simplest solution is to batch page migration only if mode ==
MIGRATE_ASYNC.  Then we may consider falling back to non-batch mode if
mode != MIGRATE_ASYNC and trylocking the page fails.

Best Regards,
Huang, Ying

[snip]
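P.S. A minimal sketch of the fallback idea above.  The helper name, its
arguments, and where it would be called from are assumptions for
illustration, not code from the patch:

#include <linux/mm.h>
#include <linux/pagemap.h>	/* lock_page(), trylock_page() */
#include <linux/migrate_mode.h>

/*
 * Sketch only: take the page lock for migration under the "batch only in
 * MIGRATE_ASYNC mode" scheme.  When batching, locks on earlier pages in
 * the list may already be held, so only trylock; if another thread (or
 * writeback) locks the same pages in a different order, we simply give up
 * on this page instead of deadlocking.  In the non-batched (sync) fallback
 * no other page lock is held, so blocking in lock_page() is safe.
 */
static bool migrate_batch_lock_page(struct page *page,
				    enum migrate_mode mode, bool batching)
{
	if (batching) {
		/* Batching is only done for MIGRATE_ASYNC in this scheme. */
		VM_WARN_ON_ONCE(mode != MIGRATE_ASYNC);
		return trylock_page(page);	/* never sleeps */
	}

	/* Single-page path: may sleep, no lock ordering issue. */
	lock_page(page);
	return true;
}

The point is that the batched path never sleeps on a page lock while other
pages in the batch are locked, so lock ordering against another migration
thread or against filesystem writeback stops mattering.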