From: zhong jiang <zhongjiang-ali@linux.alibaba.com>
To: Ruan Shiyang, Jan Kara
Cc: linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org,
 linux-nvdimm@lists.01.org, linux-mm@kvack.org,
 linux-fsdevel@vger.kernel.org, linux-raid@vger.kernel.org,
 darrick.wong@oracle.com, dan.j.williams@intel.com, david@fromorbit.com,
 hch@lst.de, song@kernel.org, rgoldwyn@suse.de, qi.fuli@fujitsu.com,
 y-goto@fujitsu.com
Subject: Re: [PATCH 04/10] mm, fsdax: Refactor memory-failure handler for dax mapping
Date: Wed, 13 Jan 2021 18:04:46 +0800
Message-ID: <781f276b-afdd-091c-3dba-048e415431ab@linux.alibaba.com>
In-Reply-To: <75164044-bfdf-b2d6-dff0-d6a8d56d1f62@cn.fujitsu.com>
References: <20201230165601.845024-1-ruansy.fnst@cn.fujitsu.com>
 <20201230165601.845024-5-ruansy.fnst@cn.fujitsu.com>
 <20210106154132.GC29271@quack2.suse.cz>
 <75164044-bfdf-b2d6-dff0-d6a8d56d1f62@cn.fujitsu.com>
List-ID: <linux-raid.vger.kernel.org>

On 2021/1/12 10:55 AM, Ruan Shiyang wrote:
>
>
> On 2021/1/6 11:41 PM, Jan Kara wrote:
>> On Thu 31-12-20 00:55:55, Shiyang Ruan wrote:
>>> The current memory_failure_dev_pagemap() can only handle a
>>> single-mapped dax page in fsdax mode.  A dax page could be mapped by
>>> multiple files and offsets if we let the reflink feature and fsdax
>>> mode work together.  So, refactor the current implementation to
>>> support handling memory failure on each file and offset.
>>>
>>> Signed-off-by: Shiyang Ruan
>>
>> Overall this looks OK to me, a few comments below.
>>
>>> ---
>>>  fs/dax.c            | 21 +++++++++++
>>>  include/linux/dax.h |  1 +
>>>  include/linux/mm.h  |  9 +++++
>>>  mm/memory-failure.c | 91 ++++++++++++++++++++++++++++++++++-----------
>>>  4 files changed, 100 insertions(+), 22 deletions(-)
>
> ...
>>> @@ -345,9 +348,12 @@ static void add_to_kill(struct task_struct *tsk, struct page *p,
>>>      }
>>>
>>>      tk->addr = page_address_in_vma(p, vma);
>>> -    if (is_zone_device_page(p))
>>> -        tk->size_shift = dev_pagemap_mapping_shift(p, vma);
>>> -    else
>>> +    if (is_zone_device_page(p)) {
>>> +        if (is_device_fsdax_page(p))
>>> +            tk->addr = vma->vm_start +
>>> +                    ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
>>
>> It seems strange to use 'pgoff' for dax pages and not for any other
>> page.  Why?  I'd rather pass the correct pgoff from all callers of
>> add_to_kill() and avoid this special-casing...
>
> Because one fsdax page can be shared by multiple pgoffs.  I have to
> pass each pgoff in each iteration to calculate the address in the vma
> (for tk->addr).  Other kinds of pages don't need this: they can get
> their unique address by calling page_address_in_vma().

IMO, an fsdax page can be shared by multiple files rather than by
multiple pgoffs, if the filesystem supports reflink.  A page is located
in only one mapping (page->mapping is exclusive), hence it has only one
pgoff or index pointing at the node.  Or am I missing something about
the feature?  Thanks.

> So, I added this fsdax case here.  This patchset only implements the
> fsdax case; other cases also need to be added here if they are to be
> implemented.
>
>
> --
> Thanks,
> Ruan Shiyang.
>
>>
>>> +        tk->size_shift = dev_pagemap_mapping_shift(p, vma, tk->addr);
>>> +    } else
>>>          tk->size_shift = page_shift(compound_head(p));
>>>
>>>      /*
>>> @@ -495,7 +501,7 @@ static void collect_procs_anon(struct page *page, struct list_head *to_kill,
>>>              if (!page_mapped_in_vma(page, vma))
>>>                  continue;
>>>              if (vma->vm_mm == t->mm)
>>> -                add_to_kill(t, page, vma, to_kill);
>>> +                add_to_kill(t, page, NULL, 0, vma, to_kill);
>>>          }
>>>      }
>>>      read_unlock(&tasklist_lock);
>>> @@ -505,24 +511,19 @@ static void collect_procs_anon(struct page *page, struct list_head *to_kill,
>>>
>>>  /*
>>>   * Collect processes when the error hit a file mapped page.
>>>   */
>>> -static void collect_procs_file(struct page *page, struct list_head *to_kill,
>>> -                int force_early)
>>> +static void collect_procs_file(struct page *page, struct address_space *mapping,
>>> +        pgoff_t pgoff, struct list_head *to_kill, int force_early)
>>>  {
>>>      struct vm_area_struct *vma;
>>>      struct task_struct *tsk;
>>> -    struct address_space *mapping = page->mapping;
>>> -    pgoff_t pgoff;
>>>
>>>      i_mmap_lock_read(mapping);
>>>      read_lock(&tasklist_lock);
>>> -    pgoff = page_to_pgoff(page);
>>>      for_each_process(tsk) {
>>>          struct task_struct *t = task_early_kill(tsk, force_early);
>>> -
>>>          if (!t)
>>>              continue;
>>> -        vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff,
>>> -                      pgoff) {
>>> +        vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
>>>              /*
>>>               * Send early kill signal to tasks where a vma covers
>>>               * the page but the corrupted page is not necessarily
>>> @@ -531,7 +532,7 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill,
>>>               * to be informed of all such data corruptions.
>>>               */
>>>              if (vma->vm_mm == t->mm)
>>> -                add_to_kill(t, page, vma, to_kill);
>>> +                add_to_kill(t, page, mapping, pgoff, vma, to_kill);
>>>          }
>>>      }
>>>      read_unlock(&tasklist_lock);
>>> @@ -550,7 +551,8 @@ static void collect_procs(struct page *page, struct list_head *tokill,
>>>      if (PageAnon(page))
>>>          collect_procs_anon(page, tokill, force_early);
>>>      else
>>> -        collect_procs_file(page, tokill, force_early);
>>> +        collect_procs_file(page, page->mapping, page_to_pgoff(page),
>>
>> Why not use the page_mapping() helper here?  It would be safer for
>> THPs if they ever get here...
>>
>>                                Honza
>>
>
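For readers following along, here is a minimal user-space sketch of the
address computation debated above (plain C with hypothetical stand-in
names; not the kernel implementation).  The point of the patch's special
case is that once a reflinked fsdax page is shared by several files,
page->mapping and page->index no longer identify every user mapping, so
add_to_kill() derives the address in each vma from the (mapping, pgoff)
pair it is handed rather than from page_address_in_vma():

    #include <stdio.h>

    #define PAGE_SHIFT 12  /* assume 4 KiB pages for the example */

    /* Hypothetical stand-in for the two vma fields the computation uses. */
    struct vma_stub {
        unsigned long vm_start;  /* first user address of the mapping */
        unsigned long vm_pgoff;  /* file offset, in pages, of vm_start */
    };

    /*
     * The fsdax case added to add_to_kill(): translate a file page
     * offset into a user address inside one particular vma.  Each
     * (mapping, pgoff) pair sharing the poisoned page gets its own
     * answer.
     */
    static unsigned long addr_in_vma(const struct vma_stub *vma,
                                     unsigned long pgoff)
    {
        return vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
    }

    int main(void)
    {
        /* A mapping of file pages [16, 32) starting at 0x7f0000000000. */
        struct vma_stub vma = { 0x7f0000000000UL, 16 };

        /* The poisoned storage is reached through file page 20. */
        printf("%#lx\n", addr_in_vma(&vma, 20));  /* 0x7f0000004000 */
        return 0;
    }

This is also why collect_procs_file() now takes mapping and pgoff
explicitly: as Ruan explains above, the caller can then repeat the walk
once per file and offset that shares the failed page.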