From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx2.suse.de ([195.135.220.15]:50598 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751154AbeCGL17 (ORCPT ); Wed, 7 Mar 2018 06:27:59 -0500 Subject: Re: [PATCH 2/2] Btrfs: fix fiemap extent SHARED flag error with range clone. To: Qu Wenruo , robbieko Cc: linux-btrfs@vger.kernel.org, linux-btrfs-owner@vger.kernel.org From: Nikolay Borisov Message-ID: <3b241323-0e2d-7428-ac91-1da3b3221c6a@suse.com> Date: Wed, 7 Mar 2018 13:27:55 +0200 MIME-Version: 1.0 In-Reply-To: <81760b06-446f-494f-4b0b-dd5e07c76b35@gmx.com> Content-Type: text/plain; charset=utf-8 Sender: linux-btrfs-owner@vger.kernel.org List-ID: References: <1520410819-32405-3-git-send-email-robbieko@synology.com> <2ff9062e-54fa-50dd-8efa-352608746bc5@gmx.com> <2f3228e8-af87-d0aa-4f35-69e89873026b@gmx.com> <25bec3b50fe42a2f816f1391a438f624@synology.com> <81760b06-446f-494f-4b0b-dd5e07c76b35@gmx.com> On 7.03.2018 13:18, Qu Wenruo wrote: > > > On 2018年03月07日 19:01, robbieko wrote: >> Qu Wenruo 於 2018-03-07 18:42 寫到: >>> On 2018年03月07日 18:33, Qu Wenruo wrote: >>>> >>>> >>>> On 2018年03月07日 16:20, robbieko wrote: >>>>> From: Robbie Ko >>>>> >>>>> [BUG] >>>>> Range clone can cause fiemap to return error result. >>>>> Like: >>>>>  # mount /dev/vdb5 /mnt/btrfs >>>>>  # dd if=/dev/zero bs=16K count=2 oflag=dsync of=/mnt/btrfs/file >>>>>  # xfs_io -c "fiemap -v" /mnt/btrfs/file >>>>>  /mnt/btrfs/file: >>>>>  EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS >>>>>    0: [0..63]:         4241424..4241487    64   0x1 >>>>> >>>>>  # cloner -s $((16*1024)) /mnt/btrfs/file /mnt/btrfs/file_clone >>>>>  # xfs_io -c "fiemap -v" /mnt/btrfs/file >>>>>  /mnt/btrfs/file: >>>>>  EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS >>>>>    0: [0..63]:         4241424..4241487    64   0x1 >>>>>  If we clone second file extent, we will get error FLAGS, >>>>>  SHARED bit is not set. >>>>> >>>>> [REASON] >>>>> Btrfs only checks if the first extent in extent map is shared, >>>>> but extent may merge. >>>>> >>>>> [FIX] >>>>> Here we will check each extent with extent map range, >>>>> if one of them is shared, extent map is shared. >>>>> >>>>> [PATCH RESULT] >>>>>  # xfs_io -c "fiemap -v" /mnt/btrfs/file >>>>>  /mnt/btrfs/file: >>>>>  EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS >>>>>    0: [0..63]:         4241424..4241487    64 0x2001 >>>>> >>>>> Signed-off-by: Robbie Ko >>>>> --- >>>>>  fs/btrfs/extent_io.c | 146 >>>>> +++++++++++++++++++++++++++++++++++++++++++++------ >>>>>  1 file changed, 131 insertions(+), 15 deletions(-) >>>>> >>>>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c >>>>> index 066b6df..5c6dca9 100644 >>>>> --- a/fs/btrfs/extent_io.c >>>>> +++ b/fs/btrfs/extent_io.c >>>>> @@ -4394,8 +4394,8 @@ static int emit_fiemap_extent(struct >>>>> fiemap_extent_info *fieinfo, >>>>>       */ >>>>>      if (cache->offset + cache->len  == offset && >>>>>          cache->phys + cache->len == phys  && >>>>> -        (cache->flags & ~FIEMAP_EXTENT_LAST) == >>>>> -            (flags & ~FIEMAP_EXTENT_LAST)) { >>>>> +        (cache->flags & ~(FIEMAP_EXTENT_LAST|FIEMAP_EXTENT_SHARED)) == >>>>> +            (flags & ~(FIEMAP_EXTENT_LAST|FIEMAP_EXTENT_SHARED))) { >>>>>          cache->len += len; >>>>>          cache->flags |= flags; >>>>>          goto try_submit_last; >>>>> @@ -4450,6 +4450,134 @@ static int emit_last_fiemap_cache(struct >>>>> btrfs_fs_info *fs_info, >>>>>      return ret; >>>>>  } >>>>> >>>>> +/* >>>>> + * Helper to check the file range is shared. >>>>> + * >>>>> + * Fiemap extent will be combined with many extents, so we need to >>>>> examine >>>>> + * each extent, and if shared, the results are shared. >>>>> + * >>>>> + * Return: 0 if file range is not shared, 1 if it is shared, < 0 on >>>>> error. >>>>> + */ >>>>> +static int extent_map_check_shared(struct inode *inode, u64 start, >>>>> u64 end) >>>>> +{ >>>>> +    struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); >>>>> +    struct btrfs_root *root = BTRFS_I(inode)->root; >>>>> +    int ret = 0; >>>>> +    struct extent_buffer *leaf; >>>>> +    struct btrfs_path *path; >>>>> +    struct btrfs_file_extent_item *fi; >>>>> +    struct btrfs_key found_key; >>>>> +    int check_prev = 1; >>>>> +    int extent_type; >>>>> +    int shared = 0; >>>>> +    u64 cur_offset; >>>>> +    u64 extent_end; >>>>> +    u64 ino = btrfs_ino(BTRFS_I(inode)); >>>>> +    u64 disk_bytenr; >>>>> + >>>>> +    path = btrfs_alloc_path(); >>>>> +    if (!path) { >>>>> +        return -ENOMEM; >>>>> +    } >>>>> + >>>>> +    cur_offset = start; >>>>> +    while (1) { >>>>> +        ret = btrfs_lookup_file_extent(NULL, root, path, ino, >>>>> +                           cur_offset, 0); >>>>> +        if (ret < 0) >>>>> +            goto error; >>>>> +        if (ret > 0 && path->slots[0] > 0 && check_prev) { >>>>> +            leaf = path->nodes[0]; >>>>> +            btrfs_item_key_to_cpu(leaf, &found_key, >>>>> +                          path->slots[0] - 1); >>>>> +            if (found_key.objectid == ino && >>>>> +                found_key.type == BTRFS_EXTENT_DATA_KEY) >>>>> +                path->slots[0]--; >>>>> +        } >>>>> +        check_prev = 0; >>>>> +next_slot: >>>>> +        leaf = path->nodes[0]; >>>>> +        if (path->slots[0] >= btrfs_header_nritems(leaf)) { >>>>> +            ret = btrfs_next_leaf(root, path); >>>>> +            if (ret < 0) >>>>> +                goto error; >>>>> +            if (ret > 0) >>>>> +                break; >>>>> +            leaf = path->nodes[0]; >>>>> +        } >>>>> + >>>>> +        disk_bytenr = 0; >>>>> +        btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]); >>>>> + >>>>> +        if (found_key.objectid > ino) >>>>> +            break; >>>>> +        if (WARN_ON_ONCE(found_key.objectid < ino) || >>>>> +            found_key.type < BTRFS_EXTENT_DATA_KEY) { >>>>> +            path->slots[0]++; >>>>> +            goto next_slot; >>>>> +        } >>>>> +        if (found_key.type > BTRFS_EXTENT_DATA_KEY || >>>>> +            found_key.offset > end) >>>>> +            break; >>>>> + >>>>> +        fi = btrfs_item_ptr(leaf, path->slots[0], >>>>> +                    struct btrfs_file_extent_item); >>>>> +        extent_type = btrfs_file_extent_type(leaf, fi); >>>>> + >>>>> +        if (extent_type == BTRFS_FILE_EXTENT_REG || >>>>> +            extent_type == BTRFS_FILE_EXTENT_PREALLOC) { >>>>> +            disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi); >>>>> +            extent_end = found_key.offset + >>>>> +                btrfs_file_extent_num_bytes(leaf, fi); >>>>> +            if (extent_end <= start) { >>>>> +                path->slots[0]++; >>>>> +                goto next_slot; >>>>> +            } >>>>> +            if (disk_bytenr == 0) { >>>>> +                path->slots[0]++; >>>>> +                goto next_slot; >>>>> +            } >>>>> + >>>>> +            btrfs_release_path(path); >>>>> + >>>>> +            /* >>>>> +             * As btrfs supports shared space, this information >>>>> +             * can be exported to userspace tools via >>>>> +             * flag FIEMAP_EXTENT_SHARED.  If fi_extents_max == 0 >>>>> +             * then we're just getting a count and we can skip the >>>>> +             * lookup stuff. >>>>> +             */ >>>>> +            ret = btrfs_check_shared(root, >>>>> +                         ino, disk_bytenr); >>>>> +            if (ret < 0) >>>>> +                goto error; >>>>> +            if (ret) >>>>> +                shared = 1; >>>>> +            ret = 0; >>>>> +            if (shared) { >>>>> +                break; >>>>> +            } >>>>> +        } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) { >>>>> +            extent_end = found_key.offset + >>>>> +                btrfs_file_extent_inline_len(leaf, >>>>> +                             path->slots[0], fi); >>>>> +            extent_end = ALIGN(extent_end, fs_info->sectorsize); >>>>> +            path->slots[0]++; >>>>> +            goto next_slot; >>>>> +        } else { >>>>> +            BUG_ON(1); >>>>> +        } >>>>> +        cur_offset = extent_end; >>>>> +        if (cur_offset > end) >>>>> +            break; >>>>> +    } >>>>> + >>>>> +    ret = 0; >>>>> +error: >>>>> +    btrfs_free_path(path); >>>>> +    return !ret ? shared : ret; >>>>> +} >>>>> + >>>>>  int extent_fiemap(struct inode *inode, struct fiemap_extent_info >>>>> *fieinfo, >>>>>          __u64 start, __u64 len, get_extent_t *get_extent) >>>>>  { >>>>> @@ -4587,19 +4715,7 @@ int extent_fiemap(struct inode *inode, struct >>>>> fiemap_extent_info *fieinfo, >>>>>              flags |= (FIEMAP_EXTENT_DELALLOC | >>>>>                    FIEMAP_EXTENT_UNKNOWN); >>>>>          } else if (fieinfo->fi_extents_max) { >>>>> -            u64 bytenr = em->block_start - >>>>> -                (em->start - em->orig_start); >>>>> - >>>>> -            /* >>>>> -             * As btrfs supports shared space, this information >>>>> -             * can be exported to userspace tools via >>>>> -             * flag FIEMAP_EXTENT_SHARED.  If fi_extents_max == 0 >>>>> -             * then we're just getting a count and we can skip the >>>>> -             * lookup stuff. >>>>> -             */ >>>>> -            ret = btrfs_check_shared(root, >>>>> -                         btrfs_ino(BTRFS_I(inode)), >>>>> -                         bytenr); >>>> >>>> Since we're going to use the whole extent to determine the SHARED flags, >>>> what about just passing the extent bytenr into btrfs_check_shared? >>>> >>>> In that case I think we could get correct shared flag without using >>>> another helper function. >>>> (IIRC it's em->block_start) >>> >>> Well, it's not em->block_start, but a little more complex. >>> (For compressed one it's em->block_start, but for plaintext one, it's >>> more complex) >>> >>> em->block_start = file_extent_disk_bytenr + >>> file_extent_file_extent_offset >>> >>> We need extra calculation to determine the real extent bytenr >>> (file_extent_disk_bytenr). >>> >>> IIRC the correct calculation would be: >>> file_extent_disk_bytenr = em->block_start - em->start + em->orig_start. >>> >>> (Who thought out such anti-human calculation for extent map?!) >>> >>> Thanks, >>> Qu >> >> dd if=/dev/zero bs=16K count=4 oflag=dsync of=file >> >> btrfs-debugfs -f file >> (276 0): ram 16384 disk 2171609088 disk_size 16384 >> (276 16384): ram 16384 disk 2171625472 disk_size 16384 >> (276 32768): ram 16384 disk 2171641856 disk_size 16384 >> (276 49152): ram 16384 disk 2171658240 disk_size 16384 >> file: file extents 4 disk size 65536 logical size 65536 ratio 1.00 >> >> xfs_io -c "fiemap -v" file >> file: >>  EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS >>    0: [0..127]:        4241424..4241551   128   0x1 >> >> My point is that we have only one extent_map, >> but in fact it can correspond to many extents >> and each one has the potential to be cloned >> so we need to examine each one individually. > > Right, I missed this point. > > SHARED flag is determined after extent map merge, so here we can't rely > on em here. Shouldn't extent maps correspond to 1:1 disk-state. I.e. they are just the memory cache of the extent state. So if we merge them, shouldn't we also merge the on-disk extents as well ? > > So the current helper function looks valid, unless we have method to > access the original extent map. > > Thanks, > Qu > >> >> >>>> >>>> Thanks, >>>> Qu >>>> >>>>> +            ret = extent_map_check_shared(inode, em->start, >>>>> extent_map_end(em) - 1); >>>>>              if (ret < 0) >>>>>                  goto out_free; >>>>>              if (ret) >>>>> -- >>>>> 1.9.1 >>>>> >>>>> -- >>>>> To unsubscribe from this list: send the line "unsubscribe >>>>> linux-btrfs" in >>>>> the body of a message to majordomo@vger.kernel.org >>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html >>>>> >>>> >> >