LKML Archive on lore.kernel.org
 help / color / Atom feed
From: David Sterba <dsterba@suse.com>
To: torvalds@linux-foundation.org
Cc: David Sterba <dsterba@suse.com>,
	linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [GIT PULL] Btrfs updates for 5.12
Date: Tue, 16 Feb 2021 13:56:46 +0100
Message-ID: <cover.1613417746.git.dsterba@suse.com> (raw)

Hi,

this update brings updates of space handling, performance improvements
or bug fixes. The subpage block size and zoned mode features have
reached state where they're usable but with limitations.

The branch merges cleanly on top of current master, there are some minor
conflicts reported by linux-next in iov_iter or kmap conversion.

Please pull, thanks.

- Performance or related

  - do not block on deleted block group mutex in the cleaner, avoids
    some long stalls

  - improved flushing: make it work better with ticket space
    reservations and avoid excessive transaction commits in some
    scenarios, slightly improves throughput for random write load

  - preemptive background flushing: separate the logic from ticket
    reservations, improve the accounting and decisions when to flush in
    low space conditions

  - less lock contention related to running delayed refs, let just one
    thread do the flushing when there are many inside transaction commit

  - dbench workload improvements: avoid unnecessary work when logging
    inodes, fewer fallbacks to transaction commit and thus less waiting
    for it (+7% throughput, -20% latency)

- Core

  - subpage block size
    - currently read-only support
    - refactor and generalize code where sectorsize is assumed to be
      page size, add the subpage handling everywhere
    - the read-write support is on the way, page sizes are still limited
      to 4K or 64K

  - zoned mode, first working version but with limitations
    - SMR/ZBC/ZNS friendly allocation mode, utilizing the "no fixed
      location for structures" and chunked allocation
    - superblock as the only fixed data structure needs special
      handling, uses 2 consecutive zones as a ring buffer
    - tree-log support with a dedicated block group to avoid unordered
      writes
    - emulated zones on non-zoned devices
    - not yet working
      - all non-single block group profiles, requires more
	zone write pointer synchronization between the multiple block
	groups
      - fitrim due to dependency on space cache, can be implemented

- Fixes

  - ref-verify: proper tree owner and node level tracking
  - fix pinned byte accounting, causing some early ENOSPC now more
    likely due to other changes in delayed refs

- Other

  - error handling fixes and improvements
  - more error injection points
  - more function documentation
  - more and updated tracepoints
  - subset of W=1 checked by default
  - update comments to allow more automatic kdoc parameter checks

----------------------------------------------------------------
The following changes since commit e0756cfc7d7cd08c98a53b6009c091a3f6a50be6:

  Merge tag 'trace-v5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace (2021-02-08 11:32:39 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-5.12-tag

for you to fetch changes up to 9d294a685fbcb256ce8c5f7fd88a7596d0f52a8a:

  btrfs: zoned: enable to mount ZONED incompat flag (2021-02-09 02:52:24 +0100)

----------------------------------------------------------------
Abaci Team (1):
      btrfs: simplify condition in __btrfs_run_delayed_items

Filipe Manana (10):
      btrfs: send: remove stale code when checking for shared extents
      btrfs: remove wrong comment for can_nocow_extent()
      btrfs: remove unnecessary directory inode item update when deleting dir entry
      btrfs: stop setting nbytes when filling inode item for logging
      btrfs: avoid logging new ancestor inodes when logging new inode
      btrfs: skip logging directories already logged when logging all parents
      btrfs: skip logging inodes already logged when logging new entries
      btrfs: remove unnecessary check_parent_dirs_for_sync()
      btrfs: make concurrent fsyncs wait less when waiting for a transaction commit
      btrfs: fix extent buffer leak on failure to copy root

Johannes Thumshirn (7):
      block: add bio_add_zone_append_page
      btrfs: release path before calling to btrfs_load_block_group_zone_info
      btrfs: zoned: do not load fs_info::zoned from incompat flag
      btrfs: zoned: allow zoned filesystems on non-zoned block devices
      btrfs: zoned: check if bio spans across an ordered extent
      btrfs: zoned: cache if block group is on a sequential zone
      btrfs: save irq flags when looking up an ordered extent

Josef Bacik (34):
      btrfs: fix error handling in commit_fs_roots
      btrfs: allow error injection for btrfs_search_slot and btrfs_cow_block
      btrfs: noinline btrfs_should_cancel_balance
      btrfs: ref-verify: pass down tree block level when building refs
      btrfs: ref-verify: make sure owner is set for all refs
      btrfs: keep track of the root owner for relocation reads
      btrfs: do not cleanup upper nodes in btrfs_backref_cleanup_node
      btrfs: handle space_info::total_bytes_pinned inside the delayed ref itself
      btrfs: account for new extents being deleted in total_bytes_pinned
      btrfs: fix reloc root leak with 0 ref reloc roots on recovery
      btrfs: splice remaining dirty_bg's onto the transaction dirty bg list
      btrfs: do not warn if we can't find the reloc root when looking up backref
      btrfs: add asserts for deleting backref cache nodes
      btrfs: abort the transaction if we fail to inc ref in btrfs_copy_root
      btrfs: do not block on deleted bgs mutex in the cleaner
      btrfs: only let one thread pre-flush delayed refs in commit
      btrfs: delayed refs pre-flushing should only run the heads we have
      btrfs: only run delayed refs once before committing
      btrfs: move delayed ref flushing for qgroup into qgroup helper
      btrfs: remove bogus BUG_ON in alloc_reserved_tree_block
      btrfs: stop running all delayed refs during snapshot
      btrfs: run delayed refs less often in commit_cowonly_roots
      btrfs: make flush_space take a enum btrfs_flush_state instead of int
      btrfs: add a trace point for reserve tickets
      btrfs: track ordered bytes instead of just dio ordered bytes
      btrfs: introduce a FORCE_COMMIT_TRANS flush operation
      btrfs: improve preemptive background space flushing
      btrfs: rename need_do_async_reclaim
      btrfs: check reclaim_size in need_preemptive_reclaim
      btrfs: rework btrfs_calc_reclaim_metadata_size
      btrfs: simplify the logic in need_preemptive_flushing
      btrfs: implement space clamping for preemptive flushing
      btrfs: adjust the flush trace point to include the source
      btrfs: add a trace class for dumping the current ENOSPC state

Michal Rostecki (1):
      btrfs: let callers of btrfs_get_io_geometry pass the em

Naohiro Aota (36):
      iomap: support REQ_OP_ZONE_APPEND
      btrfs: zoned: defer loading zone info after opening trees
      btrfs: zoned: use regular super block location on zone emulation
      btrfs: zoned: disallow fitrim on zoned filesystems
      btrfs: zoned: implement zoned chunk allocator
      btrfs: zoned: verify device extent is aligned to zone
      btrfs: zoned: load zone's allocation offset
      btrfs: zoned: calculate allocation offset for conventional zones
      btrfs: zoned: track unusable bytes for zones
      btrfs: zoned: implement sequential extent allocation
      btrfs: zoned: redirty released extent buffers
      btrfs: zoned: advance allocation pointer after tree log node
      btrfs: zoned: reset zones of unused block groups
      btrfs: factor out helper adding a page to bio
      btrfs: zoned: use bio_add_zone_append_page
      btrfs: zoned: handle REQ_OP_ZONE_APPEND as writing
      btrfs: zoned: split ordered extent when bio is sent
      btrfs: extend btrfs_rmap_block for specifying a device
      btrfs: zoned: use ZONE_APPEND write for zoned mode
      btrfs: zoned: enable zone append writing for direct IO
      btrfs: zoned: introduce dedicated data write path for zoned filesystems
      btrfs: zoned: serialize metadata IO
      btrfs: zoned: wait for existing extents before truncating
      btrfs: zoned: do not use async metadata checksum on zoned filesystems
      btrfs: zoned: mark block groups to copy for device-replace
      btrfs: zoned: implement cloning for zoned device-replace
      btrfs: zoned: implement copying for zoned device-replace
      btrfs: zoned: support dev-replace in zoned filesystems
      btrfs: zoned: enable relocation on a zoned filesystem
      btrfs: zoned: relocate block group to repair IO failure in zoned filesystems
      btrfs: split alloc_log_tree()
      btrfs: zoned: extend zoned allocator to use dedicated tree-log block group
      btrfs: zoned: serialize log transaction on zoned filesystems
      btrfs: zoned: reorder log node allocation on zoned filesystem
      btrfs: zoned: deal with holes writing out tree-log pages
      btrfs: zoned: enable to mount ZONED incompat flag

Nigel Christian (1):
      btrfs: remove repeated word in struct member comment

Nikolay Borisov (24):
      btrfs: cleanup local variables in btrfs_file_write_iter
      btrfs: rename btrfs_find_highest_objectid to btrfs_init_root_free_objectid
      btrfs: rename btrfs_find_free_objectid to btrfs_get_free_objectid
      btrfs: rename btrfs_root::highest_objectid to free_objectid
      btrfs: make btrfs_root::free_objectid hold the next available objectid
      btrfs: remove new_dirid argument from btrfs_create_subvol_root
      btrfs: consolidate btrfs_previous_item ret val handling in btrfs_shrink_device
      btrfs: make btrfs_start_delalloc_root's nr argument a long
      btrfs: remove always true condition in btrfs_start_delalloc_roots
      btrfs: document modified parameter of add_extent_mapping
      btrfs: fix parameter description of btrfs_add_extent_mapping
      btrfs: fix function description formats in file-item.c
      btrfs: fix parameter description in delayed-ref.c functions
      btrfs: improve parameter description for __btrfs_write_out_cache
      btrfs: document now parameter of peek_discard_list
      btrfs: document fs_info in btrfs_rmap_block
      btrfs: fix description format of fs_info of btrfs_wait_on_delayed_iputs
      btrfs: document btrfs_check_shared parameters
      btrfs: fix parameter description of btrfs_inode_rsv_release/btrfs_delalloc_release_space
      btrfs: fix parameter description in space-info.c
      btrfs: fix parameter description for functions in extent_io.c
      btrfs: zoned: remove unused variable in btrfs_sb_log_location_bdev
      lib/zstd: convert constants to defines
      btrfs: enable W=1 checks for btrfs

Qu Wenruo (27):
      btrfs: make btrfs_dio_private::bytes u32
      btrfs: refactor btrfs_dec_test_* functions for ordered extents
      btrfs: rename parameter offset to disk_bytenr in submit_extent_page
      btrfs: refactor __extent_writepage_io() to improve readability
      btrfs: update comment for btrfs_dirty_pages
      btrfs: introduce helper to grab an existing extent buffer from a page
      btrfs: rework the order of btrfs_ordered_extent::flags
      btrfs: fix double accounting of ordered extent for subpage case in btrfs_invalidapge
      btrfs: merge PAGE_CLEAR_DIRTY and PAGE_SET_WRITEBACK to PAGE_START_WRITEBACK
      btrfs: set UNMAPPED bit early in btrfs_clone_extent_buffer() for subpage support
      btrfs: introduce the skeleton of btrfs_subpage structure
      btrfs: make attach_extent_buffer_page() handle subpage case
      btrfs: make grab_extent_buffer_from_page() handle subpage case
      btrfs: support subpage for extent buffer page release
      btrfs: attach private to dummy extent buffer pages
      btrfs: introduce helpers for subpage uptodate status
      btrfs: introduce helpers for subpage error status
      btrfs: support subpage in set/clear_extent_buffer_uptodate()
      btrfs: support subpage in btrfs_clone_extent_buffer
      btrfs: support subpage in try_release_extent_buffer()
      btrfs: introduce read_extent_buffer_subpage()
      btrfs: support subpage in endio_readpage_update_page_status()
      btrfs: introduce subpage metadata validation check
      btrfs: introduce btrfs_subpage for data inodes
      btrfs: integrate page status update for data read path into begin/end_page_read
      btrfs: allow read-only mount of 4K sector size fs on 64K page system
      btrfs: explain page locking and readahead in read_extent_buffer_pages()

Roman Anasal (1):
      btrfs: send: use struct send_ctx *sctx for btrfs_compare_trees and changed_cb

Yang Li (1):
      btrfs: remove redundant NULL check before kvfree

Zhihao Cheng (1):
      btrfs: clarify error returns values in __load_free_space_cache

 block/bio.c                       |  33 ++
 fs/btrfs/Makefile                 |  19 +-
 fs/btrfs/backref.c                |  17 +-
 fs/btrfs/backref.h                |   9 +-
 fs/btrfs/block-group.c            | 178 +++++---
 fs/btrfs/block-group.h            |  21 +-
 fs/btrfs/btrfs_inode.h            |   3 +-
 fs/btrfs/compression.c            |  10 +-
 fs/btrfs/ctree.c                  |   9 +-
 fs/btrfs/ctree.h                  |  19 +-
 fs/btrfs/delalloc-space.c         |  29 +-
 fs/btrfs/delayed-inode.c          |   2 +-
 fs/btrfs/delayed-ref.c            |  79 ++--
 fs/btrfs/delayed-ref.h            |  28 +-
 fs/btrfs/dev-replace.c            | 186 +++++++-
 fs/btrfs/dev-replace.h            |   3 +
 fs/btrfs/discard.c                |   6 +-
 fs/btrfs/disk-io.c                | 183 ++++++--
 fs/btrfs/disk-io.h                |   6 +-
 fs/btrfs/extent-tree.c            | 361 ++++++++++------
 fs/btrfs/extent_io.c              | 791 +++++++++++++++++++++++++++-------
 fs/btrfs/extent_io.h              |  17 +-
 fs/btrfs/extent_map.c             |  18 +-
 fs/btrfs/file-item.c              |  22 +-
 fs/btrfs/file.c                   |  58 ++-
 fs/btrfs/free-space-cache.c       | 123 +++++-
 fs/btrfs/free-space-cache.h       |   2 +
 fs/btrfs/inode.c                  | 336 ++++++++++++---
 fs/btrfs/ioctl.c                  |  29 +-
 fs/btrfs/ordered-data.c           | 224 +++++++---
 fs/btrfs/ordered-data.h           |  57 ++-
 fs/btrfs/raid56.c                 |   3 +-
 fs/btrfs/ref-verify.c             |  43 +-
 fs/btrfs/reflink.c                |   5 +-
 fs/btrfs/relocation.c             |  99 ++++-
 fs/btrfs/scrub.c                  | 143 +++++++
 fs/btrfs/send.c                   |  31 +-
 fs/btrfs/space-info.c             | 365 ++++++++++++----
 fs/btrfs/space-info.h             |  25 +-
 fs/btrfs/subpage.c                | 278 ++++++++++++
 fs/btrfs/subpage.h                |  91 ++++
 fs/btrfs/super.c                  |   8 +-
 fs/btrfs/sysfs.c                  |   2 +
 fs/btrfs/tests/extent-map-tests.c |   2 +-
 fs/btrfs/transaction.c            | 152 ++++---
 fs/btrfs/transaction.h            |   5 +
 fs/btrfs/tree-log.c               | 288 ++++++-------
 fs/btrfs/volumes.c                | 364 +++++++++++++---
 fs/btrfs/volumes.h                |   8 +-
 fs/btrfs/zoned.c                  | 873 +++++++++++++++++++++++++++++++++++++-
 fs/btrfs/zoned.h                  | 157 ++++++-
 fs/iomap/direct-io.c              |  43 +-
 include/linux/bio.h               |   2 +
 include/linux/iomap.h             |   1 +
 include/linux/zstd.h              |   8 +-
 include/trace/events/btrfs.h      | 111 ++++-
 56 files changed, 4874 insertions(+), 1111 deletions(-)
 create mode 100644 fs/btrfs/subpage.c
 create mode 100644 fs/btrfs/subpage.h

             reply index

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-16 12:56 David Sterba [this message]
2021-02-21 18:39 ` pr-tracker-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1613417746.git.dsterba@suse.com \
    --to=dsterba@suse.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git
	git clone --mirror https://lore.kernel.org/lkml/10 lkml/git/10.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git