linux-fsdevel.vger.kernel.org archive mirror
* [RFC PATCH 00/76] SSDFS: flash-friendly LFS file system for ZNS SSD
@ 2023-02-25  1:08 Viacheslav Dubeyko
  2023-02-25  1:08 ` [RFC PATCH 01/76] ssdfs: introduce SSDFS on-disk layout Viacheslav Dubeyko
                   ` (76 more replies)
  0 siblings, 77 replies; 82+ messages in thread
From: Viacheslav Dubeyko @ 2023-02-25  1:08 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: viacheslav.dubeyko, luka.perkov, bruno.banelli, Viacheslav Dubeyko

Hello,

I am fully aware that this patchset is big, and I am open to any
advice on how to split it into reasonable portions for review. Even
now, I have excluded the code of several subsystems to make the
patchset slightly smaller. Potentially, I could introduce SSDFS in
smaller portions with limited functionality. However, that could be
confusing and make it hard to understand how the declared goals are
achieved by the implemented functionality. SSDFS is still not
completely stable, but I believe it is time to hear the community's
opinion.

[PROBLEM DECLARATION]

SSD is a sophisticated device capable of managing in-place updates.
However, in-place updates generate significant FTL GC
responsibilities that increase the write amplification factor,
require substantial NAND flash overprovisioning, decrease SSD
lifetime, and introduce performance spikes. The Log-structured File
System (LFS) approach can offer a more flash-friendly Copy-On-Write
(COW) model. However, F2FS and NILFS2 issue in-place updates anyway,
even though they use the COW policy for the main volume area. Also,
GC is an inevitable subsystem of any LFS file system, and it
introduces write amplification, retention issues, excessive copy
operations, and performance degradation on aged volumes. Generally
speaking, the available file system technologies have side effects:
(1) write amplification, (2) significant FTL GC responsibilities,
(3) inevitable FS GC overhead, (4) read disturbance, (5) retention
issues. As a result, SSD lifetime reduction, performance
degradation, early SSD failure, and increased TCO are the reality of
data infrastructure.

[WHY YET ANOTHER FS?]

ZNS SSD is a good vehicle that can help to manage a subset of the
known issues by introducing a strict append-only mode of operation.
However, F2FS, for example, has an in-place update metadata area
that can be placed into a conventional zone and still introduces FTL
GC responsibilities even in the ZNS SSD case. Also, the limited
number of open/active zones (for example, 14) creates really
complicated requirements that not every file system architecture can
satisfy. This means the architectures of multiple file systems have
peculiarities that compromise the ZNS SSD model. Moreover, FS GC
overhead is still a critical problem for LFS file systems (F2FS and
NILFS2, for example), even in the ZNS SSD case.

Generally speaking, it would be good to see an LFS file system
architecture that is capable of:
(1) eliminating FS GC overhead,
(2) decreasing/eliminating FTL GC responsibilities,
(3) decreasing the write amplification factor,
(4) introducing native architectural support of ZNS SSD + SMR HDD,
(5) increasing the compression ratio by using delta-encoding and
    deduplication,
(6) introducing smart management of "cold" data and an efficient
    TRIM policy,
(7) employing the parallelism of multiple NAND dies/channels,
(8) prolonging SSD lifetime and decreasing TCO,
(9) guaranteeing strong reliability and the capability to
    reconstruct a heavily corrupted file system volume,
(10) guaranteeing stable performance.

[SSDFS DESIGN GOALS]

SSDFS is an open-source, kernel-space LFS file system designed to:
(1) eliminate GC overhead, (2) prolong SSD lifetime, (3) natively
support a strict append-only mode (ZNS SSD + SMR HDD compatible),
(4) guarantee strong reliability, (5) guarantee stable performance.

[SSDFS ARCHITECTURE]

One of the key goals of SSDFS is to decrease the write amplification
factor. The logical extent concept is the fundamental technique to
achieve this goal. A logical extent describes any volume extent on
the basis of {segment ID, logical block ID, length}. A segment is a
portion of the file system volume that is aligned on the erase block
size and always located at the same offset. It is the basic unit for
allocating and managing the free space of the file system volume.
Every segment can include one or several Logical Erase Blocks
(LEBs). A LEB can be mapped into a "Physical" Erase Block (PEB).
Generally speaking, a PEB is a fixed-size container that includes a
number of logical blocks (physical sectors or NAND flash pages).
SSDFS is a pure Log-structured File System (LFS): any write
operation into an erase block creates a log, and the content of
every erase block is a sequence of logs. A PEB has a block bitmap
that tracks the state (free, pre-allocated, allocated, invalid) of
logical blocks and accounts for the physical space used for storing
the log's metadata (segment header, partial log header, footer).
Also, a log contains an offset translation table that converts a
logical block ID into a particular offset inside the log's payload.
The log concept implements support for compression, delta-encoding,
and a compaction scheme. As a result, it provides a way to:
(1) decrease write amplification, (2) decrease FTL GC
responsibilities, (3) improve the compression ratio and decrease the
payload size. Finally, SSD lifetime can be longer and write I/O
performance can be improved.

The SSDFS file system is based on the concept of a logical segment,
which is an aggregation of Logical Erase Blocks (LEBs). Moreover,
initially, a LEB has no association with a particular "Physical"
Erase Block (PEB). This means a segment could have associations for
only some of its LEBs, or even no association with any PEB at all
(for example, in the case of a clean segment). Generally speaking,
the SSDFS file system needs a special metadata structure (the PEB
mapping table) that is capable of associating any LEB with any PEB.
The PEB mapping table is the crucial metadata structure that has
several goals: (1) mapping LEBs to PEBs, (2) implementation of the
logical extent concept, (3) implementation of the concept of PEB
migration, (4) implementation of the delayed erase operation by a
specialized thread.
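
A minimal sketch of the LEB-to-PEB association idea described above
(the names and flat table layout are invented for illustration; the
real PEB mapping table is a far richer on-disk structure):

```c
#include <assert.h>
#include <stdint.h>

#define PEB_UNMAPPED UINT64_MAX	/* LEB not backed by any PEB yet */

/* Toy mapping table: index is the LEB ID, value is the PEB ID. */
struct peb_map {
	uint64_t leb2peb[8];
};

/* A clean segment starts with all of its LEBs unmapped. */
static void peb_map_init(struct peb_map *m)
{
	for (int i = 0; i < 8; i++)
		m->leb2peb[i] = PEB_UNMAPPED;
}

/* Associate a LEB with a PEB (any LEB may map to any PEB). */
static void map_leb(struct peb_map *m, uint64_t leb, uint64_t peb)
{
	m->leb2peb[leb] = peb;
}

/* Convert a LEB to its PEB; returns PEB_UNMAPPED for unmapped LEBs. */
static uint64_t convert_leb_to_peb(const struct peb_map *m, uint64_t leb)
{
	return m->leb2peb[leb];
}
```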

SSDFS implements a migration scheme. The migration scheme is a
fundamental technique of GC overhead management. Its key
responsibility is to guarantee the presence of data in the same
segment for any update operations. Generally speaking, the migration
scheme's model is implemented by associating an exhausted "Physical"
Erase Block (PEB) with a clean one. The goal of this association of
two PEBs is to implement the gradual migration of data by means of
the update operations in the initial (exhausted) PEB. As a result,
the old, exhausted PEB becomes invalidated after complete data
migration, and the erase operation can then be applied to convert it
to a clean state. The migration scheme is capable of decreasing GC
activity significantly by excluding the necessity to update metadata
and because the self-migration of data between PEBs is triggered by
regular update operations. Finally, the migration scheme can:
(1) eliminate GC overhead, (2) implement an efficient TRIM policy,
(3) prolong SSD lifetime, (4) guarantee stable performance.
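
The gradual migration described above can be sketched as a toy model
(counters only, with hypothetical names; the real scheme operates on
block bitmaps and PEB containers):

```c
#include <assert.h>

/* Exhausted source PEB associated with a clean destination PEB. */
struct migration_pair {
	int src_valid;	/* valid blocks still living in the exhausted PEB */
	int dst_used;	/* blocks already written into the clean PEB */
};

/* A regular update rewrites one block: the new version goes into the
 * destination PEB and the old copy in the source PEB is invalidated,
 * so ordinary updates drive the migration with no dedicated GC pass. */
static void update_block(struct migration_pair *p)
{
	if (p->src_valid > 0) {
		p->src_valid--;
		p->dst_used++;
	}
}

/* Once no valid blocks remain, the source PEB can be erased and
 * returned to the clean state. */
static int src_peb_erasable(const struct migration_pair *p)
{
	return p->src_valid == 0;
}
```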

Generally speaking, SSDFS doesn't need the classical model of
garbage collection that is used in NILFS2 or F2FS. However, SSDFS
has several global GC threads (for the dirty, pre-dirty, used, and
using segment states) and a segment bitmap. The main responsibility
of the global GC threads is to: (1) find a segment in a particular
state, (2) check that the segment object is constructed and
initialized by the file system driver logic, (3) check the necessity
to stimulate or finish the migration (if the segment is under update
operations or has had update operations recently, then migration
stimulation is not necessary), (4) define the valid blocks that
require migration, (5) add a recommended migration request to the
PEB update queue, (6) destroy the in-core segment object if no
migration is necessary and no create/update requests have been
received by the segment object recently. The global GC threads are
used to recommend migration stimulation for particular PEBs and to
destroy in-core segment objects that have no requests to process.
The segment bitmap is a critical metadata structure of the SSDFS
file system that implements several goals: (1) searching for a
candidate for a current segment capable of storing new data,
(2) searching by the GC subsystem for the most suitable segment
(dirty state, for example) with the goal of preparing the segment in
the background for storing new data (converting it into a clean
state).
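
The segment-state searches that the segment bitmap supports can be
sketched like this (the state names follow the text; the real bitmap
is a packed on-disk structure, not an array of enums):

```c
#include <assert.h>

/* Segment states mentioned in the text. */
enum seg_state {
	SEG_CLEAN,	/* ready to store new data */
	SEG_USING,	/* current segment, still has free blocks */
	SEG_USED,	/* full, all blocks valid */
	SEG_PRE_DIRTY,	/* full, some blocks invalidated */
	SEG_DIRTY	/* fully invalidated, candidate for erase */
};

/* Find the first segment in the wanted state; -1 if none. Both the
 * "find a current segment candidate" search and the GC search for a
 * dirty segment to clean in background reduce to this kind of scan. */
static int find_seg_in_state(const enum seg_state *map, int count,
			     enum seg_state state)
{
	for (int i = 0; i < count; i++) {
		if (map[i] == state)
			return i;
	}
	return -1;
}
```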

The SSDFS file system uses a b-tree architecture for metadata
representation (for example, the inodes tree, extents tree, dentries
tree, and xattr tree) because it provides a compact way of reserving
metadata space without the excessive overprovisioning of a metadata
reservation (as in the case of a plain table or array). SSDFS uses a
hybrid b-tree architecture with the goal of eliminating the index
nodes' side effect. The hybrid b-tree operates with three node
types: (1) index node, (2) hybrid node, (3) leaf node. Generally
speaking, the peculiarity of the hybrid node is that it mixes index
and data records in one node. A hybrid b-tree starts with a root
node that is capable of keeping two index records or two data
records inline (if the size of a data record is equal to or less
than the size of an index record). If the b-tree needs to contain
more than two items, then the first hybrid node has to be added to
the b-tree. The root level of the b-tree is able to contain only two
nodes because the root node can store only two index records.
Generally speaking, the initial goal of the hybrid node is to store
data records in the presence of a reserved index area. The b-tree
implements a compact and flexible metadata structure that can
decrease payload size and isolate hot, warm, and cold metadata types
in different erase blocks.

The migration scheme is completely sufficient for the case of
conventional SSDs, for both metadata and user data. But a ZNS SSD
has a huge zone size and a limited number of active/open zones. As a
result, a moving scheme has to be introduced for user data in the
ZNS SSD case. Finally, the migration scheme works for metadata and
the moving scheme works for user data (in the ZNS SSD case).
Initially, user data can be stored into the current user data
segment/zone, and user data can be updated in the same zone until
its exhaustion. Then the moving scheme starts to work: updated user
data is moved into the current user data zone for updates. As a
result, the extents tree needs to be updated, and the invalidated
extents of the old zone need to be stored into the invalidated
extents tree. The invalidated extents tree tracks the moment when
the old zone is completely invalidated and is ready to be erased.
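
The zone invalidation tracking at the end of the paragraph can be
sketched as simple per-zone accounting (hypothetical helper names;
in SSDFS the tracking is done via the invalidated extents b-tree):

```c
#include <assert.h>
#include <stdint.h>

/* Per-zone accounting: once every block of the old zone has been
 * invalidated by the moving scheme, the zone is ready to be erased
 * (reset, in ZNS terms). */
struct zone_acct {
	uint32_t capacity;	/* total blocks in the zone */
	uint32_t invalidated;	/* blocks invalidated so far */
};

/* Record an invalidated extent of the old zone (in SSDFS this would
 * be an insertion into the invalidated extents b-tree). */
static void invalidate_extent(struct zone_acct *z, uint32_t len)
{
	z->invalidated += len;
}

/* The tree tracks exactly this moment: complete invalidation. */
static int zone_erasable(const struct zone_acct *z)
{
	return z->invalidated >= z->capacity;
}
```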

[BENCHMARKING]

Benchmarking results show that SSDFS is capable of:
(1) generating a smaller number of write I/O requests compared with:
    1.4x - 116x (ext4),
    14x - 42x (xfs),
    6.2x - 9.8x (btrfs),
    1.5x - 41x (f2fs),
    0.6x - 22x (nilfs2);
(2) creating a smaller payload compared with:
    0.3x - 300x (ext4),
    0.3x - 190x (xfs),
    0.7x - 400x (btrfs),
    1.2x - 400x (f2fs),
    0.9x - 190x (nilfs2);
(3) decreasing the write amplification factor compared with:
    1.3x - 116x (ext4),
    14x - 42x (xfs),
    6x - 9x (btrfs),
    1.5x - 50x (f2fs),
    1.2x - 20x (nilfs2);
(4) prolonging SSD lifetime compared with:
    1.4x - 7.8x (ext4),
    15x - 60x (xfs),
    6x - 12x (btrfs),
    1.5x - 7x (f2fs),
    1x - 4.6x (nilfs2).

[CURRENT ISSUES]

The patchset does not include:
(1) shared dictionary functionality (implemented),
(2) deduplication functionality (partially implemented),
(3) snapshot support functionality (partially implemented),
(4) extended attributes support (implemented),
(5) internal unit-tests functionality (implemented),
(6) IOCTLs support (implemented).

The SSDFS code still has bugs and is not fully stable yet:
(1) ZNS support is not fully stable;
(2) b-tree operations have issues in some use cases;
(3) support for 8K, 16K, and 32K logical blocks has critical bugs;
(4) support for multiple PEBs per segment is not stable yet;
(5) delta-encoding support is not stable;
(6) the fsck and recoverfs tools are not fully implemented yet;
(7) currently, the offset translation table functionality introduces
    performance degradation in the read I/O path (a patch with the
    fix is under testing).

[REFERENCES]
[1] SSDFS tools: https://github.com/dubeyko/ssdfs-tools.git
[2] SSDFS driver: https://github.com/dubeyko/ssdfs-driver.git
[3] Linux kernel with SSDFS support: https://github.com/dubeyko/linux.git
[4] SSDFS (paper): https://arxiv.org/abs/1907.11825
[5] Linux Plumbers 2022: https://www.youtube.com/watch?v=sBGddJBHsIo

Viacheslav Dubeyko (76):
  ssdfs: introduce SSDFS on-disk layout
  ssdfs: key file system declarations
  ssdfs: implement raw device operations
  ssdfs: implement super operations
  ssdfs: implement commit superblock operation
  ssdfs: segment header + log footer operations
  ssdfs: basic mount logic implementation
  ssdfs: search last actual superblock
  ssdfs: internal array/sequence primitives
  ssdfs: introduce PEB's block bitmap
  ssdfs: block bitmap search operations implementation
  ssdfs: block bitmap modification operations implementation
  ssdfs: introduce PEB block bitmap
  ssdfs: PEB block bitmap modification operations
  ssdfs: introduce segment block bitmap
  ssdfs: introduce segment request queue
  ssdfs: introduce offset translation table
  ssdfs: flush offset translation table
  ssdfs: offset translation table API implementation
  ssdfs: introduce PEB object
  ssdfs: introduce PEB container
  ssdfs: create/destroy PEB container
  ssdfs: PEB container API implementation
  ssdfs: PEB read thread's init logic
  ssdfs: block bitmap initialization logic
  ssdfs: offset translation table initialization logic
  ssdfs: read/readahead logic of PEB's thread
  ssdfs: PEB flush thread's finite state machine
  ssdfs: commit log logic
  ssdfs: commit log payload
  ssdfs: process update request
  ssdfs: process create request
  ssdfs: create log logic
  ssdfs: auxilairy GC threads logic
  ssdfs: introduce segment object
  ssdfs: segment object's add data/metadata operations
  ssdfs: segment object's update/invalidate data/metadata
  ssdfs: introduce PEB mapping table
  ssdfs: flush PEB mapping table
  ssdfs: convert/map LEB to PEB functionality
  ssdfs: support migration scheme by PEB state
  ssdfs: PEB mapping table thread logic
  ssdfs: introduce PEB mapping table cache
  ssdfs: PEB mapping table cache's modification operations
  ssdfs: introduce segment bitmap
  ssdfs: segment bitmap API implementation
  ssdfs: introduce b-tree object
  ssdfs: add/delete b-tree node
  ssdfs: b-tree API implementation
  ssdfs: introduce b-tree node object
  ssdfs: flush b-tree node object
  ssdfs: b-tree node index operations
  ssdfs: search/allocate/insert b-tree node operations
  ssdfs: change/delete b-tree node operations
  ssdfs: range operations of b-tree node
  ssdfs: introduce b-tree hierarchy object
  ssdfs: check b-tree hierarchy for add operation
  ssdfs: check b-tree hierarchy for update/delete operation
  ssdfs: execute b-tree hierarchy modification
  ssdfs: introduce inodes b-tree
  ssdfs: inodes b-tree node operations
  ssdfs: introduce dentries b-tree
  ssdfs: dentries b-tree specialized operations
  ssdfs: dentries b-tree node's specialized operations
  ssdfs: introduce extents queue object
  ssdfs: introduce extents b-tree
  ssdfs: extents b-tree specialized operations
  ssdfs: search extent logic in extents b-tree node
  ssdfs: add/change/delete extent in extents b-tree node
  ssdfs: introduce invalidated extents b-tree
  ssdfs: find item in invalidated extents b-tree
  ssdfs: modification operations of invalidated extents b-tree
  ssdfs: implement inode operations support
  ssdfs: implement directory operations support
  ssdfs: implement file operations support
  introduce SSDFS file system

 fs/Kconfig                          |     1 +
 fs/Makefile                         |     1 +
 fs/ssdfs/Kconfig                    |   300 +
 fs/ssdfs/Makefile                   |    50 +
 fs/ssdfs/block_bitmap.c             |  5313 ++++++++
 fs/ssdfs/block_bitmap.h             |   370 +
 fs/ssdfs/block_bitmap_tables.c      |   310 +
 fs/ssdfs/btree.c                    |  7787 ++++++++++++
 fs/ssdfs/btree.h                    |   218 +
 fs/ssdfs/btree_hierarchy.c          |  9420 ++++++++++++++
 fs/ssdfs/btree_hierarchy.h          |   284 +
 fs/ssdfs/btree_node.c               | 16928 ++++++++++++++++++++++++++
 fs/ssdfs/btree_node.h               |   768 ++
 fs/ssdfs/btree_search.c             |   885 ++
 fs/ssdfs/btree_search.h             |   359 +
 fs/ssdfs/compr_lzo.c                |   256 +
 fs/ssdfs/compr_zlib.c               |   359 +
 fs/ssdfs/compression.c              |   548 +
 fs/ssdfs/compression.h              |   104 +
 fs/ssdfs/current_segment.c          |   682 ++
 fs/ssdfs/current_segment.h          |    76 +
 fs/ssdfs/dentries_tree.c            |  9726 +++++++++++++++
 fs/ssdfs/dentries_tree.h            |   156 +
 fs/ssdfs/dev_bdev.c                 |  1187 ++
 fs/ssdfs/dev_mtd.c                  |   641 +
 fs/ssdfs/dev_zns.c                  |  1281 ++
 fs/ssdfs/dir.c                      |  2071 ++++
 fs/ssdfs/dynamic_array.c            |   781 ++
 fs/ssdfs/dynamic_array.h            |    96 +
 fs/ssdfs/extents_queue.c            |  1723 +++
 fs/ssdfs/extents_queue.h            |   105 +
 fs/ssdfs/extents_tree.c             | 13060 ++++++++++++++++++++
 fs/ssdfs/extents_tree.h             |   171 +
 fs/ssdfs/file.c                     |  2523 ++++
 fs/ssdfs/fs_error.c                 |   257 +
 fs/ssdfs/inode.c                    |  1190 ++
 fs/ssdfs/inodes_tree.c              |  5534 +++++++++
 fs/ssdfs/inodes_tree.h              |   177 +
 fs/ssdfs/invalidated_extents_tree.c |  7063 +++++++++++
 fs/ssdfs/invalidated_extents_tree.h |    95 +
 fs/ssdfs/log_footer.c               |   901 ++
 fs/ssdfs/offset_translation_table.c |  8160 +++++++++++++
 fs/ssdfs/offset_translation_table.h |   446 +
 fs/ssdfs/options.c                  |   190 +
 fs/ssdfs/page_array.c               |  1746 +++
 fs/ssdfs/page_array.h               |   119 +
 fs/ssdfs/page_vector.c              |   437 +
 fs/ssdfs/page_vector.h              |    64 +
 fs/ssdfs/peb.c                      |   813 ++
 fs/ssdfs/peb.h                      |   970 ++
 fs/ssdfs/peb_block_bitmap.c         |  3958 ++++++
 fs/ssdfs/peb_block_bitmap.h         |   165 +
 fs/ssdfs/peb_container.c            |  5649 +++++++++
 fs/ssdfs/peb_container.h            |   291 +
 fs/ssdfs/peb_flush_thread.c         | 16856 +++++++++++++++++++++++++
 fs/ssdfs/peb_gc_thread.c            |  2953 +++++
 fs/ssdfs/peb_mapping_queue.c        |   334 +
 fs/ssdfs/peb_mapping_queue.h        |    67 +
 fs/ssdfs/peb_mapping_table.c        | 12706 +++++++++++++++++++
 fs/ssdfs/peb_mapping_table.h        |   699 ++
 fs/ssdfs/peb_mapping_table_cache.c  |  4702 +++++++
 fs/ssdfs/peb_mapping_table_cache.h  |   119 +
 fs/ssdfs/peb_mapping_table_thread.c |  2817 +++++
 fs/ssdfs/peb_migration_scheme.c     |  1302 ++
 fs/ssdfs/peb_read_thread.c          | 10672 ++++++++++++++++
 fs/ssdfs/readwrite.c                |   651 +
 fs/ssdfs/recovery.c                 |  3144 +++++
 fs/ssdfs/recovery.h                 |   446 +
 fs/ssdfs/recovery_fast_search.c     |  1194 ++
 fs/ssdfs/recovery_slow_search.c     |   585 +
 fs/ssdfs/recovery_thread.c          |  1196 ++
 fs/ssdfs/request_queue.c            |  1240 ++
 fs/ssdfs/request_queue.h            |   417 +
 fs/ssdfs/segment.c                  |  5262 ++++++++
 fs/ssdfs/segment.h                  |   957 ++
 fs/ssdfs/segment_bitmap.c           |  4821 ++++++++
 fs/ssdfs/segment_bitmap.h           |   459 +
 fs/ssdfs/segment_bitmap_tables.c    |   814 ++
 fs/ssdfs/segment_block_bitmap.c     |  1425 +++
 fs/ssdfs/segment_block_bitmap.h     |   205 +
 fs/ssdfs/segment_tree.c             |   748 ++
 fs/ssdfs/segment_tree.h             |    66 +
 fs/ssdfs/sequence_array.c           |   639 +
 fs/ssdfs/sequence_array.h           |   119 +
 fs/ssdfs/ssdfs.h                    |   411 +
 fs/ssdfs/ssdfs_constants.h          |    81 +
 fs/ssdfs/ssdfs_fs_info.h            |   412 +
 fs/ssdfs/ssdfs_inline.h             |  1346 ++
 fs/ssdfs/ssdfs_inode_info.h         |   143 +
 fs/ssdfs/ssdfs_thread_info.h        |    42 +
 fs/ssdfs/super.c                    |  4044 ++++++
 fs/ssdfs/version.h                  |     7 +
 fs/ssdfs/volume_header.c            |  1256 ++
 include/linux/ssdfs_fs.h            |  3468 ++++++
 include/trace/events/ssdfs.h        |   255 +
 include/uapi/linux/magic.h          |     1 +
 include/uapi/linux/ssdfs_fs.h       |   117 +
 97 files changed, 205963 insertions(+)
 create mode 100644 fs/ssdfs/Kconfig
 create mode 100644 fs/ssdfs/Makefile
 create mode 100644 fs/ssdfs/block_bitmap.c
 create mode 100644 fs/ssdfs/block_bitmap.h
 create mode 100644 fs/ssdfs/block_bitmap_tables.c
 create mode 100644 fs/ssdfs/btree.c
 create mode 100644 fs/ssdfs/btree.h
 create mode 100644 fs/ssdfs/btree_hierarchy.c
 create mode 100644 fs/ssdfs/btree_hierarchy.h
 create mode 100644 fs/ssdfs/btree_node.c
 create mode 100644 fs/ssdfs/btree_node.h
 create mode 100644 fs/ssdfs/btree_search.c
 create mode 100644 fs/ssdfs/btree_search.h
 create mode 100644 fs/ssdfs/compr_lzo.c
 create mode 100644 fs/ssdfs/compr_zlib.c
 create mode 100644 fs/ssdfs/compression.c
 create mode 100644 fs/ssdfs/compression.h
 create mode 100644 fs/ssdfs/current_segment.c
 create mode 100644 fs/ssdfs/current_segment.h
 create mode 100644 fs/ssdfs/dentries_tree.c
 create mode 100644 fs/ssdfs/dentries_tree.h
 create mode 100644 fs/ssdfs/dev_bdev.c
 create mode 100644 fs/ssdfs/dev_mtd.c
 create mode 100644 fs/ssdfs/dev_zns.c
 create mode 100644 fs/ssdfs/dir.c
 create mode 100644 fs/ssdfs/dynamic_array.c
 create mode 100644 fs/ssdfs/dynamic_array.h
 create mode 100644 fs/ssdfs/extents_queue.c
 create mode 100644 fs/ssdfs/extents_queue.h
 create mode 100644 fs/ssdfs/extents_tree.c
 create mode 100644 fs/ssdfs/extents_tree.h
 create mode 100644 fs/ssdfs/file.c
 create mode 100644 fs/ssdfs/fs_error.c
 create mode 100644 fs/ssdfs/inode.c
 create mode 100644 fs/ssdfs/inodes_tree.c
 create mode 100644 fs/ssdfs/inodes_tree.h
 create mode 100644 fs/ssdfs/invalidated_extents_tree.c
 create mode 100644 fs/ssdfs/invalidated_extents_tree.h
 create mode 100644 fs/ssdfs/log_footer.c
 create mode 100644 fs/ssdfs/offset_translation_table.c
 create mode 100644 fs/ssdfs/offset_translation_table.h
 create mode 100644 fs/ssdfs/options.c
 create mode 100644 fs/ssdfs/page_array.c
 create mode 100644 fs/ssdfs/page_array.h
 create mode 100644 fs/ssdfs/page_vector.c
 create mode 100644 fs/ssdfs/page_vector.h
 create mode 100644 fs/ssdfs/peb.c
 create mode 100644 fs/ssdfs/peb.h
 create mode 100644 fs/ssdfs/peb_block_bitmap.c
 create mode 100644 fs/ssdfs/peb_block_bitmap.h
 create mode 100644 fs/ssdfs/peb_container.c
 create mode 100644 fs/ssdfs/peb_container.h
 create mode 100644 fs/ssdfs/peb_flush_thread.c
 create mode 100644 fs/ssdfs/peb_gc_thread.c
 create mode 100644 fs/ssdfs/peb_mapping_queue.c
 create mode 100644 fs/ssdfs/peb_mapping_queue.h
 create mode 100644 fs/ssdfs/peb_mapping_table.c
 create mode 100644 fs/ssdfs/peb_mapping_table.h
 create mode 100644 fs/ssdfs/peb_mapping_table_cache.c
 create mode 100644 fs/ssdfs/peb_mapping_table_cache.h
 create mode 100644 fs/ssdfs/peb_mapping_table_thread.c
 create mode 100644 fs/ssdfs/peb_migration_scheme.c
 create mode 100644 fs/ssdfs/peb_read_thread.c
 create mode 100644 fs/ssdfs/readwrite.c
 create mode 100644 fs/ssdfs/recovery.c
 create mode 100644 fs/ssdfs/recovery.h
 create mode 100644 fs/ssdfs/recovery_fast_search.c
 create mode 100644 fs/ssdfs/recovery_slow_search.c
 create mode 100644 fs/ssdfs/recovery_thread.c
 create mode 100644 fs/ssdfs/request_queue.c
 create mode 100644 fs/ssdfs/request_queue.h
 create mode 100644 fs/ssdfs/segment.c
 create mode 100644 fs/ssdfs/segment.h
 create mode 100644 fs/ssdfs/segment_bitmap.c
 create mode 100644 fs/ssdfs/segment_bitmap.h
 create mode 100644 fs/ssdfs/segment_bitmap_tables.c
 create mode 100644 fs/ssdfs/segment_block_bitmap.c
 create mode 100644 fs/ssdfs/segment_block_bitmap.h
 create mode 100644 fs/ssdfs/segment_tree.c
 create mode 100644 fs/ssdfs/segment_tree.h
 create mode 100644 fs/ssdfs/sequence_array.c
 create mode 100644 fs/ssdfs/sequence_array.h
 create mode 100644 fs/ssdfs/ssdfs.h
 create mode 100644 fs/ssdfs/ssdfs_constants.h
 create mode 100644 fs/ssdfs/ssdfs_fs_info.h
 create mode 100644 fs/ssdfs/ssdfs_inline.h
 create mode 100644 fs/ssdfs/ssdfs_inode_info.h
 create mode 100644 fs/ssdfs/ssdfs_thread_info.h
 create mode 100644 fs/ssdfs/super.c
 create mode 100644 fs/ssdfs/version.h
 create mode 100644 fs/ssdfs/volume_header.c
 create mode 100644 include/linux/ssdfs_fs.h
 create mode 100644 include/trace/events/ssdfs.h
 create mode 100644 include/uapi/linux/ssdfs_fs.h

-- 
2.34.1



* [RFC PATCH 01/76] ssdfs: introduce SSDFS on-disk layout
  2023-02-25  1:08 [RFC PATCH 00/76] SSDFS: flash-friendly LFS file system for ZNS SSD Viacheslav Dubeyko
@ 2023-02-25  1:08 ` Viacheslav Dubeyko
  2023-02-25  1:08 ` [RFC PATCH 02/76] ssdfs: key file system declarations Viacheslav Dubeyko
                   ` (75 subsequent siblings)
  76 siblings, 0 replies; 82+ messages in thread
From: Viacheslav Dubeyko @ 2023-02-25  1:08 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: viacheslav.dubeyko, luka.perkov, bruno.banelli, Viacheslav Dubeyko

The SSDFS architecture is based on the segment concept. A segment is
a portion of the file system volume that is aligned on the erase
block size. A segment can include one or several erase blocks. It is
the basic unit for allocating and managing the free space of the
file system volume. The erase block is the basic unit for keeping
metadata and user data. Every erase block contains a sequence of
logs. A log starts with a segment header (struct
ssdfs_segment_header) or a partial log header
(struct ssdfs_partial_log_header). A full log can be finished with a
log footer (struct ssdfs_log_footer).

The log's header (+ footer) contains all the necessary metadata
describing the log's payload. The log's metadata includes:
(1) block bitmap (struct ssdfs_block_bitmap_fragment) +
    (struct ssdfs_block_bitmap_header): tracks the state of logical
    blocks (free, pre-allocated, valid, invalid) in the segment.
(2) offset translation table (struct ssdfs_blk2off_table_header) +
    (struct ssdfs_phys_offset_table_header) +
    (struct ssdfs_area_block_table): converts a logical block into a
    position inside a particular erase block.

Additionally, the log's header is a copy of the superblock that
keeps the knowledge of the location of all SSDFS metadata
structures. SSDFS has:
(1) mapping table (struct ssdfs_leb_table_fragment_header) +
    (struct ssdfs_peb_table_fragment_header): implements the mapping of
    logical erase blocks into "physical" ones.
(2) mapping table cache (struct ssdfs_maptbl_cache_header): a copy of
    the mapping table's content for some types of erase blocks. The
    cache is used to convert a logical erase block ID into a
    "physical" erase block ID when the corresponding fragment of the
    mapping table is not initialized yet.
(3) segment bitmap (struct ssdfs_segbmap_fragment_header): tracks the
    state (clean, using, used, pre-dirty, dirty, reserved) of
    segments with the goal of searching, allocation, erase, and
    garbage collection.
(4) b-tree (struct ssdfs_btree_descriptor) + (struct ssdfs_btree_index_key) +
    (struct ssdfs_btree_node_header): all the remaining metadata
    structures are represented by b-trees.
(5) inodes b-tree (struct ssdfs_inodes_btree) +
    (struct ssdfs_inodes_btree_node_header): keeps raw inodes of existing
    file system objects (struct ssdfs_inode).
(6) dentries b-tree (struct ssdfs_dentries_btree_descriptor) +
    (struct ssdfs_dentries_btree_node_header): keeps directory entries
    (struct ssdfs_dir_entry).
(7) extents b-tree (struct ssdfs_extents_btree_descriptor) +
    (struct ssdfs_extents_btree_node_header): keeps raw extents
    describing the location of a piece of data
    (struct ssdfs_raw_fork) + (struct ssdfs_raw_extent).
(8) xattr b-tree (struct ssdfs_xattr_btree_descriptor) +
    (struct ssdfs_xattrs_btree_node_header): keeps the extended
    attributes of a file or folder (struct ssdfs_xattr_entry).
(9) invalidated extents b-tree (struct ssdfs_invalidated_extents_btree) +
    (struct ssdfs_invextree_node_header): keeps information about invalidated
    extents for ZNS SSD + SMR HDD use cases.
(10) shared dictionary b-tree (struct ssdfs_shared_dictionary_btree) +
     (struct ssdfs_shared_dictionary_node_header): keeps long names
     (more than 12 symbols) in the form of tries.
(11) snapshots b-tree (struct ssdfs_snapshots_btree) +
     (struct ssdfs_snapshots_btree_node_header): keeps snapshots info
     (struct ssdfs_snapshot) and association of erase block IDs with
     timestamps (struct ssdfs_peb2time_set) + (struct ssdfs_peb2time_pair).

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
CC: Viacheslav Dubeyko <viacheslav.dubeyko@bytedance.com>
CC: Luka Perkov <luka.perkov@sartura.hr>
CC: Bruno Banelli <bruno.banelli@sartura.hr>
---
 include/linux/ssdfs_fs.h | 3468 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 3468 insertions(+)
 create mode 100644 include/linux/ssdfs_fs.h

diff --git a/include/linux/ssdfs_fs.h b/include/linux/ssdfs_fs.h
new file mode 100644
index 000000000000..a41725234982
--- /dev/null
+++ b/include/linux/ssdfs_fs.h
@@ -0,0 +1,3468 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * include/linux/ssdfs_fs.h - SSDFS on-disk structures and common declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _LINUX_SSDFS_H
+#define _LINUX_SSDFS_H
+
+#include <uapi/linux/ssdfs_fs.h>
+
+typedef u8 __le8;
+
+struct ssdfs_inode;
+
+/*
+ * struct ssdfs_revision - metadata structure version
+ * @major: major version number
+ * @minor: minor version number
+ */
+struct ssdfs_revision {
+/* 0x0000 */
+	__le8 major;
+	__le8 minor;
+
+/* 0x0002 */
+}  __packed;
+
+/*
+ * struct ssdfs_signature - metadata structure magic signature
+ * @common: common magic value
+ * @key: detailed magic value
+ */
+struct ssdfs_signature {
+/* 0x0000 */
+	__le32 common;
+	__le16 key;
+	struct ssdfs_revision version;
+
+/* 0x0008 */
+} __packed;
+
+/*
+ * struct ssdfs_metadata_check - metadata structure checksum
+ * @bytes: bytes count of CRC calculation for the structure
+ * @flags: flags
+ * @csum: checksum
+ */
+struct ssdfs_metadata_check {
+/* 0x0000 */
+	__le16 bytes;
+#define SSDFS_CRC32			(1 << 0)
+#define SSDFS_ZLIB_COMPRESSED		(1 << 1)
+#define SSDFS_LZO_COMPRESSED		(1 << 2)
+	__le16 flags;
+	__le32 csum;
+
+/* 0x0008 */
+} __packed;
+
+/*
+ * struct ssdfs_raw_extent - raw (on-disk) extent
+ * @seg_id: segment number
+ * @logical_blk: logical block number
+ * @len: count of blocks in extent
+ */
+struct ssdfs_raw_extent {
+/* 0x0000 */
+	__le64 seg_id;
+	__le32 logical_blk;
+	__le32 len;
+
+/* 0x0010 */
+} __packed;
+
+/*
+ * struct ssdfs_meta_area_extent - metadata area extent
+ * @start_id: starting identification number
+ * @len: count of items in metadata area
+ * @type: item's type
+ * @flags: flags
+ */
+struct ssdfs_meta_area_extent {
+/* 0x0000 */
+	__le64 start_id;
+	__le32 len;
+	__le16 type;
+	__le16 flags;
+
+/* 0x0010 */
+} __packed;
+
+/* Type of item in metadata area */
+enum {
+	SSDFS_EMPTY_EXTENT_TYPE,
+	SSDFS_SEG_EXTENT_TYPE,
+	SSDFS_PEB_EXTENT_TYPE,
+	SSDFS_BLK_EXTENT_TYPE,
+};
+
+/* Type of segbmap's segments */
+enum {
+	SSDFS_MAIN_SEGBMAP_SEG,
+	SSDFS_COPY_SEGBMAP_SEG,
+	SSDFS_SEGBMAP_SEG_COPY_MAX,
+};
+
+#define SSDFS_SEGBMAP_SEGS	8
+
+/*
+ * struct ssdfs_segbmap_sb_header - superblock's segment bitmap header
+ * @fragments_count: fragments count in segment bitmap
+ * @fragments_per_seg: segbmap's fragments per segment
+ * @fragments_per_peb: segbmap's fragments per PEB
+ * @fragment_size: size of fragment in bytes
+ * @bytes_count: size of segment bitmap in bytes (payload part)
+ * @flags: segment bitmap's flags
+ * @segs_count: count of actually reserved segments in one chain
+ * @segs: array of segbmap's segment numbers
+ */
+struct ssdfs_segbmap_sb_header {
+/* 0x0000 */
+	__le16 fragments_count;
+	__le16 fragments_per_seg;
+	__le16 fragments_per_peb;
+	__le16 fragment_size;
+
+/* 0x0008 */
+	__le32 bytes_count;
+	__le16 flags;
+	__le16 segs_count;
+
+/* 0x0010 */
+	__le64 segs[SSDFS_SEGBMAP_SEGS][SSDFS_SEGBMAP_SEG_COPY_MAX];
+
+/* 0x0090 */
+} __packed;
+
+/* Segment bitmap's flags */
+#define SSDFS_SEGBMAP_HAS_COPY		(1 << 0)
+#define SSDFS_SEGBMAP_ERROR		(1 << 1)
+#define SSDFS_SEGBMAP_MAKE_ZLIB_COMPR	(1 << 2)
+#define SSDFS_SEGBMAP_MAKE_LZO_COMPR	(1 << 3)
+#define SSDFS_SEGBMAP_FLAGS_MASK	(0xF)
+
+enum {
+	SSDFS_MAIN_MAPTBL_SEG,
+	SSDFS_COPY_MAPTBL_SEG,
+	SSDFS_MAPTBL_SEG_COPY_MAX,
+};
+
+#define SSDFS_MAPTBL_RESERVED_EXTENTS	(3)
+
+/*
+ * struct ssdfs_maptbl_sb_header - superblock's mapping table header
+ * @fragments_count: count of fragments in mapping table
+ * @fragment_bytes: bytes in one mapping table's fragment
+ * @last_peb_recover_cno: checkpoint of the last attempt to recover PEBs
+ * @lebs_count: count of Logical Erase Blocks (LEBs) described by the table
+ * @pebs_count: count of Physical Erase Blocks (PEBs) described by the table
+ * @fragments_per_seg: count of mapping table's fragments in segment
+ * @fragments_per_peb: count of mapping table's fragments in PEB
+ * @flags: mapping table's flags
+ * @pre_erase_pebs: count of PEBs in pre-erase state
+ * @lebs_per_fragment: count of LEBs described by a fragment
+ * @pebs_per_fragment: count of PEBs described by a fragment
+ * @pebs_per_stripe: count of PEBs described by a stripe
+ * @stripes_per_fragment: count of stripes in a fragment
+ * @extents: metadata extents that describe mapping table location
+ */
+struct ssdfs_maptbl_sb_header {
+/* 0x0000 */
+	__le32 fragments_count;
+	__le32 fragment_bytes;
+	__le64 last_peb_recover_cno;
+
+/* 0x0010 */
+	__le64 lebs_count;
+	__le64 pebs_count;
+
+/* 0x0020 */
+	__le16 fragments_per_seg;
+	__le16 fragments_per_peb;
+	__le16 flags;
+	__le16 pre_erase_pebs;
+
+/* 0x0028 */
+	__le16 lebs_per_fragment;
+	__le16 pebs_per_fragment;
+	__le16 pebs_per_stripe;
+	__le16 stripes_per_fragment;
+
+/* 0x0030 */
+#define MAPTBL_LIMIT1	(SSDFS_MAPTBL_RESERVED_EXTENTS)
+#define MAPTBL_LIMIT2	(SSDFS_MAPTBL_SEG_COPY_MAX)
+	struct ssdfs_meta_area_extent extents[MAPTBL_LIMIT1][MAPTBL_LIMIT2];
+
+/* 0x0090 */
+} __packed;
+
+/* Mapping table's flags */
+#define SSDFS_MAPTBL_HAS_COPY		(1 << 0)
+#define SSDFS_MAPTBL_ERROR		(1 << 1)
+#define SSDFS_MAPTBL_MAKE_ZLIB_COMPR	(1 << 2)
+#define SSDFS_MAPTBL_MAKE_LZO_COMPR	(1 << 3)
+#define SSDFS_MAPTBL_UNDER_FLUSH	(1 << 4)
+#define SSDFS_MAPTBL_FLAGS_MASK		(0x1F)
+
+/*
+ * struct ssdfs_btree_descriptor - generic btree descriptor
+ * @magic: magic signature
+ * @flags: btree flags
+ * @type: btree type
+ * @log_node_size: log2(node size in bytes)
+ * @pages_per_node: physical pages per btree node
+ * @node_ptr_size: size in bytes of pointer on btree node
+ * @index_size: size in bytes of btree's index
+ * @item_size: size in bytes of btree's item
+ * @index_area_min_size: minimal size in bytes of index area in btree node
+ *
+ * The goal of a btree descriptor is to keep
+ * the main features of a tree.
+ */
+struct ssdfs_btree_descriptor {
+/* 0x0000 */
+	__le32 magic;
+#define SSDFS_BTREE_DESC_INDEX_AREA_RESIZABLE		(1 << 0)
+#define SSDFS_BTREE_DESC_FLAGS_MASK			0x1
+	__le16 flags;
+	__le8 type;
+	__le8 log_node_size;
+
+/* 0x0008 */
+	__le8 pages_per_node;
+	__le8 node_ptr_size;
+	__le16 index_size;
+	__le16 item_size;
+	__le16 index_area_min_size;
+
+/* 0x0010 */
+} __packed;
+
+/* Btree types */
+enum {
+	SSDFS_BTREE_UNKNOWN_TYPE,
+	SSDFS_INODES_BTREE,
+	SSDFS_DENTRIES_BTREE,
+	SSDFS_EXTENTS_BTREE,
+	SSDFS_SHARED_EXTENTS_BTREE,
+	SSDFS_XATTR_BTREE,
+	SSDFS_SHARED_XATTR_BTREE,
+	SSDFS_SHARED_DICTIONARY_BTREE,
+	SSDFS_SNAPSHOTS_BTREE,
+	SSDFS_INVALIDATED_EXTENTS_BTREE,
+	SSDFS_BTREE_TYPE_MAX
+};
+
+/*
+ * struct ssdfs_dentries_btree_descriptor - dentries btree descriptor
+ * @desc: btree descriptor
+ */
+struct ssdfs_dentries_btree_descriptor {
+/* 0x0000 */
+	struct ssdfs_btree_descriptor desc;
+
+/* 0x0010 */
+	__le8 reserved[0x10];
+
+/* 0x0020 */
+} __packed;
+
+/*
+ * struct ssdfs_extents_btree_descriptor - extents btree descriptor
+ * @desc: btree descriptor
+ */
+struct ssdfs_extents_btree_descriptor {
+/* 0x0000 */
+	struct ssdfs_btree_descriptor desc;
+
+/* 0x0010 */
+	__le8 reserved[0x10];
+
+/* 0x0020 */
+} __packed;
+
+/*
+ * struct ssdfs_xattr_btree_descriptor - extended attr btree descriptor
+ * @desc: btree descriptor
+ */
+struct ssdfs_xattr_btree_descriptor {
+/* 0x0000 */
+	struct ssdfs_btree_descriptor desc;
+
+/* 0x0010 */
+	__le8 reserved[0x10];
+
+/* 0x0020 */
+} __packed;
+
+/* Type of superblock segments */
+enum {
+	SSDFS_MAIN_SB_SEG,
+	SSDFS_COPY_SB_SEG,
+	SSDFS_SB_SEG_COPY_MAX,
+};
+
+/* Different phases of superblock segment */
+enum {
+	SSDFS_CUR_SB_SEG,
+	SSDFS_NEXT_SB_SEG,
+	SSDFS_RESERVED_SB_SEG,
+	SSDFS_PREV_SB_SEG,
+	SSDFS_SB_CHAIN_MAX,
+};
+
+/*
+ * struct ssdfs_leb2peb_pair - LEB/PEB numbers association
+ * @leb_id: LEB ID number
+ * @peb_id: PEB ID number
+ */
+struct ssdfs_leb2peb_pair {
+/* 0x0000 */
+	__le64 leb_id;
+	__le64 peb_id;
+
+/* 0x0010 */
+} __packed;
+
+/*
+ * struct ssdfs_btree_index - btree index
+ * @hash: hash value
+ * @extent: btree node's extent
+ *
+ * The goal of a btree index is to provide a way to search
+ * for the proper btree node by means of a hash value. The
+ * hash value could be an inode_id, a string hash and so on.
+ */
+struct ssdfs_btree_index {
+/* 0x0000 */
+	__le64 hash;
+
+/* 0x0008 */
+	struct ssdfs_raw_extent extent;
+
+/* 0x0018 */
+} __packed;
+
+#define SSDFS_BTREE_NODE_INVALID_ID	(U32_MAX)
+
+/*
+ * struct ssdfs_btree_index_key - node identification key
+ * @node_id: node identification key
+ * @node_type: type of the node
+ * @height: node's height
+ * @flags: index flags
+ * @index: node's index
+ */
+struct ssdfs_btree_index_key {
+/* 0x0000 */
+	__le32 node_id;
+	__le8 node_type;
+	__le8 height;
+#define SSDFS_BTREE_INDEX_HAS_VALID_EXTENT		(1 << 0)
+#define SSDFS_BTREE_INDEX_SHOW_EMPTY_NODE		(1 << 1)
+#define SSDFS_BTREE_INDEX_SHOW_FREE_ITEMS		(1 << 2)
+#define SSDFS_BTREE_INDEX_HAS_CHILD_WITH_FREE_ITEMS	(1 << 3)
+#define SSDFS_BTREE_INDEX_SHOW_PREALLOCATED_CHILD	(1 << 4)
+#define SSDFS_BTREE_INDEX_FLAGS_MASK			0x1F
+	__le16 flags;
+
+/* 0x0008 */
+	struct ssdfs_btree_index index;
+
+/* 0x0020 */
+} __packed;
+
+#define SSDFS_BTREE_ROOT_NODE_INDEX_COUNT	(2)
+
+/*
+ * struct ssdfs_btree_root_node_header - root node header
+ * @height: btree height
+ * @items_count: count of items in the root node
+ * @flags: root node flags
+ * @type: root node type
+ * @upper_node_id: the last allocated node identification number
+ * @node_ids: root node's children IDs
+ */
+struct ssdfs_btree_root_node_header {
+/* 0x0000 */
+#define SSDFS_BTREE_LEAF_NODE_HEIGHT	(0)
+	__le8 height;
+	__le8 items_count;
+	__le8 flags;
+	__le8 type;
+
+/* 0x0004 */
+#define SSDFS_BTREE_ROOT_NODE_ID		(0)
+	__le32 upper_node_id;
+
+/* 0x0008 */
+	__le32 node_ids[SSDFS_BTREE_ROOT_NODE_INDEX_COUNT];
+
+/* 0x0010 */
+} __packed;
+
+/*
+ * struct ssdfs_btree_inline_root_node - btree root node
+ * @header: node header
+ * @indexes: root node's index array
+ *
+ * The goal of the root node is to fit into a 0x40 bytes
+ * space and to keep the root index node of the tree.
+ * The inline root node can be part of the inode
+ * structure or part of a btree root. The inode has
+ * 0x80 bytes of space, but the inode needs to store both
+ * the extents/dentries tree and the extended attributes
+ * tree. So, the 0x80 bytes are used for storing two btrees.
+ *
+ * The root node's indexes have a pre-defined type.
+ * If the tree's height is in the range of 1 - 3 then the
+ * root node's indexes describe hybrid nodes. Otherwise,
+ * if the tree's height is greater than 3, the root
+ * node's indexes describe pure index nodes.
+ */
+struct ssdfs_btree_inline_root_node {
+/* 0x0000 */
+	struct ssdfs_btree_root_node_header header;
+
+/* 0x0010 */
+#define SSDFS_ROOT_NODE_LEFT_LEAF_NODE		(0)
+#define SSDFS_ROOT_NODE_RIGHT_LEAF_NODE		(1)
+	struct ssdfs_btree_index indexes[SSDFS_BTREE_ROOT_NODE_INDEX_COUNT];
+
+/* 0x0040 */
+} __packed;
+
+/*
+ * struct ssdfs_inodes_btree - inodes btree
+ * @desc: btree descriptor
+ * @allocated_inodes: count of allocated inodes
+ * @free_inodes: count of free inodes
+ * @inodes_capacity: count of inodes in the whole btree
+ * @leaf_nodes: count of leaf btree nodes
+ * @nodes_count: count of nodes in the whole btree
+ * @upper_allocated_ino: maximal allocated inode ID number
+ * @root_node: btree's root node
+ *
+ * The goal of a btree root is to keep
+ * the main features of a tree and knowledge
+ * about two root indexes. These indexes split
+ * the whole btree into two branches.
+ */
+struct ssdfs_inodes_btree {
+/* 0x0000 */
+	struct ssdfs_btree_descriptor desc;
+
+/* 0x0010 */
+	__le64 allocated_inodes;
+	__le64 free_inodes;
+
+/* 0x0020 */
+	__le64 inodes_capacity;
+	__le32 leaf_nodes;
+	__le32 nodes_count;
+
+/* 0x0030 */
+	__le64 upper_allocated_ino;
+	__le8 reserved[0x8];
+
+/* 0x0040 */
+	struct ssdfs_btree_inline_root_node root_node;
+
+/* 0x0080 */
+} __packed;
+
+/*
+ * struct ssdfs_shared_extents_btree - shared extents btree
+ * @desc: btree descriptor
+ * @root_node: btree's root node
+ *
+ * The goal of a btree root is to keep
+ * the main features of a tree and knowledge
+ * about two root indexes. These indexes split
+ * the whole btree into two branches.
+ */
+struct ssdfs_shared_extents_btree {
+/* 0x0000 */
+	struct ssdfs_btree_descriptor desc;
+
+/* 0x0010 */
+	__le8 reserved[0x30];
+
+/* 0x0040 */
+	struct ssdfs_btree_inline_root_node root_node;
+
+/* 0x0080 */
+} __packed;
+
+/*
+ * struct ssdfs_shared_dictionary_btree - shared strings dictionary btree
+ * @desc: btree descriptor
+ * @root_node: btree's root node
+ *
+ * The goal of a btree root is to keep
+ * the main features of a tree and knowledge
+ * about two root indexes. These indexes split
+ * the whole btree into two branches.
+ */
+struct ssdfs_shared_dictionary_btree {
+/* 0x0000 */
+	struct ssdfs_btree_descriptor desc;
+
+/* 0x0010 */
+	__le8 reserved[0x30];
+
+/* 0x0040 */
+	struct ssdfs_btree_inline_root_node root_node;
+
+/* 0x0080 */
+} __packed;
+
+/*
+ * struct ssdfs_shared_xattr_btree - shared extended attributes btree
+ * @desc: btree descriptor
+ * @root_node: btree's root node
+ *
+ * The goal of a btree root is to keep
+ * the main features of a tree and knowledge
+ * about two root indexes. These indexes split
+ * the whole btree into two branches.
+ */
+struct ssdfs_shared_xattr_btree {
+/* 0x0000 */
+	struct ssdfs_btree_descriptor desc;
+
+/* 0x0010 */
+	__le8 reserved[0x30];
+
+/* 0x0040 */
+	struct ssdfs_btree_inline_root_node root_node;
+
+/* 0x0080 */
+} __packed;
+
+/*
+ * struct ssdfs_snapshots_btree - snapshots btree
+ * @desc: btree descriptor
+ * @root_node: btree's root node
+ *
+ * The goal of a btree root is to keep
+ * the main features of a tree and knowledge
+ * about two root indexes. These indexes split
+ * the whole btree into two branches.
+ */
+struct ssdfs_snapshots_btree {
+/* 0x0000 */
+	struct ssdfs_btree_descriptor desc;
+
+/* 0x0010 */
+	__le8 reserved[0x30];
+
+/* 0x0040 */
+	struct ssdfs_btree_inline_root_node root_node;
+
+/* 0x0080 */
+} __packed;
+
+/*
+ * struct ssdfs_invalidated_extents_btree - invalidated extents btree
+ * @desc: btree descriptor
+ * @root_node: btree's root node
+ *
+ * The goal of a btree root is to keep
+ * the main features of a tree and knowledge
+ * about two root indexes. These indexes split
+ * the whole btree into two branches.
+ */
+struct ssdfs_invalidated_extents_btree {
+/* 0x0000 */
+	struct ssdfs_btree_descriptor desc;
+
+/* 0x0010 */
+	__le8 reserved[0x30];
+
+/* 0x0040 */
+	struct ssdfs_btree_inline_root_node root_node;
+
+/* 0x0080 */
+} __packed;
+
+enum {
+	SSDFS_CUR_DATA_SEG,
+	SSDFS_CUR_LNODE_SEG,
+	SSDFS_CUR_HNODE_SEG,
+	SSDFS_CUR_IDXNODE_SEG,
+	SSDFS_CUR_DATA_UPDATE_SEG,	/* ZNS SSD case */
+	SSDFS_CUR_SEGS_COUNT,
+};
+
+/*
+ * struct ssdfs_blk_bmap_options - block bitmap options
+ * @flags: block bitmap's flags
+ * @compression: compression type
+ */
+struct ssdfs_blk_bmap_options {
+/* 0x0000 */
+#define SSDFS_BLK_BMAP_CREATE_COPY		(1 << 0)
+#define SSDFS_BLK_BMAP_MAKE_COMPRESSION		(1 << 1)
+#define SSDFS_BLK_BMAP_OPTIONS_MASK		(0x3)
+	__le16 flags;
+#define SSDFS_BLK_BMAP_NOCOMPR_TYPE		(0)
+#define SSDFS_BLK_BMAP_ZLIB_COMPR_TYPE		(1)
+#define SSDFS_BLK_BMAP_LZO_COMPR_TYPE		(2)
+	__le8 compression;
+	__le8 reserved;
+
+/* 0x0004 */
+} __packed;
+
+/*
+ * struct ssdfs_blk2off_tbl_options - offset translation table options
+ * @flags: offset translation table's flags
+ * @compression: compression type
+ */
+struct ssdfs_blk2off_tbl_options {
+/* 0x0000 */
+#define SSDFS_BLK2OFF_TBL_CREATE_COPY		(1 << 0)
+#define SSDFS_BLK2OFF_TBL_MAKE_COMPRESSION	(1 << 1)
+#define SSDFS_BLK2OFF_TBL_OPTIONS_MASK		(0x3)
+	__le16 flags;
+#define SSDFS_BLK2OFF_TBL_NOCOMPR_TYPE		(0)
+#define SSDFS_BLK2OFF_TBL_ZLIB_COMPR_TYPE	(1)
+#define SSDFS_BLK2OFF_TBL_LZO_COMPR_TYPE	(2)
+	__le8 compression;
+	__le8 reserved;
+
+/* 0x0004 */
+} __packed;
+
+/*
+ * struct ssdfs_user_data_options - user data options
+ * @flags: user data's flags
+ * @compression: compression type
+ * @migration_threshold: default value of destination PEBs in migration
+ */
+struct ssdfs_user_data_options {
+/* 0x0000 */
+#define SSDFS_USER_DATA_MAKE_COMPRESSION	(1 << 0)
+#define SSDFS_USER_DATA_OPTIONS_MASK		(0x1)
+	__le16 flags;
+#define SSDFS_USER_DATA_NOCOMPR_TYPE		(0)
+#define SSDFS_USER_DATA_ZLIB_COMPR_TYPE		(1)
+#define SSDFS_USER_DATA_LZO_COMPR_TYPE		(2)
+	__le8 compression;
+	__le8 reserved1;
+	__le16 migration_threshold;
+	__le16 reserved2;
+
+/* 0x0008 */
+} __packed;
+
+#define SSDFS_INODE_HASNT_INLINE_FORKS		(0)
+#define SSDFS_INLINE_FORKS_COUNT		(2)
+#define SSDFS_INLINE_EXTENTS_COUNT		(3)
+
+/*
+ * struct ssdfs_raw_fork - contiguous sequence of raw (on-disk) extents
+ * @start_offset: start logical offset in pages (blocks) from file's beginning
+ * @blks_count: count of logical blocks in the fork (no holes)
+ * @extents: sequence of raw (on-disk) extents
+ */
+struct ssdfs_raw_fork {
+/* 0x0000 */
+	__le64 start_offset;
+	__le64 blks_count;
+
+/* 0x0010 */
+	struct ssdfs_raw_extent extents[SSDFS_INLINE_EXTENTS_COUNT];
+
+/* 0x0040 */
+} __packed;
+
+/*
+ * struct ssdfs_name_hash - hash of the name
+ * @raw: raw value of the hash64
+ *
+ * The name's hash is 64 bits wide (8 bytes), and the hash64
+ * value has a special structure. The upper 4 bytes keep the
+ * low hash (hash32_lo) of the name and the lower 4 bytes keep
+ * the high hash (hash32_hi) of the name. If the name is not
+ * longer than 12 symbols (an inline name string) then hash32_hi
+ * is always zero. If the name is longer than 12 symbols then
+ * hash32_hi is the hash of the rest of the name (excluding the
+ * first 12 symbols), while hash32_lo is defined by the inline
+ * part of the name. Inline names (up to 12 symbols) are stored
+ * in dentries only. Regular names are stored partially in the
+ * dentry (first 12 symbols) and the whole name string
+ * is stored in the shared dictionary.
+ */
+struct ssdfs_name_hash {
+/* 0x0000 */
+	__le64 raw;
+
+/* 0x0008 */
+} __packed;
+
+/* Name hash related macros */
+#define SSDFS_NAME_HASH(hash32_lo, hash32_hi)({ \
+	u64 hash64 = (u32)hash32_lo; \
+	hash64 <<= 32; \
+	hash64 |= hash32_hi; \
+	hash64; \
+})
+#define SSDFS_NAME_HASH_LE64(hash32_lo, hash32_hi) \
+	(cpu_to_le64(SSDFS_NAME_HASH(hash32_lo, hash32_hi)))
+#define LE64_TO_SSDFS_HASH32_LO(hash_le64) \
+	((u32)(le64_to_cpu(hash_le64) >> 32))
+#define SSDFS_HASH32_LO(hash64) \
+	((u32)(hash64 >> 32))
+#define LE64_TO_SSDFS_HASH32_HI(hash_le64) \
+	((u32)(le64_to_cpu(hash_le64) & 0xFFFFFFFF))
+#define SSDFS_HASH32_HI(hash64) \
+	((u32)(hash64 & 0xFFFFFFFF))
+
+/*
+ * struct ssdfs_dir_entry - directory entry
+ * @ino: inode number
+ * @hash_code: name string's hash code
+ * @name_len: name length in bytes
+ * @dentry_type: dentry type
+ * @file_type: directory file types
+ * @flags: dentry's flags
+ * @inline_string: inline copy of the name or exclusive storage of short name
+ */
+struct ssdfs_dir_entry {
+/* 0x0000 */
+	__le64 ino;
+	__le64 hash_code;
+
+/* 0x0010 */
+	__le8 name_len;
+	__le8 dentry_type;
+	__le8 file_type;
+	__le8 flags;
+#define SSDFS_DENTRY_INLINE_NAME_MAX_LEN	(12)
+	__le8 inline_string[SSDFS_DENTRY_INLINE_NAME_MAX_LEN];
+
+/* 0x0020 */
+} __packed;
+
+/* Dentry types */
+enum {
+	SSDFS_DENTRY_UNKNOWN_TYPE,
+	SSDFS_INLINE_DENTRY,
+	SSDFS_REGULAR_DENTRY,
+	SSDFS_DENTRY_TYPE_MAX
+};
+
+/*
+ * SSDFS directory file types.
+ */
+enum {
+	SSDFS_FT_UNKNOWN,
+	SSDFS_FT_REG_FILE,
+	SSDFS_FT_DIR,
+	SSDFS_FT_CHRDEV,
+	SSDFS_FT_BLKDEV,
+	SSDFS_FT_FIFO,
+	SSDFS_FT_SOCK,
+	SSDFS_FT_SYMLINK,
+	SSDFS_FT_MAX
+};
+
+/* Dentry flags */
+#define SSDFS_DENTRY_HAS_EXTERNAL_STRING	(1 << 0)
+#define SSDFS_DENTRY_FLAGS_MASK			0x1
+
+/*
+ * struct ssdfs_blob_extent - blob's extent descriptor
+ * @hash: blob's hash
+ * @extent: blob's extent
+ */
+struct ssdfs_blob_extent {
+/* 0x0000 */
+	__le64 hash;
+	__le64 reserved;
+	struct ssdfs_raw_extent extent;
+
+/* 0x0020 */
+} __packed;
+
+#define SSDFS_XATTR_INLINE_BLOB_MAX_LEN		(32)
+#define SSDFS_XATTR_EXTERNAL_BLOB_MAX_LEN	(32768)
+
+/*
+ * struct ssdfs_blob_bytes - inline blob's byte stream
+ * @bytes: byte stream
+ */
+struct ssdfs_blob_bytes {
+/* 0x0000 */
+	__le8 bytes[SSDFS_XATTR_INLINE_BLOB_MAX_LEN];
+
+/* 0x0020 */
+} __packed;
+
+/*
+ * struct ssdfs_xattr_entry - extended attribute entry
+ * @name_hash: hash of the name
+ * @inline_index: index of the inline xattr
+ * @name_len: length of the name
+ * @name_type: type of the name
+ * @name_flags: flags of the name
+ * @blob_len: blob length in bytes
+ * @blob_type: type of the blob
+ * @blob_flags: flags of the blob
+ * @inline_string: inline string of the name
+ * @blob.descriptor.hash: hash of the blob
+ * @blob.descriptor.extent: extent of the blob
+ * @blob.inline_value: inline value of the blob
+ *
+ * An extended attribute is described by a fixed size
+ * descriptor. The name of the extended attribute can be
+ * inline or stored in the shared dictionary. If the name
+ * is longer than 16 symbols then it is stored in the shared
+ * dictionary. The blob part can be stored inline or,
+ * otherwise, the descriptor contains the hash of the blob
+ * and the blob is stored like an ordinary file inside
+ * of logical blocks.
+ */
+struct ssdfs_xattr_entry {
+/* 0x0000 */
+	__le64 name_hash;
+
+/* 0x0008 */
+	__le8 inline_index;
+	__le8 name_len;
+	__le8 name_type;
+	__le8 name_flags;
+
+/* 0x000C */
+	__le16 blob_len;
+	__le8 blob_type;
+	__le8 blob_flags;
+
+/* 0x0010 */
+#define SSDFS_XATTR_INLINE_NAME_MAX_LEN	(16)
+	__le8 inline_string[SSDFS_XATTR_INLINE_NAME_MAX_LEN];
+
+/* 0x0020 */
+	union {
+		struct ssdfs_blob_extent descriptor;
+		struct ssdfs_blob_bytes inline_value;
+	} blob;
+
+/* 0x0040 */
+} __packed;
+
+/* registered names' prefixes */
+enum {
+	SSDFS_USER_NS_INDEX,
+	SSDFS_TRUSTED_NS_INDEX,
+	SSDFS_SYSTEM_NS_INDEX,
+	SSDFS_SECURITY_NS_INDEX,
+	SSDFS_REGISTERED_NS_NUMBER
+};
+
+static const char * const SSDFS_NS_PREFIX[] = {
+	"user.",
+	"trusted.",
+	"system.",
+	"security.",
+};
+
+/* xattr name types */
+enum {
+	SSDFS_XATTR_NAME_UNKNOWN_TYPE,
+	SSDFS_XATTR_INLINE_NAME,
+	SSDFS_XATTR_USER_INLINE_NAME,
+	SSDFS_XATTR_TRUSTED_INLINE_NAME,
+	SSDFS_XATTR_SYSTEM_INLINE_NAME,
+	SSDFS_XATTR_SECURITY_INLINE_NAME,
+	SSDFS_XATTR_REGULAR_NAME,
+	SSDFS_XATTR_USER_REGULAR_NAME,
+	SSDFS_XATTR_TRUSTED_REGULAR_NAME,
+	SSDFS_XATTR_SYSTEM_REGULAR_NAME,
+	SSDFS_XATTR_SECURITY_REGULAR_NAME,
+	SSDFS_XATTR_NAME_TYPE_MAX
+};
+
+/* xattr name flags */
+#define SSDFS_XATTR_HAS_EXTERNAL_STRING		(1 << 0)
+#define SSDFS_XATTR_NAME_FLAGS_MASK		0x1
+
+/* xattr blob types */
+enum {
+	SSDFS_XATTR_BLOB_UNKNOWN_TYPE,
+	SSDFS_XATTR_INLINE_BLOB,
+	SSDFS_XATTR_REGULAR_BLOB,
+	SSDFS_XATTR_BLOB_TYPE_MAX
+};
+
+/* xattr blob flags */
+#define SSDFS_XATTR_HAS_EXTERNAL_BLOB		(1 << 0)
+#define SSDFS_XATTR_BLOB_FLAGS_MASK		0x1
+
+#define SSDFS_INLINE_DENTRIES_PER_AREA		(2)
+#define SSDFS_INLINE_STREAM_SIZE_PER_AREA	(64)
+#define SSDFS_DEFAULT_INLINE_XATTR_COUNT	(1)
+
+/*
+ * struct ssdfs_inode_inline_stream - inode's inline stream
+ * @bytes: bytes array
+ */
+struct ssdfs_inode_inline_stream {
+/* 0x0000 */
+	__le8 bytes[SSDFS_INLINE_STREAM_SIZE_PER_AREA];
+
+/* 0x0040 */
+} __packed;
+
+/*
+ * struct ssdfs_inode_inline_dentries - inline dentries array
+ * @array: dentries array
+ */
+struct ssdfs_inode_inline_dentries {
+/* 0x0000 */
+	struct ssdfs_dir_entry array[SSDFS_INLINE_DENTRIES_PER_AREA];
+
+/* 0x0040 */
+} __packed;
+
+/*
+ * struct ssdfs_inode_private_area - inode's private area
+ * @area1.inline_stream: inline file's content
+ * @area1.extents_root: extents btree root node
+ * @area1.fork: inline fork
+ * @area1.dentries_root: dentries btree root node
+ * @area1.dentries: inline dentries
+ * @area2.inline_stream: inline file's content
+ * @area2.inline_xattr: inline extended attribute
+ * @area2.xattr_root: extended attributes btree root node
+ * @area2.fork: inline fork
+ * @area2.dentries: inline dentries
+ */
+struct ssdfs_inode_private_area {
+/* 0x0000 */
+	union {
+		struct ssdfs_inode_inline_stream inline_stream;
+		struct ssdfs_btree_inline_root_node extents_root;
+		struct ssdfs_raw_fork fork;
+		struct ssdfs_btree_inline_root_node dentries_root;
+		struct ssdfs_inode_inline_dentries dentries;
+	} area1;
+
+/* 0x0040 */
+	union {
+		struct ssdfs_inode_inline_stream inline_stream;
+		struct ssdfs_xattr_entry inline_xattr;
+		struct ssdfs_btree_inline_root_node xattr_root;
+		struct ssdfs_raw_fork fork;
+		struct ssdfs_inode_inline_dentries dentries;
+	} area2;
+
+/* 0x0080 */
+} __packed;
+
+/*
+ * struct ssdfs_inode - raw (on-disk) inode
+ * @magic: inode magic
+ * @mode: file mode
+ * @flags: file attributes
+ * @uid: owner user ID
+ * @gid: owner group ID
+ * @atime: access time (seconds)
+ * @ctime: change time (seconds)
+ * @mtime: modification time (seconds)
+ * @birthtime: inode creation time (seconds)
+ * @atime_nsec: access time in nano scale
+ * @ctime_nsec: change time in nano scale
+ * @mtime_nsec: modification time in nano scale
+ * @birthtime_nsec: creation time in nano scale
+ * @generation: file version (for NFS)
+ * @size: file size in bytes
+ * @blocks: file size in blocks
+ * @parent_ino: parent inode number
+ * @refcount: links count
+ * @checksum: inode checksum
+ * @ino: inode number
+ * @hash_code: hash code of file name
+ * @name_len: length of file name
+ * @forks_count: count of forks
+ * @internal: array of inline private areas of inode
+ */
+struct ssdfs_inode {
+/* 0x0000 */
+	__le16 magic;			/* Inode magic */
+	__le16 mode;			/* File mode */
+	__le32 flags;			/* file attributes */
+
+/* 0x0008 */
+	__le32 uid;			/* user ID */
+	__le32 gid;			/* group ID */
+
+/* 0x0010 */
+	__le64 atime;			/* access time */
+	__le64 ctime;			/* change time */
+	__le64 mtime;			/* modification time */
+	__le64 birthtime;		/* inode creation time */
+
+/* 0x0030 */
+	__le32 atime_nsec;		/* access time in nano scale */
+	__le32 ctime_nsec;		/* change time in nano scale */
+	__le32 mtime_nsec;		/* modification time in nano scale */
+	__le32 birthtime_nsec;		/* creation time in nano scale */
+
+/* 0x0040 */
+	__le64 generation;		/* file version (for NFS) */
+	__le64 size;			/* file size in bytes */
+	__le64 blocks;			/* file size in blocks */
+	__le64 parent_ino;		/* parent inode number */
+
+/* 0x0060 */
+	__le32 refcount;		/* links count */
+	__le32 checksum;		/* inode checksum */
+
+/* 0x0068 */
+/* TODO: maybe use the hash code of file name as inode number */
+	__le64 ino;			/* Inode number */
+	__le64 hash_code;		/* hash code of file name */
+	__le16 name_len;		/* length of file name */
+#define SSDFS_INODE_HAS_INLINE_EXTENTS		(1 << 0)
+#define SSDFS_INODE_HAS_EXTENTS_BTREE		(1 << 1)
+#define SSDFS_INODE_HAS_INLINE_DENTRIES		(1 << 2)
+#define SSDFS_INODE_HAS_DENTRIES_BTREE		(1 << 3)
+#define SSDFS_INODE_HAS_INLINE_XATTR		(1 << 4)
+#define SSDFS_INODE_HAS_XATTR_BTREE		(1 << 5)
+#define SSDFS_INODE_HAS_INLINE_FILE		(1 << 6)
+#define SSDFS_INODE_PRIVATE_FLAGS_MASK		0x7F
+	__le16 private_flags;
+
+	union {
+		__le32 forks;
+		__le32 dentries;
+	} count_of __packed;
+
+/* 0x0080 */
+	struct ssdfs_inode_private_area internal[1];
+
+/* 0x0100 */
+} __packed;
+
+#define SSDFS_IFREG_PRIVATE_FLAG_MASK \
+	(SSDFS_INODE_HAS_INLINE_EXTENTS | \
+	 SSDFS_INODE_HAS_EXTENTS_BTREE | \
+	 SSDFS_INODE_HAS_INLINE_XATTR | \
+	 SSDFS_INODE_HAS_XATTR_BTREE | \
+	 SSDFS_INODE_HAS_INLINE_FILE)
+
+#define SSDFS_IFDIR_PRIVATE_FLAG_MASK \
+	(SSDFS_INODE_HAS_INLINE_DENTRIES | \
+	 SSDFS_INODE_HAS_DENTRIES_BTREE | \
+	 SSDFS_INODE_HAS_INLINE_XATTR | \
+	 SSDFS_INODE_HAS_XATTR_BTREE)
+
+/*
+ * struct ssdfs_volume_header - static part of superblock
+ * @magic: magic signature + revision
+ * @check: metadata checksum
+ * @log_pagesize: log2(page size)
+ * @log_erasesize: log2(erase block size)
+ * @log_segsize: log2(segment size)
+ * @log_pebs_per_seg: log2(erase blocks per segment)
+ * @megabytes_per_peb: MBs in one PEB
+ * @pebs_per_seg: number of PEBs per segment
+ * @create_time: volume create timestamp (mkfs phase)
+ * @create_cno: volume create checkpoint
+ * @flags: volume creation flags
+ * @lebs_per_peb_index: difference of LEB IDs between PEB indexes in segment
+ * @sb_pebs: array of prev, cur and next superblock's PEB numbers
+ * @segbmap: superblock's segment bitmap header
+ * @maptbl: superblock's mapping table header
+ * @sb_seg_log_pages: full log size in sb segment (pages count)
+ * @segbmap_log_pages: full log size in segbmap segment (pages count)
+ * @maptbl_log_pages: full log size in maptbl segment (pages count)
+ * @lnodes_seg_log_pages: full log size in leaf nodes segment (pages count)
+ * @hnodes_seg_log_pages: full log size in hybrid nodes segment (pages count)
+ * @inodes_seg_log_pages: full log size in index nodes segment (pages count)
+ * @user_data_log_pages: full log size in user data segment (pages count)
+ * @create_threads_per_seg: number of creation threads per segment
+ * @dentries_btree: descriptor of all dentries btrees
+ * @extents_btree: descriptor of all extents btrees
+ * @xattr_btree: descriptor of all extended attributes btrees
+ * @invalidated_extents_btree: b-tree of invalidated extents (ZNS SSD)
+ */
+struct ssdfs_volume_header {
+/* 0x0000 */
+	struct ssdfs_signature magic;
+
+/* 0x0008 */
+	struct ssdfs_metadata_check check;
+
+/* 0x0010 */
+	__le8 log_pagesize;
+	__le8 log_erasesize;
+	__le8 log_segsize;
+	__le8 log_pebs_per_seg;
+	__le16 megabytes_per_peb;
+	__le16 pebs_per_seg;
+
+/* 0x0018 */
+	__le64 create_time;
+	__le64 create_cno;
+#define SSDFS_VH_ZNS_BASED_VOLUME	(1 << 0)
+#define SSDFS_VH_UNALIGNED_ZONE		(1 << 1)
+#define SSDFS_VH_FLAGS_MASK		(0x3)
+	__le32 flags;
+	__le32 lebs_per_peb_index;
+
+/* 0x0030 */
+#define VH_LIMIT1	SSDFS_SB_CHAIN_MAX
+#define VH_LIMIT2	SSDFS_SB_SEG_COPY_MAX
+	struct ssdfs_leb2peb_pair sb_pebs[VH_LIMIT1][VH_LIMIT2];
+
+/* 0x00B0 */
+	struct ssdfs_segbmap_sb_header segbmap;
+
+/* 0x0140 */
+	struct ssdfs_maptbl_sb_header maptbl;
+
+/* 0x01D0 */
+	__le16 sb_seg_log_pages;
+	__le16 segbmap_log_pages;
+	__le16 maptbl_log_pages;
+	__le16 lnodes_seg_log_pages;
+	__le16 hnodes_seg_log_pages;
+	__le16 inodes_seg_log_pages;
+	__le16 user_data_log_pages;
+	__le16 create_threads_per_seg;
+
+/* 0x01E0 */
+	struct ssdfs_dentries_btree_descriptor dentries_btree;
+
+/* 0x0200 */
+	struct ssdfs_extents_btree_descriptor extents_btree;
+
+/* 0x0220 */
+	struct ssdfs_xattr_btree_descriptor xattr_btree;
+
+/* 0x0240 */
+	struct ssdfs_invalidated_extents_btree invextree;
+
+/* 0x02C0 */
+	__le8 reserved4[0x140];
+
+/* 0x0400 */
+} __packed;
+
+#define SSDFS_LEBS_PER_PEB_INDEX_DEFAULT	(1)
+
+/*
+ * struct ssdfs_volume_state - changeable part of superblock
+ * @magic: magic signature + revision
+ * @check: metadata checksum
+ * @nsegs: segments count
+ * @free_pages: free pages count
+ * @timestamp: write timestamp
+ * @cno: write checkpoint
+ * @flags: volume flags
+ * @state: file system state
+ * @errors: behaviour when detecting errors
+ * @feature_compat: compatible feature set
+ * @feature_compat_ro: read-only compatible feature set
+ * @feature_incompat: incompatible feature set
+ * @uuid: 128-bit uuid for volume
+ * @label: volume name
+ * @cur_segs: array of current segment numbers
+ * @migration_threshold: default value of destination PEBs in migration
+ * @blkbmap: block bitmap options
+ * @blk2off_tbl: offset translation table options
+ * @user_data: user data options
+ * @open_zones: number of open/active zones
+ * @root_folder: copy of root folder's inode
+ * @inodes_btree: inodes btree root
+ * @shared_extents_btree: shared extents btree root
+ * @shared_dict_btree: shared dictionary btree root
+ * @snapshots_btree: snapshots btree root
+ */
+struct ssdfs_volume_state {
+/* 0x0000 */
+	struct ssdfs_signature magic;
+
+/* 0x0008 */
+	struct ssdfs_metadata_check check;
+
+/* 0x0010 */
+	__le64 nsegs;
+	__le64 free_pages;
+
+/* 0x0020 */
+	__le64 timestamp;
+	__le64 cno;
+
+/* 0x0030 */
+#define SSDFS_HAS_INLINE_INODES_TREE		(1 << 0)
+#define SSDFS_VOLUME_STATE_FLAGS_MASK		0x1
+	__le32 flags;
+	__le16 state;
+	__le16 errors;
+
+/* 0x0038 */
+	__le64 feature_compat;
+	__le64 feature_compat_ro;
+	__le64 feature_incompat;
+
+/* 0x0050 */
+	__le8 uuid[SSDFS_UUID_SIZE];
+	char label[SSDFS_VOLUME_LABEL_MAX];
+
+/* 0x0070 */
+	__le64 cur_segs[SSDFS_CUR_SEGS_COUNT];
+
+/* 0x0098 */
+	__le16 migration_threshold;
+	__le16 reserved1;
+
+/* 0x009C */
+	struct ssdfs_blk_bmap_options blkbmap;
+	struct ssdfs_blk2off_tbl_options blk2off_tbl;
+
+/* 0x00A4 */
+	struct ssdfs_user_data_options user_data;
+
+/* 0x00AC */
+	__le32 open_zones;
+
+/* 0x00B0 */
+	struct ssdfs_inode root_folder;
+
+/* 0x01B0 */
+	__le8 reserved3[0x50];
+
+/* 0x0200 */
+	struct ssdfs_inodes_btree inodes_btree;
+
+/* 0x0280 */
+	struct ssdfs_shared_extents_btree shared_extents_btree;
+
+/* 0x0300 */
+	struct ssdfs_shared_dictionary_btree shared_dict_btree;
+
+/* 0x0380 */
+	struct ssdfs_snapshots_btree snapshots_btree;
+
+/* 0x0400 */
+} __packed;
+
+/* Compatible feature flags */
+#define SSDFS_HAS_SEGBMAP_COMPAT_FLAG			(1 << 0)
+#define SSDFS_HAS_MAPTBL_COMPAT_FLAG			(1 << 1)
+#define SSDFS_HAS_SHARED_EXTENTS_COMPAT_FLAG		(1 << 2)
+#define SSDFS_HAS_SHARED_XATTRS_COMPAT_FLAG		(1 << 3)
+#define SSDFS_HAS_SHARED_DICT_COMPAT_FLAG		(1 << 4)
+#define SSDFS_HAS_INODES_TREE_COMPAT_FLAG		(1 << 5)
+#define SSDFS_HAS_SNAPSHOTS_TREE_COMPAT_FLAG		(1 << 6)
+#define SSDFS_HAS_INVALID_EXTENTS_TREE_COMPAT_FLAG	(1 << 7)
+
+/* Read-Only compatible feature flags */
+#define SSDFS_ZLIB_COMPAT_RO_FLAG	(1 << 0)
+#define SSDFS_LZO_COMPAT_RO_FLAG	(1 << 1)
+
+#define SSDFS_FEATURE_COMPAT_SUPP \
+	(SSDFS_HAS_SEGBMAP_COMPAT_FLAG | SSDFS_HAS_MAPTBL_COMPAT_FLAG | \
+	 SSDFS_HAS_SHARED_EXTENTS_COMPAT_FLAG | \
+	 SSDFS_HAS_SHARED_XATTRS_COMPAT_FLAG | \
+	 SSDFS_HAS_SHARED_DICT_COMPAT_FLAG | \
+	 SSDFS_HAS_INODES_TREE_COMPAT_FLAG | \
+	 SSDFS_HAS_SNAPSHOTS_TREE_COMPAT_FLAG | \
+	 SSDFS_HAS_INVALID_EXTENTS_TREE_COMPAT_FLAG)
+
+#define SSDFS_FEATURE_COMPAT_RO_SUPP \
+	(SSDFS_ZLIB_COMPAT_RO_FLAG | SSDFS_LZO_COMPAT_RO_FLAG)
+
+#define SSDFS_FEATURE_INCOMPAT_SUPP	0ULL
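The three support masks above suggest the usual ext2/ext4-style mount-time policy. Whether SSDFS follows exactly this convention is an assumption; the sketch below (all names and values are illustrative copies, not part of the patch) shows the conventional decision logic:

```c
#include <stdint.h>

/* Illustrative host-endian copies of the support masks above. */
#define INCOMPAT_SUPP_SKETCH	0x0ULL
#define COMPAT_RO_SUPP_SKETCH	0x3ULL	/* zlib | lzo */

enum mount_mode { MOUNT_RW, MOUNT_RO, MOUNT_REFUSE };

/*
 * Conventional feature-flag policy (as in ext2/ext4): an unknown
 * incompat flag refuses the mount, an unknown compat_ro flag forces
 * a read-only mount, and unknown compat flags are simply ignored.
 */
static enum mount_mode check_features(uint64_t incompat, uint64_t compat_ro)
{
	if (incompat & ~INCOMPAT_SUPP_SKETCH)
		return MOUNT_REFUSE;
	if (compat_ro & ~COMPAT_RO_SUPP_SKETCH)
		return MOUNT_RO;
	return MOUNT_RW;
}
```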
+
+/*
+ * struct ssdfs_metadata_descriptor - metadata descriptor
+ * @offset: offset in bytes
+ * @size: size in bytes
+ * @check: metadata checksum
+ */
+struct ssdfs_metadata_descriptor {
+/* 0x0000 */
+	__le32 offset;
+	__le32 size;
+	struct ssdfs_metadata_check check;
+
+/* 0x0010 */
+} __packed;
+
+enum {
+	SSDFS_BLK_BMAP_INDEX,
+	SSDFS_SNAPSHOT_RULES_AREA_INDEX,
+	SSDFS_OFF_TABLE_INDEX,
+	SSDFS_COLD_PAYLOAD_AREA_INDEX,
+	SSDFS_WARM_PAYLOAD_AREA_INDEX,
+	SSDFS_HOT_PAYLOAD_AREA_INDEX,
+	SSDFS_BLK_DESC_AREA_INDEX,
+	SSDFS_MAPTBL_CACHE_INDEX,
+	SSDFS_LOG_FOOTER_INDEX,
+	SSDFS_SEG_HDR_DESC_MAX = SSDFS_LOG_FOOTER_INDEX + 1,
+	SSDFS_LOG_FOOTER_DESC_MAX = SSDFS_OFF_TABLE_INDEX + 1,
+};
+
+enum {
+	SSDFS_PREV_MIGRATING_PEB,
+	SSDFS_CUR_MIGRATING_PEB,
+	SSDFS_MIGRATING_PEBS_CHAIN
+};
+
+/*
+ * struct ssdfs_segment_header - header of segment
+ * @volume_hdr: copy of static part of superblock
+ * @timestamp: log creation timestamp
+ * @cno: log checkpoint
+ * @log_pages: size of log (partial segment) in pages count
+ * @seg_type: type of segment
+ * @seg_flags: flags of segment
+ * @desc_array: array of segment's metadata descriptors
+ * @peb_migration_id: identification number of PEB in migration sequence
+ * @peb_create_time: PEB creation timestamp
+ * @payload: space for segment header's payload
+ */
+struct ssdfs_segment_header {
+/* 0x0000 */
+	struct ssdfs_volume_header volume_hdr;
+
+/* 0x0400 */
+	__le64 timestamp;
+	__le64 cno;
+
+/* 0x0410 */
+	__le16 log_pages;
+	__le16 seg_type;
+	__le32 seg_flags;
+
+/* 0x0418 */
+	struct ssdfs_metadata_descriptor desc_array[SSDFS_SEG_HDR_DESC_MAX];
+
+/* 0x04A8 */
+#define SSDFS_PEB_UNKNOWN_MIGRATION_ID		(0)
+#define SSDFS_PEB_MIGRATION_ID_START		(1)
+#define SSDFS_PEB_MIGRATION_ID_MAX		(U8_MAX)
+	__le8 peb_migration_id[SSDFS_MIGRATING_PEBS_CHAIN];
+
+/* 0x04AA */
+	__le64 peb_create_time;
+
+/* 0x04B2 */
+	__le8 payload[0x34E];
+
+/* 0x0800 */
+} __packed;
+
+/* Possible segment types */
+#define SSDFS_UNKNOWN_SEG_TYPE			(0)
+#define SSDFS_SB_SEG_TYPE			(1)
+#define SSDFS_INITIAL_SNAPSHOT_SEG_TYPE		(2)
+#define SSDFS_SEGBMAP_SEG_TYPE			(3)
+#define SSDFS_MAPTBL_SEG_TYPE			(4)
+#define SSDFS_LEAF_NODE_SEG_TYPE		(5)
+#define SSDFS_HYBRID_NODE_SEG_TYPE		(6)
+#define SSDFS_INDEX_NODE_SEG_TYPE		(7)
+#define SSDFS_USER_DATA_SEG_TYPE		(8)
+#define SSDFS_LAST_KNOWN_SEG_TYPE		SSDFS_USER_DATA_SEG_TYPE
+
+/* Segment flags' bits */
+#define SSDFS_BLK_BMAP_BIT			(0)
+#define SSDFS_OFFSET_TABLE_BIT			(1)
+#define SSDFS_COLD_PAYLOAD_BIT			(2)
+#define SSDFS_WARM_PAYLOAD_BIT			(3)
+#define SSDFS_HOT_PAYLOAD_BIT			(4)
+#define SSDFS_BLK_DESC_CHAIN_BIT		(5)
+#define SSDFS_MAPTBL_CACHE_BIT			(6)
+#define SSDFS_FOOTER_BIT			(7)
+#define SSDFS_PARTIAL_LOG_BIT			(8)
+#define SSDFS_PARTIAL_LOG_HEADER_BIT		(9)
+#define SSDFS_PLH_INSTEAD_FOOTER_BIT		(10)
+
+/* Segment flags */
+#define SSDFS_SEG_HDR_HAS_BLK_BMAP		(1 << SSDFS_BLK_BMAP_BIT)
+#define SSDFS_SEG_HDR_HAS_OFFSET_TABLE		(1 << SSDFS_OFFSET_TABLE_BIT)
+#define SSDFS_LOG_HAS_COLD_PAYLOAD		(1 << SSDFS_COLD_PAYLOAD_BIT)
+#define SSDFS_LOG_HAS_WARM_PAYLOAD		(1 << SSDFS_WARM_PAYLOAD_BIT)
+#define SSDFS_LOG_HAS_HOT_PAYLOAD		(1 << SSDFS_HOT_PAYLOAD_BIT)
+#define SSDFS_LOG_HAS_BLK_DESC_CHAIN		(1 << SSDFS_BLK_DESC_CHAIN_BIT)
+#define SSDFS_LOG_HAS_MAPTBL_CACHE		(1 << SSDFS_MAPTBL_CACHE_BIT)
+#define SSDFS_LOG_HAS_FOOTER			(1 << SSDFS_FOOTER_BIT)
+#define SSDFS_LOG_IS_PARTIAL			(1 << SSDFS_PARTIAL_LOG_BIT)
+#define SSDFS_LOG_HAS_PARTIAL_HEADER		(1 << SSDFS_PARTIAL_LOG_HEADER_BIT)
+#define SSDFS_PARTIAL_HEADER_INSTEAD_FOOTER	(1 << SSDFS_PLH_INSTEAD_FOOTER_BIT)
+#define SSDFS_SEG_HDR_FLAG_MASK			0x7FF
+
+/* Segment flags manipulation functions */
+#define SSDFS_SEG_HDR_FNS(bit, name)					\
+static inline void ssdfs_set_##name(struct ssdfs_segment_header *hdr)	\
+{									\
+	unsigned long seg_flags = le32_to_cpu(hdr->seg_flags);		\
+	set_bit(SSDFS_##bit, &seg_flags);				\
+	hdr->seg_flags = cpu_to_le32((u32)seg_flags);			\
+}									\
+static inline void ssdfs_clear_##name(struct ssdfs_segment_header *hdr)	\
+{									\
+	unsigned long seg_flags = le32_to_cpu(hdr->seg_flags);		\
+	clear_bit(SSDFS_##bit, &seg_flags);				\
+	hdr->seg_flags = cpu_to_le32((u32)seg_flags);			\
+}									\
+static inline int ssdfs_##name(struct ssdfs_segment_header *hdr)	\
+{									\
+	unsigned long seg_flags = le32_to_cpu(hdr->seg_flags);		\
+	return test_bit(SSDFS_##bit, &seg_flags);			\
+}
+
+/*
+ * ssdfs_set_seg_hdr_has_blk_bmap()
+ * ssdfs_clear_seg_hdr_has_blk_bmap()
+ * ssdfs_seg_hdr_has_blk_bmap()
+ */
+SSDFS_SEG_HDR_FNS(BLK_BMAP_BIT, seg_hdr_has_blk_bmap)
+
+/*
+ * ssdfs_set_seg_hdr_has_offset_table()
+ * ssdfs_clear_seg_hdr_has_offset_table()
+ * ssdfs_seg_hdr_has_offset_table()
+ */
+SSDFS_SEG_HDR_FNS(OFFSET_TABLE_BIT, seg_hdr_has_offset_table)
+
+/*
+ * ssdfs_set_log_has_cold_payload()
+ * ssdfs_clear_log_has_cold_payload()
+ * ssdfs_log_has_cold_payload()
+ */
+SSDFS_SEG_HDR_FNS(COLD_PAYLOAD_BIT, log_has_cold_payload)
+
+/*
+ * ssdfs_set_log_has_warm_payload()
+ * ssdfs_clear_log_has_warm_payload()
+ * ssdfs_log_has_warm_payload()
+ */
+SSDFS_SEG_HDR_FNS(WARM_PAYLOAD_BIT, log_has_warm_payload)
+
+/*
+ * ssdfs_set_log_has_hot_payload()
+ * ssdfs_clear_log_has_hot_payload()
+ * ssdfs_log_has_hot_payload()
+ */
+SSDFS_SEG_HDR_FNS(HOT_PAYLOAD_BIT, log_has_hot_payload)
+
+/*
+ * ssdfs_set_log_has_blk_desc_chain()
+ * ssdfs_clear_log_has_blk_desc_chain()
+ * ssdfs_log_has_blk_desc_chain()
+ */
+SSDFS_SEG_HDR_FNS(BLK_DESC_CHAIN_BIT, log_has_blk_desc_chain)
+
+/*
+ * ssdfs_set_log_has_maptbl_cache()
+ * ssdfs_clear_log_has_maptbl_cache()
+ * ssdfs_log_has_maptbl_cache()
+ */
+SSDFS_SEG_HDR_FNS(MAPTBL_CACHE_BIT, log_has_maptbl_cache)
+
+/*
+ * ssdfs_set_log_has_footer()
+ * ssdfs_clear_log_has_footer()
+ * ssdfs_log_has_footer()
+ */
+SSDFS_SEG_HDR_FNS(FOOTER_BIT, log_has_footer)
+
+/*
+ * ssdfs_set_log_is_partial()
+ * ssdfs_clear_log_is_partial()
+ * ssdfs_log_is_partial()
+ */
+SSDFS_SEG_HDR_FNS(PARTIAL_LOG_BIT, log_is_partial)
+
+/*
+ * ssdfs_set_log_has_partial_header()
+ * ssdfs_clear_log_has_partial_header()
+ * ssdfs_log_has_partial_header()
+ */
+SSDFS_SEG_HDR_FNS(PARTIAL_LOG_HEADER_BIT, log_has_partial_header)
+
+/*
+ * ssdfs_set_partial_header_instead_footer()
+ * ssdfs_clear_partial_header_instead_footer()
+ * ssdfs_partial_header_instead_footer()
+ */
+SSDFS_SEG_HDR_FNS(PLH_INSTEAD_FOOTER_BIT, partial_header_instead_footer)
+
+/*
+ * struct ssdfs_log_footer - footer of partial log
+ * @volume_state: changeable part of superblock
+ * @timestamp: writing timestamp
+ * @cno: writing checkpoint
+ * @log_bytes: payload size in bytes
+ * @log_flags: flags of log
+ * @reserved1: reserved field
+ * @desc_array: array of footer's metadata descriptors
+ * @peb_create_time: PEB creation timestamp
+ * @payload: space for log footer's payload
+ */
+struct ssdfs_log_footer {
+/* 0x0000 */
+	struct ssdfs_volume_state volume_state;
+
+/* 0x0400 */
+	__le64 timestamp;
+	__le64 cno;
+
+/* 0x0410 */
+	__le32 log_bytes;
+	__le32 log_flags;
+	__le64 reserved1;
+
+/* 0x0420 */
+	struct ssdfs_metadata_descriptor desc_array[SSDFS_LOG_FOOTER_DESC_MAX];
+
+/* 0x0450 */
+	__le64 peb_create_time;
+
+/* 0x0458 */
+	__le8 payload[0x3A8];
+
+/* 0x0800 */
+} __packed;
+
+/* Log footer flags' bits */
+#define __SSDFS_BLK_BMAP_BIT			(0)
+#define __SSDFS_OFFSET_TABLE_BIT		(1)
+#define __SSDFS_PARTIAL_LOG_BIT			(2)
+#define __SSDFS_ENDING_LOG_BIT			(3)
+#define __SSDFS_SNAPSHOT_RULE_AREA_BIT		(4)
+
+/* Log footer flags */
+#define SSDFS_LOG_FOOTER_HAS_BLK_BMAP		(1 << __SSDFS_BLK_BMAP_BIT)
+#define SSDFS_LOG_FOOTER_HAS_OFFSET_TABLE	(1 << __SSDFS_OFFSET_TABLE_BIT)
+#define SSDFS_PARTIAL_LOG_FOOTER		(1 << __SSDFS_PARTIAL_LOG_BIT)
+#define SSDFS_ENDING_LOG_FOOTER			(1 << __SSDFS_ENDING_LOG_BIT)
+#define SSDFS_LOG_FOOTER_HAS_SNAPSHOT_RULES	(1 << __SSDFS_SNAPSHOT_RULE_AREA_BIT)
+#define SSDFS_LOG_FOOTER_FLAG_MASK		0x1F
+
+/* Log footer flags manipulation functions */
+#define SSDFS_LOG_FOOTER_FNS(bit, name)					\
+static inline void ssdfs_set_##name(struct ssdfs_log_footer *footer)	\
+{									\
+	unsigned long log_flags = le32_to_cpu(footer->log_flags);	\
+	set_bit(__SSDFS_##bit, &log_flags);				\
+	footer->log_flags = cpu_to_le32((u32)log_flags);		\
+}									\
+static inline void ssdfs_clear_##name(struct ssdfs_log_footer *footer)	\
+{									\
+	unsigned long log_flags = le32_to_cpu(footer->log_flags);	\
+	clear_bit(__SSDFS_##bit, &log_flags);				\
+	footer->log_flags = cpu_to_le32((u32)log_flags);		\
+}									\
+static inline int ssdfs_##name(struct ssdfs_log_footer *footer)		\
+{									\
+	unsigned long log_flags = le32_to_cpu(footer->log_flags);	\
+	return test_bit(__SSDFS_##bit, &log_flags);			\
+}
+
+/*
+ * ssdfs_set_log_footer_has_blk_bmap()
+ * ssdfs_clear_log_footer_has_blk_bmap()
+ * ssdfs_log_footer_has_blk_bmap()
+ */
+SSDFS_LOG_FOOTER_FNS(BLK_BMAP_BIT, log_footer_has_blk_bmap)
+
+/*
+ * ssdfs_set_log_footer_has_offset_table()
+ * ssdfs_clear_log_footer_has_offset_table()
+ * ssdfs_log_footer_has_offset_table()
+ */
+SSDFS_LOG_FOOTER_FNS(OFFSET_TABLE_BIT, log_footer_has_offset_table)
+
+/*
+ * ssdfs_set_partial_log_footer()
+ * ssdfs_clear_partial_log_footer()
+ * ssdfs_partial_log_footer()
+ */
+SSDFS_LOG_FOOTER_FNS(PARTIAL_LOG_BIT, partial_log_footer)
+
+/*
+ * ssdfs_set_ending_log_footer()
+ * ssdfs_clear_ending_log_footer()
+ * ssdfs_ending_log_footer()
+ */
+SSDFS_LOG_FOOTER_FNS(ENDING_LOG_BIT, ending_log_footer)
+
+/*
+ * ssdfs_set_log_footer_has_snapshot_rules()
+ * ssdfs_clear_log_footer_has_snapshot_rules()
+ * ssdfs_log_footer_has_snapshot_rules()
+ */
+SSDFS_LOG_FOOTER_FNS(SNAPSHOT_RULE_AREA_BIT, log_footer_has_snapshot_rules)
+
+/*
+ * struct ssdfs_partial_log_header - header of partial log
+ * @magic: magic signature + revision
+ * @check: metadata checksum
+ * @timestamp: writing timestamp
+ * @cno: writing checkpoint
+ * @log_pages: size of log in pages count
+ * @seg_type: type of segment
+ * @pl_flags: flags of log
+ * @log_bytes: payload size in bytes
+ * @flags: volume flags
+ * @desc_array: array of log's metadata descriptors
+ * @nsegs: segments count
+ * @free_pages: free pages count
+ * @root_folder: copy of root folder's inode
+ * @inodes_btree: inodes btree root
+ * @shared_extents_btree: shared extents btree root
+ * @shared_dict_btree: shared dictionary btree root
+ * @sequence_id: index of partial log in the sequence
+ * @log_pagesize: log2(page size)
+ * @log_erasesize: log2(erase block size)
+ * @log_segsize: log2(segment size)
+ * @log_pebs_per_seg: log2(erase blocks per segment)
+ * @lebs_per_peb_index: difference of LEB IDs between PEB indexes in segment
+ * @create_threads_per_seg: number of creation threads per segment
+ * @snapshots_btree: snapshots btree root
+ * @open_zones: number of open/active zones
+ * @peb_create_time: PEB creation timestamp
+ * @invextree: invalidated extents btree root
+ *
+ * This header is used when the full log needs to be built from several
+ * partial logs. The header combines the most essential fields of the
+ * segment header and log footer. The first partial log starts with the
+ * segment header and a partial log header. Every subsequent partial
+ * log starts with a partial log header only. Only the last log in the
+ * sequence ends with the log footer.
+ */
+struct ssdfs_partial_log_header {
+/* 0x0000 */
+	struct ssdfs_signature magic;
+
+/* 0x0008 */
+	struct ssdfs_metadata_check check;
+
+/* 0x0010 */
+	__le64 timestamp;
+	__le64 cno;
+
+/* 0x0020 */
+	__le16 log_pages;
+	__le16 seg_type;
+	__le32 pl_flags;
+
+/* 0x0028 */
+	__le32 log_bytes;
+	__le32 flags;
+
+/* 0x0030 */
+	struct ssdfs_metadata_descriptor desc_array[SSDFS_SEG_HDR_DESC_MAX];
+
+/* 0x00C0 */
+	__le64 nsegs;
+	__le64 free_pages;
+
+/* 0x00D0 */
+	struct ssdfs_inode root_folder;
+
+/* 0x01D0 */
+	struct ssdfs_inodes_btree inodes_btree;
+
+/* 0x0250 */
+	struct ssdfs_shared_extents_btree shared_extents_btree;
+
+/* 0x02D0 */
+	struct ssdfs_shared_dictionary_btree shared_dict_btree;
+
+/* 0x0350 */
+	__le32 sequence_id;
+	__le8 log_pagesize;
+	__le8 log_erasesize;
+	__le8 log_segsize;
+	__le8 log_pebs_per_seg;
+	__le32 lebs_per_peb_index;
+	__le16 create_threads_per_seg;
+	__le8 reserved1[0x2];
+
+/* 0x0360 */
+	struct ssdfs_snapshots_btree snapshots_btree;
+
+/* 0x03E0 */
+	__le32 open_zones;
+	__le8 reserved2[0x4];
+	__le64 peb_create_time;
+	__le8 reserved3[0x10];
+
+/* 0x0400 */
+	struct ssdfs_invalidated_extents_btree invextree;
+
+/* 0x0480 */
+	__le8 payload[0x380];
+
+/* 0x0800 */
+} __packed;
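The layout rules in the partial log header's description can be sketched as two small predicates: the first partial log opens with the segment header, every partial log carries a partial log header, and only the last log in the sequence ends with the log footer. The helper names below are illustrative only, not part of the on-disk format.

```c
#include <stdbool.h>

/* The first partial log (sequence_id 0) opens with the segment
 * header, followed by its partial log header.
 */
static bool opens_with_segment_header(int sequence_id)
{
	return sequence_id == 0;
}

/* Only the last partial log in the sequence ends with the
 * log footer; all earlier logs end without one.
 */
static bool ends_with_log_footer(int sequence_id, int logs_count)
{
	return sequence_id == logs_count - 1;
}
```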
+
+/* Partial log flags manipulation functions */
+#define SSDFS_PL_HDR_FNS(bit, name)					 \
+static inline void ssdfs_set_##name(struct ssdfs_partial_log_header *hdr) \
+{									 \
+	unsigned long pl_flags = le32_to_cpu(hdr->pl_flags);		 \
+	set_bit(SSDFS_##bit, &pl_flags);				 \
+	hdr->pl_flags = cpu_to_le32((u32)pl_flags);			 \
+}									 \
+static inline void ssdfs_clear_##name(struct ssdfs_partial_log_header *hdr) \
+{									 \
+	unsigned long pl_flags = le32_to_cpu(hdr->pl_flags);		 \
+	clear_bit(SSDFS_##bit, &pl_flags);				 \
+	hdr->pl_flags = cpu_to_le32((u32)pl_flags);			 \
+}									 \
+static inline int ssdfs_##name(struct ssdfs_partial_log_header *hdr)	 \
+{									 \
+	unsigned long pl_flags = le32_to_cpu(hdr->pl_flags);		 \
+	return test_bit(SSDFS_##bit, &pl_flags);			 \
+}
+
+/*
+ * ssdfs_set_pl_hdr_has_blk_bmap()
+ * ssdfs_clear_pl_hdr_has_blk_bmap()
+ * ssdfs_pl_hdr_has_blk_bmap()
+ */
+SSDFS_PL_HDR_FNS(BLK_BMAP_BIT, pl_hdr_has_blk_bmap)
+
+/*
+ * ssdfs_set_pl_hdr_has_offset_table()
+ * ssdfs_clear_pl_hdr_has_offset_table()
+ * ssdfs_pl_hdr_has_offset_table()
+ */
+SSDFS_PL_HDR_FNS(OFFSET_TABLE_BIT, pl_hdr_has_offset_table)
+
+/*
+ * ssdfs_set_pl_has_cold_payload()
+ * ssdfs_clear_pl_has_cold_payload()
+ * ssdfs_pl_has_cold_payload()
+ */
+SSDFS_PL_HDR_FNS(COLD_PAYLOAD_BIT, pl_has_cold_payload)
+
+/*
+ * ssdfs_set_pl_has_warm_payload()
+ * ssdfs_clear_pl_has_warm_payload()
+ * ssdfs_pl_has_warm_payload()
+ */
+SSDFS_PL_HDR_FNS(WARM_PAYLOAD_BIT, pl_has_warm_payload)
+
+/*
+ * ssdfs_set_pl_has_hot_payload()
+ * ssdfs_clear_pl_has_hot_payload()
+ * ssdfs_pl_has_hot_payload()
+ */
+SSDFS_PL_HDR_FNS(HOT_PAYLOAD_BIT, pl_has_hot_payload)
+
+/*
+ * ssdfs_set_pl_has_blk_desc_chain()
+ * ssdfs_clear_pl_has_blk_desc_chain()
+ * ssdfs_pl_has_blk_desc_chain()
+ */
+SSDFS_PL_HDR_FNS(BLK_DESC_CHAIN_BIT, pl_has_blk_desc_chain)
+
+/*
+ * ssdfs_set_pl_has_maptbl_cache()
+ * ssdfs_clear_pl_has_maptbl_cache()
+ * ssdfs_pl_has_maptbl_cache()
+ */
+SSDFS_PL_HDR_FNS(MAPTBL_CACHE_BIT, pl_has_maptbl_cache)
+
+/*
+ * ssdfs_set_pl_has_footer()
+ * ssdfs_clear_pl_has_footer()
+ * ssdfs_pl_has_footer()
+ */
+SSDFS_PL_HDR_FNS(FOOTER_BIT, pl_has_footer)
+
+/*
+ * ssdfs_set_pl_is_partial()
+ * ssdfs_clear_pl_is_partial()
+ * ssdfs_pl_is_partial()
+ */
+SSDFS_PL_HDR_FNS(PARTIAL_LOG_BIT, pl_is_partial)
+
+/*
+ * ssdfs_set_pl_has_partial_header()
+ * ssdfs_clear_pl_has_partial_header()
+ * ssdfs_pl_has_partial_header()
+ */
+SSDFS_PL_HDR_FNS(PARTIAL_LOG_HEADER_BIT, pl_has_partial_header)
+
+/*
+ * ssdfs_set_pl_header_instead_footer()
+ * ssdfs_clear_pl_header_instead_footer()
+ * ssdfs_pl_header_instead_footer()
+ */
+SSDFS_PL_HDR_FNS(PLH_INSTEAD_FOOTER_BIT, pl_header_instead_footer)
+
+/*
+ * struct ssdfs_diff_blob_header - diff blob header
+ * @magic: diff blob's magic
+ * @type: diff blob's type
+ * @desc_size: size of diff blob's descriptor in bytes
+ * @blob_size: size of diff blob in bytes
+ * @flags: diff blob's flags
+ */
+struct ssdfs_diff_blob_header {
+/* 0x0000 */
+	__le16 magic;
+	__le8 type;
+	__le8 desc_size;
+	__le16 blob_size;
+	__le16 flags;
+
+/* 0x0008 */
+} __packed;
+
+/* Diff blob flags */
+#define SSDFS_DIFF_BLOB_HAS_BTREE_NODE_HEADER	(1 << 0)
+#define SSDFS_DIFF_CHAIN_CONTAINS_NEXT_BLOB	(1 << 1)
+#define SSDFS_DIFF_BLOB_FLAGS_MASK		(0x3)
+
+/*
+ * struct ssdfs_metadata_diff_blob_header - metadata diff blob header
+ * @diff: generic diff blob header
+ * @bits_count: count of bits in bitmap
+ * @item_start_bit: item starting bit in bitmap
+ * @index_start_bit: index starting bit in bitmap
+ * @item_size: size of item in bytes
+ */
+struct ssdfs_metadata_diff_blob_header {
+/* 0x0000 */
+	struct ssdfs_diff_blob_header diff;
+
+/* 0x0008 */
+	__le16 bits_count;
+	__le16 item_start_bit;
+	__le16 index_start_bit;
+	__le16 item_size;
+
+/* 0x0010 */
+} __packed;
+
+/* Diff blob types */
+enum {
+	SSDFS_UNKNOWN_DIFF_BLOB_TYPE,
+	SSDFS_BTREE_NODE_DIFF_BLOB,
+	SSDFS_USER_DATA_DIFF_BLOB,
+	SSDFS_DIFF_BLOB_TYPE_MAX
+};
+
+/*
+ * struct ssdfs_fragments_chain_header - header of fragments' chain
+ * @compr_bytes: size of the whole fragments' chain in compressed state
+ * @uncompr_bytes: size of the whole fragments' chain in decompressed state
+ * @fragments_count: count of fragments in the chain
+ * @desc_size: size of one descriptor item
+ * @magic: fragments chain header magic
+ * @type: fragments chain header type
+ * @flags: flags of fragments' chain
+ */
+struct ssdfs_fragments_chain_header {
+/* 0x0000 */
+	__le32 compr_bytes;
+	__le32 uncompr_bytes;
+
+/* 0x0008 */
+	__le16 fragments_count;
+	__le16 desc_size;
+
+/* 0x000C */
+	__le8 magic;
+	__le8 type;
+	__le16 flags;
+
+/* 0x0010 */
+} __packed;
+
+/* Fragments chain types */
+#define SSDFS_UNKNOWN_CHAIN_HDR		0x0
+#define SSDFS_LOG_AREA_CHAIN_HDR	0x1
+#define SSDFS_BLK_STATE_CHAIN_HDR	0x2
+#define SSDFS_BLK_DESC_CHAIN_HDR	0x3
+#define SSDFS_BLK_DESC_ZLIB_CHAIN_HDR	0x4
+#define SSDFS_BLK_DESC_LZO_CHAIN_HDR	0x5
+#define SSDFS_BLK_BMAP_CHAIN_HDR	0x6
+#define SSDFS_CHAIN_HDR_TYPE_MAX	(SSDFS_BLK_BMAP_CHAIN_HDR + 1)
+
+/* Fragments chain flags */
+#define SSDFS_MULTIPLE_HDR_CHAIN	(1 << 0)
+#define SSDFS_CHAIN_HDR_FLAG_MASK	0x1
+
+/* Fragments chain constants */
+#define SSDFS_FRAGMENTS_CHAIN_MAX		14
+#define SSDFS_BLK_BMAP_FRAGMENTS_CHAIN_MAX	64
+
+/*
+ * struct ssdfs_fragment_desc - fragment descriptor
+ * @offset: fragment's offset
+ * @compr_size: size of fragment in compressed state
+ * @uncompr_size: size of fragment after decompression
+ * @checksum: fragment checksum
+ * @sequence_id: fragment's sequential id number
+ * @magic: fragment descriptor's magic
+ * @type: fragment descriptor's type
+ * @flags: fragment descriptor's flags
+ */
+struct ssdfs_fragment_desc {
+/* 0x0000 */
+	__le32 offset;
+	__le16 compr_size;
+	__le16 uncompr_size;
+
+/* 0x0008 */
+	__le32 checksum;
+	__le8 sequence_id;
+	__le8 magic;
+	__le8 type;
+	__le8 flags;
+
+/* 0x0010 */
+} __packed;
+
+/* Fragment descriptor types */
+#define SSDFS_UNKNOWN_FRAGMENT_TYPE	0
+#define SSDFS_FRAGMENT_UNCOMPR_BLOB	1
+#define SSDFS_FRAGMENT_ZLIB_BLOB	2
+#define SSDFS_FRAGMENT_LZO_BLOB		3
+#define SSDFS_DATA_BLK_STATE_DESC	4
+#define SSDFS_DATA_BLK_DESC		5
+#define SSDFS_DATA_BLK_DESC_ZLIB	6
+#define SSDFS_DATA_BLK_DESC_LZO		7
+#define SSDFS_NEXT_TABLE_DESC		8
+#define SSDFS_FRAGMENT_DESC_MAX_TYPE	(SSDFS_NEXT_TABLE_DESC + 1)
+
+/* Fragment descriptor flags */
+#define SSDFS_FRAGMENT_HAS_CSUM		(1 << 0)
+#define SSDFS_FRAGMENT_DESC_FLAGS_MASK	0x1
+
+/*
+ * struct ssdfs_block_bitmap_header - header of segment's block bitmap
+ * @magic: magic signature and flags
+ * @fragments_count: count of block bitmap's fragments
+ * @bytes_count: count of bytes in fragments' sequence
+ * @flags: block bitmap's flags
+ * @type: type of block bitmap
+ */
+struct ssdfs_block_bitmap_header {
+/* 0x0000 */
+	struct ssdfs_signature magic;
+
+/* 0x0008 */
+	__le16 fragments_count;
+	__le32 bytes_count;
+
+#define SSDFS_BLK_BMAP_BACKUP		(1 << 0)
+#define SSDFS_BLK_BMAP_COMPRESSED	(1 << 1)
+#define SSDFS_BLK_BMAP_FLAG_MASK	0x3
+	__le8 flags;
+
+#define SSDFS_BLK_BMAP_UNCOMPRESSED_BLOB	(0)
+#define SSDFS_BLK_BMAP_ZLIB_BLOB		(1)
+#define SSDFS_BLK_BMAP_LZO_BLOB			(2)
+#define SSDFS_BLK_BMAP_TYPE_MAX			(SSDFS_BLK_BMAP_LZO_BLOB + 1)
+	__le8 type;
+
+/* 0x0010 */
+} __packed;
+
+/*
+ * struct ssdfs_block_bitmap_fragment - block bitmap's fragment header
+ * @peb_index: PEB's index
+ * @sequence_id: ID of block bitmap's fragment in the sequence
+ * @flags: fragment's flags
+ * @type: fragment type
+ * @last_free_blk: last logical free block
+ * @metadata_blks: count of physical pages used by metadata
+ * @invalid_blks: count of invalid blocks
+ * @chain_hdr: descriptor of block bitmap's fragments' chain
+ */
+struct ssdfs_block_bitmap_fragment {
+/* 0x0000 */
+	__le16 peb_index;
+	__le8 sequence_id;
+
+#define SSDFS_MIGRATING_BLK_BMAP	(1 << 0)
+#define SSDFS_PEB_HAS_EXT_PTR		(1 << 1)
+#define SSDFS_PEB_HAS_RELATION		(1 << 2)
+#define SSDFS_FRAG_BLK_BMAP_FLAG_MASK	0x7
+	__le8 flags : 6;
+
+#define SSDFS_SRC_BLK_BMAP		(0)
+#define SSDFS_DST_BLK_BMAP		(1)
+#define SSDFS_FRAG_BLK_BMAP_TYPE_MAX	(SSDFS_DST_BLK_BMAP + 1)
+	__le8 type : 2;
+
+	__le32 last_free_blk;
+
+/* 0x0008 */
+	__le32 metadata_blks;
+	__le32 invalid_blks;
+
+/* 0x0010 */
+	struct ssdfs_fragments_chain_header chain_hdr;
+
+/* 0x0020 */
+} __packed;
+
+/*
+ * The block-to-offset table has the following structure:
+ *
+ * ----------------------------
+ * |                          |
+ * |  Blk2Off table Header    |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |   Translation extents    |
+ * |        sequence          |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |  Physical offsets table  |
+ * |         header           |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |    Physical offset       |
+ * |  descriptors sequence    |
+ * |                          |
+ * ----------------------------
+ */
+
+/* Possible log's area types */
+enum {
+	SSDFS_LOG_BLK_DESC_AREA,
+	SSDFS_LOG_MAIN_AREA,
+	SSDFS_LOG_DIFFS_AREA,
+	SSDFS_LOG_JOURNAL_AREA,
+	SSDFS_LOG_AREA_MAX,
+};
+
+/*
+ * struct ssdfs_peb_page_descriptor - PEB's page descriptor
+ * @logical_offset: logical offset from file's begin in pages
+ * @logical_blk: logical number of the block in segment
+ * @peb_page: PEB's page index
+ */
+struct ssdfs_peb_page_descriptor {
+/* 0x0000 */
+	__le32 logical_offset;
+	__le16 logical_blk;
+	__le16 peb_page;
+
+/* 0x0008 */
+} __packed;
+
+/*
+ * struct ssdfs_blk_state_offset - block's state offset
+ * @log_start_page: start page of the log
+ * @log_area: identification number of log area
+ * @peb_migration_id: identification number of PEB in migration sequence
+ * @byte_offset: offset in bytes from area's beginning
+ */
+struct ssdfs_blk_state_offset {
+/* 0x0000 */
+	__le16 log_start_page;
+	__le8 log_area;
+	__le8 peb_migration_id;
+	__le32 byte_offset;
+
+/* 0x0008 */
+} __packed;
+
+/*
+ * struct ssdfs_phys_offset_descriptor - descriptor of physical offset
+ * @page_desc: PEB's page descriptor
+ * @blk_state: logical block's state offset
+ */
+struct ssdfs_phys_offset_descriptor {
+/* 0x0000 */
+	struct ssdfs_peb_page_descriptor page_desc;
+	struct ssdfs_blk_state_offset blk_state;
+
+/* 0x0010 */
+} __packed;
+
+/*
+ * struct ssdfs_phys_offset_table_header - physical offset table header
+ * @start_id: start id in the table's fragment
+ * @id_count: number of unique physical offsets in log's fragments chain
+ * @byte_size: size in bytes of table's fragment
+ * @peb_index: PEB index
+ * @sequence_id: table's fragment's sequential id number
+ * @type: table's type
+ * @flags: table's flags
+ * @magic: table's magic
+ * @checksum: table checksum
+ * @used_logical_blks: count of allocated logical blocks
+ * @free_logical_blks: count of free logical blocks
+ * @last_allocated_blk: last allocated block (hint for allocation)
+ * @next_fragment_off: offset till next table's fragment
+ *
+ * This table contains offsets of block descriptors in a segment.
+ * Generally speaking, the table can be represented as an array of
+ * ssdfs_phys_offset_descriptor structures ordered by id numbers.
+ * The whole table can be split into several fragments. Every table
+ * fragment begins with a header.
+ */
+struct ssdfs_phys_offset_table_header {
+/* 0x0000 */
+	__le16 start_id;
+	__le16 id_count;
+	__le32 byte_size;
+
+/* 0x0008 */
+	__le16 peb_index;
+	__le16 sequence_id;
+	__le16 type;
+	__le16 flags;
+
+/* 0x0010 */
+	__le32 magic;
+	__le32 checksum;
+
+/* 0x0018 */
+	__le16 used_logical_blks;
+	__le16 free_logical_blks;
+	__le16 last_allocated_blk;
+	__le16 next_fragment_off;
+
+/* 0x0020 */
+} __packed;
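The fragment chaining described by @next_fragment_off can be walked as a simple loop. This standalone sketch assumes next_fragment_off is a byte offset from the current fragment's header (the doc line says only "offset till next table's fragment") and mirrors just the two traversal fields in host endianness; the real on-disk header stores them little-endian.

```c
#include <stdint.h>
#include <stddef.h>

/* Host-endian mirror of just the traversal fields of
 * struct ssdfs_phys_offset_table_header (illustrative).
 */
struct frag_hdr_sketch {
	uint16_t flags;
	uint16_t next_fragment_off;	/* bytes from this header (assumed) */
};

#define HAS_NEXT_FRAGMENT_SKETCH	(1 << 1)

/* Walk a PEB's physical offset table fragment by fragment, following
 * next_fragment_off while the HAS_NEXT_FRAGMENT flag is set.
 */
static int count_fragments(const uint8_t *buf)
{
	const struct frag_hdr_sketch *hdr;
	size_t pos = 0;
	int count = 0;

	do {
		hdr = (const void *)(buf + pos);
		count++;
		pos += hdr->next_fragment_off;
	} while (hdr->flags & HAS_NEXT_FRAGMENT_SKETCH);

	return count;
}

/* Build a two-fragment demo layout in memory and count it. */
static int demo_two_fragments(void)
{
	static uint8_t buf[64];
	struct frag_hdr_sketch *h0 = (void *)buf;
	struct frag_hdr_sketch *h1 = (void *)(buf + 32);

	h0->flags = HAS_NEXT_FRAGMENT_SKETCH;
	h0->next_fragment_off = 32;
	h1->flags = 0;
	return count_fragments(buf);
}
```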
+
+/* Physical offset table types */
+#define SSDFS_UNKNOWN_OFF_TABLE_TYPE	0
+#define SSDFS_SEG_OFF_TABLE		1
+#define SSDFS_OFF_TABLE_MAX_TYPE	(SSDFS_SEG_OFF_TABLE + 1)
+
+/* Physical offset table flags */
+#define SSDFS_OFF_TABLE_HAS_CSUM		(1 << 0)
+#define SSDFS_OFF_TABLE_HAS_NEXT_FRAGMENT	(1 << 1)
+#define SSDFS_BLK_DESC_TBL_COMPRESSED		(1 << 2)
+#define SSDFS_OFF_TABLE_FLAGS_MASK		0x7
+
+/*
+ * struct ssdfs_translation_extent - logical block to offset id translation
+ * @logical_blk: starting logical block
+ * @offset_id: starting offset id
+ * @len: count of items in extent
+ * @sequence_id: id in sequence of extents
+ * @state: logical blocks' sequence state
+ */
+struct ssdfs_translation_extent {
+/* 0x0000 */
+	__le16 logical_blk;
+#define SSDFS_INVALID_OFFSET_ID		(U16_MAX)
+	__le16 offset_id;
+	__le16 len;
+	__le8 sequence_id;
+	__le8 state;
+
+/* 0x0008 */
+} __packed;
+
+enum {
+	SSDFS_LOGICAL_BLK_UNKNOWN_STATE,
+	SSDFS_LOGICAL_BLK_FREE,
+	SSDFS_LOGICAL_BLK_USED,
+	SSDFS_LOGICAL_BLK_STATE_MAX,
+};
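A translation extent compresses a run of logical blocks into one record. Assuming each extent maps the contiguous run [logical_blk, logical_blk + len) onto offset ids starting at offset_id (an interpretation of the field names, not stated explicitly above), a lookup is a linear scan over the sequence. All names in this sketch are illustrative host-endian mirrors of the on-disk fields.

```c
#include <stdint.h>

#define INVALID_OFFSET_ID_SKETCH	0xFFFFu	/* mirrors U16_MAX */

/* Host-endian mirror of the lookup-relevant fields of
 * struct ssdfs_translation_extent (illustrative).
 */
struct trans_extent_sketch {
	uint16_t logical_blk;
	uint16_t offset_id;
	uint16_t len;
};

/* Demo sequence: blocks 0..7 map to ids 100..107,
 * blocks 8..11 map to ids 200..203.
 */
static const struct trans_extent_sketch demo_seq[] = {
	{ .logical_blk = 0, .offset_id = 100, .len = 8 },
	{ .logical_blk = 8, .offset_id = 200, .len = 4 },
};

/* Resolve a logical block to its offset id, or the invalid id
 * when no extent covers the block.
 */
static uint16_t resolve_offset_id(const struct trans_extent_sketch *seq,
				  int count, uint16_t blk)
{
	for (int i = 0; i < count; i++) {
		if (blk >= seq[i].logical_blk &&
		    blk < seq[i].logical_blk + seq[i].len)
			return seq[i].offset_id + (blk - seq[i].logical_blk);
	}
	return INVALID_OFFSET_ID_SKETCH;
}
```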
+
+/*
+ * struct ssdfs_blk2off_table_header - translation table header
+ * @magic: magic signature
+ * @check: metadata checksum + flags
+ * @extents_off: offset in bytes from header begin till extents sequence
+ * @extents_count: count of extents in the sequence
+ * @offset_table_off: offset in bytes from header begin till phys offsets table
+ * @fragments_count: count of table's fragments for the whole PEB
+ * @sequence: first translation extent in the sequence
+ */
+struct ssdfs_blk2off_table_header {
+/* 0x0000 */
+	struct ssdfs_signature magic;
+
+/* 0x0008 */
+#define SSDFS_BLK2OFF_TBL_ZLIB_COMPR	(1 << 1)
+#define SSDFS_BLK2OFF_TBL_LZO_COMPR	(1 << 2)
+	struct ssdfs_metadata_check check;
+
+/* 0x0010 */
+	__le16 extents_off;
+	__le16 extents_count;
+	__le16 offset_table_off;
+	__le16 fragments_count;
+
+/* 0x0018 */
+	struct ssdfs_translation_extent sequence[1];
+
+/* 0x0020 */
+} __packed;
+
+/*
+ * The block descriptor table has the following structure:
+ *
+ * ----------------------------
+ * |                          |
+ * | Area block table #0      |
+ * |  Fragment descriptor #0  |
+ * |          ***             |
+ * |  Fragment descriptor #14 |
+ * |  Next area block table   |
+ * |        descriptor        |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |    Block descriptor #0   |
+ * |           ***            |
+ * |    Block descriptor #N   |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |          ***             |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |    Block descriptor #0   |
+ * |           ***            |
+ * |    Block descriptor #N   |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |          ***             |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * | Area block table #N      |
+ * |  Fragment descriptor #0  |
+ * |          ***             |
+ * |  Fragment descriptor #14 |
+ * |  Next area block table   |
+ * |        descriptor        |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |    Block descriptor #0   |
+ * |           ***            |
+ * |    Block descriptor #N   |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |          ***             |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |    Block descriptor #0   |
+ * |           ***            |
+ * |    Block descriptor #N   |
+ * |                          |
+ * ----------------------------
+ */
+
+#define SSDFS_BLK_STATE_OFF_MAX		6
+
+/*
+ * struct ssdfs_block_descriptor - block descriptor
+ * @ino: inode identification number
+ * @logical_offset: logical offset from file's begin in pages
+ * @peb_index: PEB's index
+ * @peb_page: PEB's page index
+ * @state: array of fragment's offsets
+ */
+struct ssdfs_block_descriptor {
+/* 0x0000 */
+	__le64 ino;
+	__le32 logical_offset;
+	__le16 peb_index;
+	__le16 peb_page;
+
+/* 0x0010 */
+	struct ssdfs_blk_state_offset state[SSDFS_BLK_STATE_OFF_MAX];
+
+/* 0x0040 */
+} __packed;
+
+/*
+ * struct ssdfs_area_block_table - descriptor of block state sequence in area
+ * @chain_hdr: descriptor of block states' chain
+ * @blk: table of fragment descriptors
+ *
+ * This table describes the block state sequence in a PEB's area. The
+ * table can consist of several parts, each describing up to 14 blocks
+ * of the sequence. If the sequence contains more block descriptors,
+ * the last fragment descriptor describes the placement of the next
+ * part of the block table, and so on.
+ */
+struct ssdfs_area_block_table {
+/* 0x0000 */
+	struct ssdfs_fragments_chain_header chain_hdr;
+
+/* 0x0010 */
+#define SSDFS_NEXT_BLK_TABLE_INDEX	SSDFS_FRAGMENTS_CHAIN_MAX
+#define SSDFS_BLK_TABLE_MAX		(SSDFS_FRAGMENTS_CHAIN_MAX + 1)
+	struct ssdfs_fragment_desc blk[SSDFS_BLK_TABLE_MAX];
+
+/* 0x0100 */
+} __packed;
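Since each area block table part holds at most 14 fragment descriptors (SSDFS_FRAGMENTS_CHAIN_MAX) and reserves the 15th slot for the next-table descriptor, a sequence of N block-state fragments needs ceil(N / 14) chained parts. A minimal sketch of that arithmetic (helper name illustrative only):

```c
/* Parts needed to describe fragments_count block-state fragments,
 * at 14 descriptors per part; slot 14 chains to the next part.
 */
#define CHAIN_MAX_SKETCH	14

static int area_blk_table_parts(int fragments_count)
{
	if (fragments_count <= 0)
		return 0;
	return (fragments_count + CHAIN_MAX_SKETCH - 1) / CHAIN_MAX_SKETCH;
}
```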
+
+/*
+ * The data (diff, journaling) area has the following structure:
+ * -----------------------------
+ * |                           |
+ * | Block state descriptor #0 |
+ * |  Fragment descriptor #0   |
+ * |          ***              |
+ * |  Fragment descriptor #N   |
+ * |                           |
+ * -----------------------------
+ * |                           |
+ * |   Data portion #0         |
+ * |          ***              |
+ * |   Data portion #N         |
+ * |                           |
+ * -----------------------------
+ * |                           |
+ * |          ***              |
+ * |                           |
+ * -----------------------------
+ * |                           |
+ * | Block state descriptor #N |
+ * |  Fragment descriptor #0   |
+ * |          ***              |
+ * |  Fragment descriptor #N   |
+ * |                           |
+ * -----------------------------
+ * |                           |
+ * |   Data portion #0         |
+ * |          ***              |
+ * |   Data portion #N         |
+ * |                           |
+ * -----------------------------
+ */
+
+/*
+ * struct ssdfs_block_state_descriptor - block's state descriptor
+ * @cno: checkpoint
+ * @parent_snapshot: parent snapshot
+ * @chain_hdr: descriptor of data fragments' chain
+ */
+struct ssdfs_block_state_descriptor {
+/* 0x0000 */
+	__le64 cno;
+	__le64 parent_snapshot;
+
+/* 0x0010 */
+	struct ssdfs_fragments_chain_header chain_hdr;
+
+/* 0x0020 */
+} __packed;
+
+/*
+ * struct ssdfs_segbmap_fragment_header - segment bitmap fragment header
+ * @magic: magic signature
+ * @seg_index: segment index in segment bitmap fragments' chain
+ * @peb_index: PEB's index in segment
+ * @flags: fragment's flags
+ * @seg_type: segment type (main/backup)
+ * @start_item: fragment's start item number
+ * @sequence_id: fragment identification number
+ * @fragment_bytes: bytes count in fragment
+ * @checksum: fragment checksum
+ * @total_segs: count of total segments in fragment
+ * @clean_or_using_segs: count of clean or using segments in fragment
+ * @used_or_dirty_segs: count of used or dirty segments in fragment
+ * @bad_segs: count of bad segments in fragment
+ */
+struct ssdfs_segbmap_fragment_header {
+/* 0x0000 */
+	__le16 magic;
+	__le16 seg_index;
+	__le16 peb_index;
+#define SSDFS_SEGBMAP_FRAG_ZLIB_COMPR	(1 << 0)
+#define SSDFS_SEGBMAP_FRAG_LZO_COMPR	(1 << 1)
+	__le8 flags;
+	__le8 seg_type;
+
+/* 0x0008 */
+	__le64 start_item;
+
+/* 0x0010 */
+	__le16 sequence_id;
+	__le16 fragment_bytes;
+	__le32 checksum;
+
+/* 0x0018 */
+	__le16 total_segs;
+	__le16 clean_or_using_segs;
+	__le16 used_or_dirty_segs;
+	__le16 bad_segs;
+
+/* 0x0020 */
+} __packed;
+
+/*
+ * struct ssdfs_peb_descriptor - descriptor of PEB
+ * @erase_cycles: count of P/E cycles of PEB
+ * @type: PEB's type
+ * @state: PEB's state
+ * @flags: PEB's flags
+ * @shared_peb_index: index of external shared destination PEB
+ */
+struct ssdfs_peb_descriptor {
+/* 0x0000 */
+	__le32 erase_cycles;
+	__le8 type;
+	__le8 state;
+	__le8 flags;
+	__le8 shared_peb_index;
+
+/* 0x0008 */
+} __packed;
+
+/* PEB's types */
+enum {
+	SSDFS_MAPTBL_UNKNOWN_PEB_TYPE,
+	SSDFS_MAPTBL_DATA_PEB_TYPE,
+	SSDFS_MAPTBL_LNODE_PEB_TYPE,
+	SSDFS_MAPTBL_HNODE_PEB_TYPE,
+	SSDFS_MAPTBL_IDXNODE_PEB_TYPE,
+	SSDFS_MAPTBL_INIT_SNAP_PEB_TYPE,
+	SSDFS_MAPTBL_SBSEG_PEB_TYPE,
+	SSDFS_MAPTBL_SEGBMAP_PEB_TYPE,
+	SSDFS_MAPTBL_MAPTBL_PEB_TYPE,
+	SSDFS_MAPTBL_PEB_TYPE_MAX
+};
+
+/* PEB's states */
+enum {
+	SSDFS_MAPTBL_UNKNOWN_PEB_STATE,
+	SSDFS_MAPTBL_BAD_PEB_STATE,
+	SSDFS_MAPTBL_CLEAN_PEB_STATE,
+	SSDFS_MAPTBL_USING_PEB_STATE,
+	SSDFS_MAPTBL_USED_PEB_STATE,
+	SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE,
+	SSDFS_MAPTBL_DIRTY_PEB_STATE,
+	SSDFS_MAPTBL_MIGRATION_SRC_USED_STATE,
+	SSDFS_MAPTBL_MIGRATION_SRC_PRE_DIRTY_STATE,
+	SSDFS_MAPTBL_MIGRATION_SRC_DIRTY_STATE,
+	SSDFS_MAPTBL_MIGRATION_DST_CLEAN_STATE,
+	SSDFS_MAPTBL_MIGRATION_DST_USING_STATE,
+	SSDFS_MAPTBL_MIGRATION_DST_USED_STATE,
+	SSDFS_MAPTBL_MIGRATION_DST_PRE_DIRTY_STATE,
+	SSDFS_MAPTBL_MIGRATION_DST_DIRTY_STATE,
+	SSDFS_MAPTBL_PRE_ERASE_STATE,
+	SSDFS_MAPTBL_UNDER_ERASE_STATE,
+	SSDFS_MAPTBL_SNAPSHOT_STATE,
+	SSDFS_MAPTBL_RECOVERING_STATE,
+	SSDFS_MAPTBL_PEB_STATE_MAX
+};
+
+/* PEB's flags */
+#define SSDFS_MAPTBL_SHARED_DESTINATION_PEB		(1 << 0)
+#define SSDFS_MAPTBL_SOURCE_PEB_HAS_EXT_PTR		(1 << 1)
+#define SSDFS_MAPTBL_SOURCE_PEB_HAS_ZONE_PTR		(1 << 2)
+
+#define SSDFS_PEBTBL_BMAP_SIZE \
+	((PAGE_SIZE / sizeof(struct ssdfs_peb_descriptor)) / \
+	 BITS_PER_BYTE)
+
+/* PEB table's bitmap types */
+enum {
+	SSDFS_PEBTBL_USED_BMAP,
+	SSDFS_PEBTBL_DIRTY_BMAP,
+	SSDFS_PEBTBL_RECOVER_BMAP,
+	SSDFS_PEBTBL_BADBLK_BMAP,
+	SSDFS_PEBTBL_BMAP_MAX
+};
+
+/*
+ * struct ssdfs_peb_table_fragment_header - header of PEB table fragment
+ * @magic: signature of PEB table's fragment
+ * @flags: flags of PEB table's fragment
+ * @recover_months: recovering duration in months
+ * @recover_threshold: recover threshold
+ * @checksum: checksum of PEB table's fragment
+ * @start_peb: starting PEB number
+ * @pebs_count: count of PEB's descriptors in table's fragment
+ * @last_selected_peb: index of last selected unused PEB
+ * @reserved_pebs: count of reserved PEBs in table's fragment
+ * @stripe_id: stripe identification number
+ * @portion_id: sequential ID of mapping table fragment
+ * @fragment_id: sequential ID of PEB table fragment in the portion
+ * @bytes_count: table's fragment size in bytes
+ * @bmaps: PEB table fragment's bitmaps
+ */
+struct ssdfs_peb_table_fragment_header {
+/* 0x0000 */
+	__le16 magic;
+	__le8 flags;
+	__le8 recover_months : 4;
+	__le8 recover_threshold : 4;
+	__le32 checksum;
+
+/* 0x0008 */
+	__le64 start_peb;
+
+/* 0x0010 */
+	__le16 pebs_count;
+	__le16 last_selected_peb;
+	__le16 reserved_pebs;
+	__le16 stripe_id;
+
+/* 0x0018 */
+	__le16 portion_id;
+	__le16 fragment_id;
+	__le32 bytes_count;
+
+/* 0x0020 */
+	__le8 bmaps[SSDFS_PEBTBL_BMAP_MAX][SSDFS_PEBTBL_BMAP_SIZE];
+
+/* 0x0120 */
+} __packed;
+
+/* PEB table fragment's flags */
+#define SSDFS_PEBTBL_FRAG_ZLIB_COMPR		(1 << 0)
+#define SSDFS_PEBTBL_FRAG_LZO_COMPR		(1 << 1)
+#define SSDFS_PEBTBL_UNDER_RECOVERING		(1 << 2)
+#define SSDFS_PEBTBL_BADBLK_EXIST		(1 << 3)
+#define SSDFS_PEBTBL_TRY_CORRECT_PEBS_AGAIN	(1 << 4)
+#define SSDFS_PEBTBL_FIND_RECOVERING_PEBS \
+	(SSDFS_PEBTBL_UNDER_RECOVERING | SSDFS_PEBTBL_BADBLK_EXIST)
+#define SSDFS_PEBTBL_FLAGS_MASK			0x1F
+
+/* PEB table recover thresholds */
+#define SSDFS_PEBTBL_FIRST_RECOVER_TRY		(0)
+#define SSDFS_PEBTBL_SECOND_RECOVER_TRY		(1)
+#define SSDFS_PEBTBL_THIRD_RECOVER_TRY		(2)
+#define SSDFS_PEBTBL_FOURTH_RECOVER_TRY		(3)
+#define SSDFS_PEBTBL_FIFTH_RECOVER_TRY		(4)
+#define SSDFS_PEBTBL_SIX_RECOVER_TRY		(5)
+#define SSDFS_PEBTBL_BADBLK_THRESHOLD		(6)
+
+#define SSDFS_PEBTBL_FRAGMENT_HDR_SIZE \
+	(sizeof(struct ssdfs_peb_table_fragment_header))
+
+#define SSDFS_PEB_DESC_PER_FRAGMENT(fragment_size) \
+	((fragment_size - SSDFS_PEBTBL_FRAGMENT_HDR_SIZE) / \
+	 sizeof(struct ssdfs_peb_descriptor))
+
+/*
+ * struct ssdfs_leb_descriptor - logical descriptor of erase block
+ * @physical_index: PEB table's offset to the PEB's descriptor
+ * @relation_index: PEB table's offset to the associated PEB's descriptor
+ */
+struct ssdfs_leb_descriptor {
+/* 0x0000 */
+	__le16 physical_index;
+	__le16 relation_index;
+
+/* 0x0004 */
+} __packed;
+
+/*
+ * struct ssdfs_leb_table_fragment_header - header of LEB table fragment
+ * @magic: signature of LEB table's fragment
+ * @flags: flags of LEB table's fragment
+ * @checksum: checksum of LEB table's fragment
+ * @start_leb: starting LEB number
+ * @lebs_count: count of LEB's descriptors in table's fragment
+ * @mapped_lebs: count of LEBs that are mapped onto PEBs
+ * @migrating_lebs: count of LEBs under migration
+ * @portion_id: sequential ID of mapping table fragment
+ * @fragment_id: sequential ID of LEB table fragment in the portion
+ * @bytes_count: table's fragment size in bytes
+ */
+struct ssdfs_leb_table_fragment_header {
+/* 0x0000 */
+	__le16 magic;
+#define SSDFS_LEBTBL_FRAG_ZLIB_COMPR	(1 << 0)
+#define SSDFS_LEBTBL_FRAG_LZO_COMPR	(1 << 1)
+	__le16 flags;
+	__le32 checksum;
+
+/* 0x0008 */
+	__le64 start_leb;
+
+/* 0x0010 */
+	__le16 lebs_count;
+	__le16 mapped_lebs;
+	__le16 migrating_lebs;
+	__le16 reserved1;
+
+/* 0x0018 */
+	__le16 portion_id;
+	__le16 fragment_id;
+	__le32 bytes_count;
+
+/* 0x0020 */
+} __packed;
+
+#define SSDFS_LEBTBL_FRAGMENT_HDR_SIZE \
+	(sizeof(struct ssdfs_leb_table_fragment_header))
+
+#define SSDFS_LEB_DESC_PER_FRAGMENT(fragment_size) \
+	((fragment_size - SSDFS_LEBTBL_FRAGMENT_HDR_SIZE) / \
+	 sizeof(struct ssdfs_leb_descriptor))
+
+/*
+ * The mapping table cache is a copy of the mapping table's
+ * content for some type of PEBs. The goal of the cache is to
+ * provide space for storing copies of LEB_ID/PEB_ID pairs with
+ * the PEB state record. The cache is used for converting a LEB ID
+ * into a PEB ID and for retrieving the PEB state record in the case
+ * when the fragment of the mapping table is not initialized yet.
+ * Also, the cache is needed for storing a modified PEB state during
+ * the mapping table's destruction. A fragment of the mapping table
+ * cache has the following structure:
+ *
+ * ----------------------------
+ * |                          |
+ * |         Header           |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |   LEB_ID/PEB_ID pairs    |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |    PEB state records     |
+ * |                          |
+ * ----------------------------
+ */
+
+/*
+ * struct ssdfs_maptbl_cache_header - maptbl cache header
+ * @magic: magic signature
+ * @sequence_id: ID of fragment in the sequence
+ * @flags: maptbl cache header's flags
+ * @items_count: count of items in maptbl cache's fragment
+ * @bytes_count: size of fragment in bytes
+ * @start_leb: start LEB ID in fragment
+ * @end_leb: ending LEB ID in fragment
+ */
+struct ssdfs_maptbl_cache_header {
+/* 0x0000 */
+	struct ssdfs_signature magic;
+
+/* 0x0008 */
+	__le16 sequence_id;
+#define SSDFS_MAPTBL_CACHE_ZLIB_COMPR	(1 << 0)
+#define SSDFS_MAPTBL_CACHE_LZO_COMPR	(1 << 1)
+	__le16 flags;
+	__le16 items_count;
+	__le16 bytes_count;
+
+/* 0x0010 */
+	__le64 start_leb;
+	__le64 end_leb;
+
+/* 0x0020 */
+} __packed;
+
+/*
+ * struct ssdfs_maptbl_cache_peb_state - PEB state descriptor
+ * @consistency: PEB state consistency type
+ * @state: PEB's state
+ * @flags: PEB's flags
+ * @shared_peb_index: index of external shared destination PEB
+ *
+ * The mapping table cache is a copy of the mapping table's
+ * content for some type of PEBs. If the mapping table cache and
+ * the mapping table contain the same content for a PEB, then
+ * the PEB state record is consistent. Otherwise, the PEB state
+ * record is inconsistent. For example, an inconsistency takes
+ * place if a PEB state record was modified in the mapping table
+ * cache during the destruction of the mapping table.
+ */
+struct ssdfs_maptbl_cache_peb_state {
+/* 0x0000 */
+	__le8 consistency;
+	__le8 state;
+	__le8 flags;
+	__le8 shared_peb_index;
+
+/* 0x0004 */
+} __packed;
+
+/* PEB state consistency type */
+enum {
+	SSDFS_PEB_STATE_UNKNOWN,
+	SSDFS_PEB_STATE_CONSISTENT,
+	SSDFS_PEB_STATE_INCONSISTENT,
+	SSDFS_PEB_STATE_PRE_DELETED,
+	SSDFS_PEB_STATE_MAX
+};
+
+#define SSDFS_MAPTBL_CACHE_HDR_SIZE \
+	(sizeof(struct ssdfs_maptbl_cache_header))
+#define SSDFS_LEB2PEB_PAIR_SIZE \
+	(sizeof(struct ssdfs_leb2peb_pair))
+#define SSDFS_PEB_STATE_SIZE \
+	(sizeof(struct ssdfs_maptbl_cache_peb_state))
+
+#define SSDFS_LEB2PEB_PAIR_PER_FRAGMENT(fragment_size) \
+	((fragment_size - SSDFS_MAPTBL_CACHE_HDR_SIZE - \
+				SSDFS_PEB_STATE_SIZE) / \
+	 (SSDFS_LEB2PEB_PAIR_SIZE + SSDFS_PEB_STATE_SIZE))
+
+/*
+ * struct ssdfs_btree_node_header - btree's node header
+ * @magic: magic signature + revision
+ * @check: metadata checksum
+ * @height: btree node's height
+ * @log_node_size: log2(node size)
+ * @log_index_area_size: log2(index area size)
+ * @type: btree node type
+ * @flags: btree node flags
+ * @index_area_offset: offset of index area in bytes
+ * @index_count: count of indexes in index area
+ * @index_size: size of index in bytes
+ * @min_item_size: min size of item in bytes
+ * @max_item_size: max possible size of item in bytes
+ * @items_capacity: capacity of items in the node
+ * @start_hash: start hash value
+ * @end_hash: end hash value
+ * @create_cno: create checkpoint
+ * @node_id: node identification number
+ * @item_area_offset: offset of items area in bytes
+ */
+struct ssdfs_btree_node_header {
+/* 0x0000 */
+	struct ssdfs_signature magic;
+
+/* 0x0008 */
+	struct ssdfs_metadata_check check;
+
+/* 0x0010 */
+	__le8 height;
+	__le8 log_node_size;
+	__le8 log_index_area_size;
+	__le8 type;
+
+/* 0x0014 */
+#define SSDFS_BTREE_NODE_HAS_INDEX_AREA		(1 << 0)
+#define SSDFS_BTREE_NODE_HAS_ITEMS_AREA		(1 << 1)
+#define SSDFS_BTREE_NODE_HAS_L1TBL		(1 << 2)
+#define SSDFS_BTREE_NODE_HAS_L2TBL		(1 << 3)
+#define SSDFS_BTREE_NODE_HAS_HASH_TBL		(1 << 4)
+#define SSDFS_BTREE_NODE_PRE_ALLOCATED		(1 << 5)
+#define SSDFS_BTREE_NODE_FLAGS_MASK		0x3F
+	__le16 flags;
+	__le16 index_area_offset;
+
+/* 0x0018 */
+	__le16 index_count;
+	__le8 index_size;
+	__le8 min_item_size;
+	__le16 max_item_size;
+	__le16 items_capacity;
+
+/* 0x0020 */
+	__le64 start_hash;
+	__le64 end_hash;
+
+/* 0x0030 */
+	__le64 create_cno;
+	__le32 node_id;
+	__le32 item_area_offset;
+
+/* 0x0040 */
+} __packed;
+
+/* Index of btree node in node's items sequence */
+#define SSDFS_BTREE_NODE_HEADER_INDEX	(0)
+
+/* Btree node types */
+enum {
+	SSDFS_BTREE_NODE_UNKNOWN_TYPE,
+	SSDFS_BTREE_ROOT_NODE,
+	SSDFS_BTREE_INDEX_NODE,
+	SSDFS_BTREE_HYBRID_NODE,
+	SSDFS_BTREE_LEAF_NODE,
+	SSDFS_BTREE_NODE_TYPE_MAX
+};
+
+#define SSDFS_DENTRIES_PAGES_PER_NODE_MAX		(32)
+#define SSDFS_DENTRIES_BMAP_SIZE \
+	(((SSDFS_DENTRIES_PAGES_PER_NODE_MAX * PAGE_SIZE) / \
+	  sizeof(struct ssdfs_dir_entry)) / BITS_PER_BYTE)
+
+/*
+ * struct ssdfs_dentries_btree_node_header - directory entries node's header
+ * @node: generic btree node's header
+ * @parent_ino: parent inode number
+ * @dentries_count: count of allocated dentries in the node
+ * @inline_names: count of dentries with inline names
+ * @flags: dentries node's flags
+ * @free_space: free space of the node in bytes
+ * @lookup_table: table for clustering search in the node
+ *
+ * The @lookup_table clusters the dentries in the node
+ * in order to speed up the search.
+ */
+struct ssdfs_dentries_btree_node_header {
+/* 0x0000 */
+	struct ssdfs_btree_node_header node;
+
+/* 0x0040 */
+	__le64 parent_ino;
+
+/* 0x0048 */
+	__le16 dentries_count;
+	__le16 inline_names;
+	__le16 flags;
+	__le16 free_space;
+
+/* 0x0050 */
+#define SSDFS_DENTRIES_BTREE_LOOKUP_TABLE_SIZE		(22)
+	__le64 lookup_table[SSDFS_DENTRIES_BTREE_LOOKUP_TABLE_SIZE];
+
+/* 0x0100 */
+} __packed;
+
+#define SSDFS_SHARED_DICT_PAGES_PER_NODE_MAX		(32)
+#define SSDFS_SHARED_DICT_BMAP_SIZE \
+	(((SSDFS_SHARED_DICT_PAGES_PER_NODE_MAX * PAGE_SIZE) / \
+	  SSDFS_DENTRY_INLINE_NAME_MAX_LEN) / BITS_PER_BYTE)
+
+/*
+ * struct ssdfs_shdict_search_key - generalized search key
+ * @name.hash_lo: low hash32 value
+ * @name.hash_hi: tail hash of the name
+ * @range.prefix_len: prefix length in bytes
+ * @range.start_index: starting index into lookup table2
+ * @range.reserved: private part of concrete structure
+ *
+ * This key is a generalized version of the first part of any
+ * item in the lookup1, lookup2, and hash tables. This structure
+ * provides a generic way of searching in all of these tables.
+ */
+struct ssdfs_shdict_search_key {
+/* 0x0000 */
+	union {
+		__le32 hash_lo;
+		__le32 hash_hi;
+	} name __packed;
+
+/* 0x0004 */
+	union {
+		__le8 prefix_len;
+		__le16 start_index;
+		__le32 reserved;
+	} range __packed;
+
+/* 0x0008 */
+} __packed;
+
+/*
+ * struct ssdfs_shdict_ltbl1_item - shared dictionary lookup table1 item
+ * @hash_lo: low hash32 value
+ * @start_index: starting index into lookup table2
+ * @range_len: number of items in the range of lookup table2
+ *
+ * The header of the shared dictionary node contains lookup table1.
+ * This table is responsible for clustering the items in lookup
+ * table2. The @hash_lo is the hash32 of the first part of the name.
+ * The length of the first part is the inline name length.
+ */
+struct ssdfs_shdict_ltbl1_item {
+/* 0x0000 */
+	__le32 hash_lo;
+	__le16 start_index;
+	__le16 range_len;
+
+/* 0x0008 */
+} __packed;
+
+/*
+ * struct ssdfs_shdict_ltbl2_item - shared dictionary lookup table2 item
+ * @hash_lo: low hash32 value
+ * @prefix_len: prefix length in bytes
+ * @str_count: count of strings in the range
+ * @hash_index: index of the hash in the hash table
+ *
+ * The lookup table2 is located at the end of the node. It grows
+ * from the bottom of the node toward the node's beginning.
+ * Every item of lookup table2 describes the position of the starting
+ * keyword of a name. The goal of such a descriptor is to describe
+ * the starting position of a deduplicated keyword that is shared by
+ * several following names. The keyword is stored only at the beginning
+ * of the sequence because the rest of the names are represented by
+ * suffixes only (for example, the sequence of names "absurd, abcissa,
+ * abacus" can be represented by the deduplicated range of names
+ * "abacuscissasurd").
+ */
+struct ssdfs_shdict_ltbl2_item {
+/* 0x0000 */
+	__le32 hash_lo;
+	__le8 prefix_len;
+	__le8 str_count;
+	__le16 hash_index;
+
+/* 0x0008 */
+} __packed;
+
+/*
+ * struct ssdfs_shdict_htbl_item - shared dictionary hash table item
+ * @hash_hi: tail hash of the name
+ * @str_offset: offset in bytes to string
+ * @str_len: string length
+ * @type: string type
+ *
+ * The hash table contains descriptors of all strings in the
+ * string area. The @str_offset is the offset in bytes from
+ * the beginning of the items (strings) area.
+ */
+struct ssdfs_shdict_htbl_item {
+/* 0x0000 */
+	__le32 hash_hi;
+	__le16 str_offset;
+	__le8 str_len;
+	__le8 type;
+
+/* 0x0008 */
+} __packed;
+
+/* Name string types */
+enum {
+	SSDFS_UNKNOWN_NAME_TYPE,
+	SSDFS_NAME_PREFIX,
+	SSDFS_NAME_SUFFIX,
+	SSDFS_FULL_NAME,
+	SSDFS_NAME_TYPE_MAX
+};
+
+/*
+ * struct ssdfs_shared_dict_area - area descriptor
+ * @offset: area offset in bytes
+ * @size: area size in bytes
+ * @free_space: free space in bytes
+ * @items_count: count of items in area
+ */
+struct ssdfs_shared_dict_area {
+/* 0x0000 */
+	__le16 offset;
+	__le16 size;
+	__le16 free_space;
+	__le16 items_count;
+
+/* 0x0008 */
+} __packed;
+
+/*
+ * struct ssdfs_shared_dictionary_node_header - shared dictionary node header
+ * @node: generic btree node's header
+ * @str_area: string area descriptor
+ * @hash_table: hash table descriptor
+ * @lookup_table2: lookup2 table descriptor
+ * @flags: private flags
+ * @lookup_table1_items: number of valid items in the lookup1 table
+ * @lookup_table1: lookup1 table
+ */
+struct ssdfs_shared_dictionary_node_header {
+/* 0x0000 */
+	struct ssdfs_btree_node_header node;
+
+/* 0x0040 */
+	struct ssdfs_shared_dict_area str_area;
+
+/* 0x0048 */
+	struct ssdfs_shared_dict_area hash_table;
+
+/* 0x0050 */
+	struct ssdfs_shared_dict_area lookup_table2;
+
+/* 0x0058 */
+	__le16 flags;
+	__le16 lookup_table1_items;
+	__le32 reserved2;
+
+/* 0x0060 */
+#define SSDFS_SHDIC_LTBL1_SIZE		(20)
+	struct ssdfs_shdict_ltbl1_item lookup_table1[SSDFS_SHDIC_LTBL1_SIZE];
+
+/* 0x0100 */
+} __packed;
+
+#define SSDFS_EXTENT_PAGES_PER_NODE_MAX		(32)
+#define SSDFS_EXTENT_MAX_BMAP_SIZE \
+	(((SSDFS_EXTENT_PAGES_PER_NODE_MAX * PAGE_SIZE) / \
+	  sizeof(struct ssdfs_raw_fork)) / BITS_PER_BYTE)
+
+/*
+ * struct ssdfs_extents_btree_node_header - extents btree node's header
+ * @node: generic btree node's header
+ * @parent_ino: parent inode number
+ * @blks_count: count of blocks in all valid extents
+ * @forks_count: count of forks in the node
+ * @allocated_extents: count of allocated extents in all forks
+ * @valid_extents: count of valid extents
+ * @max_extent_blks: maximal number of blocks in one extent
+ * @lookup_table: table for clustering search in the node
+ *
+ * The @lookup_table clusters the forks in the node
+ * in order to speed up the search.
+ */
+struct ssdfs_extents_btree_node_header {
+/* 0x0000 */
+	struct ssdfs_btree_node_header node;
+
+/* 0x0040 */
+	__le64 parent_ino;
+	__le64 blks_count;
+
+/* 0x0050 */
+	__le32 forks_count;
+	__le32 allocated_extents;
+	__le32 valid_extents;
+	__le32 max_extent_blks;
+
+/* 0x0060 */
+#define SSDFS_EXTENTS_BTREE_LOOKUP_TABLE_SIZE		(20)
+	__le64 lookup_table[SSDFS_EXTENTS_BTREE_LOOKUP_TABLE_SIZE];
+
+/* 0x0100 */
+} __packed;
+
+#define SSDFS_XATTRS_PAGES_PER_NODE_MAX		(32)
+#define SSDFS_XATTRS_BMAP_SIZE \
+	(((SSDFS_XATTRS_PAGES_PER_NODE_MAX * PAGE_SIZE) / \
+	  sizeof(struct ssdfs_xattr_entry)) / BITS_PER_BYTE)
+
+/*
+ * struct ssdfs_xattrs_btree_node_header - xattrs node's header
+ * @node: generic btree node's header
+ * @parent_ino: parent inode number
+ * @xattrs_count: count of allocated xattrs in the node
+ * @flags: xattrs node's flags
+ * @free_space: free space of the node in bytes
+ * @lookup_table: table for clustering search in the node
+ *
+ * The @lookup_table clusters the xattrs in the node
+ * in order to speed up the search.
+ */
+struct ssdfs_xattrs_btree_node_header {
+/* 0x0000 */
+	struct ssdfs_btree_node_header node;
+
+/* 0x0040 */
+	__le64 parent_ino;
+
+/* 0x0048 */
+	__le16 xattrs_count;
+	__le16 reserved;
+	__le16 flags;
+	__le16 free_space;
+
+/* 0x0050 */
+#define SSDFS_XATTRS_BTREE_LOOKUP_TABLE_SIZE		(22)
+	__le64 lookup_table[SSDFS_XATTRS_BTREE_LOOKUP_TABLE_SIZE];
+
+/* 0x0100 */
+} __packed;
+
+/*
+ * struct ssdfs_index_area - index area info
+ * @start_hash: start hash value
+ * @end_hash: end hash value
+ */
+struct ssdfs_index_area {
+/* 0x0000 */
+	__le64 start_hash;
+	__le64 end_hash;
+
+/* 0x0010 */
+} __packed;
+
+#define SSDFS_INODE_PAGES_PER_NODE_MAX		(32)
+#define SSDFS_INODE_BMAP_SIZE \
+	(((SSDFS_INODE_PAGES_PER_NODE_MAX * PAGE_SIZE) / \
+	  sizeof(struct ssdfs_inode)) / BITS_PER_BYTE)
+
+/*
+ * struct ssdfs_inodes_btree_node_header - inodes btree node's header
+ * @node: generic btree node's header
+ * @inodes_count: count of inodes in the node
+ * @valid_inodes: count of valid inodes in the node
+ * @index_area: index area info (hybrid node)
+ * @bmap: bitmap of valid/invalid inodes in the node
+ */
+struct ssdfs_inodes_btree_node_header {
+/* 0x0000 */
+	struct ssdfs_btree_node_header node;
+
+/* 0x0040 */
+	__le16 inodes_count;
+	__le16 valid_inodes;
+	__le8 reserved1[0xC];
+
+/* 0x0050 */
+	struct ssdfs_index_area index_area;
+
+/* 0x0060 */
+	__le8 reserved2[0x60];
+
+/* 0x00C0 */
+	__le8 bmap[SSDFS_INODE_BMAP_SIZE];
+
+/* 0x0100 */
+} __packed;
+
+/*
+ * struct ssdfs_snapshot_rule_info - snapshot rule info
+ * @mode: snapshot mode (READ-ONLY|READ-WRITE)
+ * @type: snapshot type (PERIODIC|ONE-TIME)
+ * @expiration: snapshot expiration time (WEEK|MONTH|YEAR|NEVER)
+ * @frequency: taking snapshot frequency (SYNCFS|HOUR|DAY|WEEK)
+ * @snapshots_threshold: max number of simultaneously available snapshots
+ * @snapshots_number: current number of created snapshots
+ * @ino: root object inode ID
+ * @uuid: snapshot UUID
+ * @name: snapshot rule name
+ * @name_hash: name hash
+ * @last_snapshot_cno: latest snapshot checkpoint
+ */
+struct ssdfs_snapshot_rule_info {
+/* 0x0000 */
+	__le8 mode;
+	__le8 type;
+	__le8 expiration;
+	__le8 frequency;
+	__le16 snapshots_threshold;
+	__le16 snapshots_number;
+
+/* 0x0008 */
+	__le64 ino;
+
+/* 0x0010 */
+	__le8 uuid[SSDFS_UUID_SIZE];
+
+/* 0x0020 */
+	char name[SSDFS_MAX_SNAP_RULE_NAME_LEN];
+
+/* 0x0030 */
+	__le64 name_hash;
+	__le64 last_snapshot_cno;
+
+/* 0x0040 */
+} __packed;
+
+/* Snapshot mode */
+enum {
+	SSDFS_UNKNOWN_SNAPSHOT_MODE,
+	SSDFS_READ_ONLY_SNAPSHOT,
+	SSDFS_READ_WRITE_SNAPSHOT,
+	SSDFS_SNAPSHOT_MODE_MAX
+};
+
+#define SSDFS_READ_ONLY_MODE_STR	"READ_ONLY"
+#define SSDFS_READ_WRITE_MODE_STR	"READ_WRITE"
+
+/* Snapshot type */
+enum {
+	SSDFS_UNKNOWN_SNAPSHOT_TYPE,
+	SSDFS_ONE_TIME_SNAPSHOT,
+	SSDFS_PERIODIC_SNAPSHOT,
+	SSDFS_SNAPSHOT_TYPE_MAX
+};
+
+#define SSDFS_ONE_TIME_TYPE_STR		"ONE-TIME"
+#define SSDFS_PERIODIC_TYPE_STR		"PERIODIC"
+
+/* Snapshot expiration */
+enum {
+	SSDFS_UNKNOWN_EXPIRATION_POINT,
+	SSDFS_EXPIRATION_IN_WEEK,
+	SSDFS_EXPIRATION_IN_MONTH,
+	SSDFS_EXPIRATION_IN_YEAR,
+	SSDFS_NEVER_EXPIRED,
+	SSDFS_EXPIRATION_POINT_MAX
+};
+
+#define SSDFS_WEEK_EXPIRATION_POINT_STR		"WEEK"
+#define SSDFS_MONTH_EXPIRATION_POINT_STR	"MONTH"
+#define SSDFS_YEAR_EXPIRATION_POINT_STR		"YEAR"
+#define SSDFS_NEVER_EXPIRED_STR			"NEVER"
+
+/* Snapshot creation frequency */
+enum {
+	SSDFS_UNKNOWN_FREQUENCY,
+	SSDFS_SYNCFS_FREQUENCY,
+	SSDFS_HOUR_FREQUENCY,
+	SSDFS_DAY_FREQUENCY,
+	SSDFS_WEEK_FREQUENCY,
+	SSDFS_MONTH_FREQUENCY,
+	SSDFS_CREATION_FREQUENCY_MAX
+};
+
+#define SSDFS_SYNCFS_FREQUENCY_STR		"SYNCFS"
+#define SSDFS_HOUR_FREQUENCY_STR		"HOUR"
+#define SSDFS_DAY_FREQUENCY_STR			"DAY"
+#define SSDFS_WEEK_FREQUENCY_STR		"WEEK"
+#define SSDFS_MONTH_FREQUENCY_STR		"MONTH"
+
+#define SSDFS_INFINITE_SNAPSHOTS_NUMBER		U16_MAX
+#define SSDFS_UNDEFINED_SNAPSHOTS_NUMBER	(0)
+
+/*
+ * struct ssdfs_snapshot_rules_header - snapshot rules table's header
+ * @magic: magic signature
+ * @item_size: snapshot rule's size in bytes
+ * @flags: various flags
+ * @items_count: number of snapshot rules in table
+ * @items_capacity: capacity of the snapshot rules table
+ * @area_size: size of table in bytes
+ */
+struct ssdfs_snapshot_rules_header {
+/* 0x0000 */
+	__le32 magic;
+	__le16 item_size;
+	__le16 flags;
+
+/* 0x0008 */
+	__le16 items_count;
+	__le16 items_capacity;
+	__le32 area_size;
+
+/* 0x0010 */
+	__le8 padding[0x10];
+
+/* 0x0020 */
+} __packed;
+
+/*
+ * struct ssdfs_snapshot - snapshot info
+ * @magic: magic signature of snapshot
+ * @mode: snapshot mode (READ-ONLY|READ-WRITE)
+ * @expiration: snapshot expiration time (WEEK|MONTH|YEAR|NEVER)
+ * @flags: snapshot's flags
+ * @name: snapshot name
+ * @uuid: snapshot UUID
+ * @create_time: snapshot's timestamp
+ * @create_cno: snapshot's checkpoint
+ * @ino: root object inode ID
+ * @name_hash: name hash
+ */
+struct ssdfs_snapshot {
+/* 0x0000 */
+	__le16 magic;
+	__le8 mode : 4;
+	__le8 expiration : 4;
+	__le8 flags;
+	char name[SSDFS_MAX_SNAPSHOT_NAME_LEN];
+
+/* 0x0010 */
+	__le8 uuid[SSDFS_UUID_SIZE];
+
+/* 0x0020 */
+	__le64 create_time;
+	__le64 create_cno;
+
+/* 0x0030 */
+	__le64 ino;
+	__le64 name_hash;
+
+/* 0x0040 */
+} __packed;
+
+/* snapshot flags */
+#define SSDFS_SNAPSHOT_HAS_EXTERNAL_STRING	(1 << 0)
+#define SSDFS_SNAPSHOT_FLAGS_MASK		0x1
+
+/*
+ * struct ssdfs_peb2time_pair - PEB to timestamp pair
+ * @peb_id: PEB ID
+ * @last_log_time: last log creation time
+ */
+struct ssdfs_peb2time_pair {
+/* 0x0000 */
+	__le64 peb_id;
+	__le64 last_log_time;
+
+/* 0x0010 */
+} __packed;
+
+/*
+ * struct ssdfs_peb2time_set - PEB to timestamp set
+ * @magic: magic signature of set
+ * @pairs_count: number of valid pairs in the set
+ * @create_time: create time of the first PEB in pair set
+ * @array: array of PEB to timestamp pairs
+ */
+struct ssdfs_peb2time_set {
+/* 0x0000 */
+	__le16 magic;
+	__le8 pairs_count;
+	__le8 padding[0x5];
+
+/* 0x0008 */
+	__le64 create_time;
+
+/* 0x0010 */
+#define SSDFS_PEB2TIME_ARRAY_CAPACITY		(3)
+	struct ssdfs_peb2time_pair array[SSDFS_PEB2TIME_ARRAY_CAPACITY];
+
+/* 0x0040 */
+} __packed;
+
+/*
+ * union ssdfs_snapshot_item - snapshot item
+ * @magic: magic signature
+ * @snapshot: snapshot info
+ * @peb2time: PEB to timestamp set
+ */
+union ssdfs_snapshot_item {
+/* 0x0000 */
+	__le16 magic;
+	struct ssdfs_snapshot snapshot;
+	struct ssdfs_peb2time_set peb2time;
+
+/* 0x0040 */
+} __packed;
+
+#define SSDFS_SNAPSHOTS_PAGES_PER_NODE_MAX		(32)
+#define SSDFS_SNAPSHOTS_BMAP_SIZE \
+	(((SSDFS_SNAPSHOTS_PAGES_PER_NODE_MAX * PAGE_SIZE) / \
+	  sizeof(struct ssdfs_snapshot_info)) / BITS_PER_BYTE)
+
+/*
+ * struct ssdfs_snapshots_btree_node_header - snapshots node's header
+ * @node: generic btree node's header
+ * @snapshots_count: snapshots count in the node
+ * @lookup_table: table for clustering search in the node
+ *
+ * The @lookup_table clusters the snapshots in the node
+ * in order to speed up the search.
+ */
+struct ssdfs_snapshots_btree_node_header {
+/* 0x0000 */
+	struct ssdfs_btree_node_header node;
+
+/* 0x0040 */
+	__le32 snapshots_count;
+	__le8 padding[0x0C];
+
+/* 0x0050 */
+#define SSDFS_SNAPSHOTS_BTREE_LOOKUP_TABLE_SIZE		(22)
+	__le64 lookup_table[SSDFS_SNAPSHOTS_BTREE_LOOKUP_TABLE_SIZE];
+
+/* 0x0100 */
+} __packed;
+
+/*
+ * struct ssdfs_shared_extent - shared extent
+ * @fingerprint: fingerprint of shared extent
+ * @extent: position of the extent on volume
+ * @fingerprint_len: length of fingerprint
+ * @fingerprint_type: type of fingerprint
+ * @flags: various flags
+ * @ref_count: reference counter of shared extent
+ */
+struct ssdfs_shared_extent {
+/* 0x0000 */
+#define SSDFS_FINGERPRINT_LENGTH_MAX	(32)
+	__le8 fingerprint[SSDFS_FINGERPRINT_LENGTH_MAX];
+
+/* 0x0020 */
+	struct ssdfs_raw_extent extent;
+
+/* 0x0030 */
+	__le8 fingerprint_len;
+	__le8 fingerprint_type;
+	__le16 flags;
+	__le8 padding[0x4];
+
+/* 0x0038 */
+	__le64 ref_count;
+
+/* 0x0040 */
+} __packed;
+
+#define SSDFS_SHEXTREE_PAGES_PER_NODE_MAX		(32)
+#define SSDFS_SHEXTREE_BMAP_SIZE \
+	(((SSDFS_SHEXTREE_PAGES_PER_NODE_MAX * PAGE_SIZE) / \
+	  sizeof(struct ssdfs_shared_extent)) / BITS_PER_BYTE)
+
+/*
+ * struct ssdfs_shextree_node_header - shared extents btree node's header
+ * @node: generic btree node's header
+ * @shared_extents: number of shared extents in the node
+ * @lookup_table: table for clustering search in the node
+ *
+ * The @lookup_table clusters the shared extents in the node
+ * in order to speed up the search.
+ */
+struct ssdfs_shextree_node_header {
+/* 0x0000 */
+	struct ssdfs_btree_node_header node;
+
+/* 0x0040 */
+	__le32 shared_extents;
+	__le8 padding[0x0C];
+
+/* 0x0050 */
+#define SSDFS_SHEXTREE_LOOKUP_TABLE_SIZE		(22)
+	__le64 lookup_table[SSDFS_SHEXTREE_LOOKUP_TABLE_SIZE];
+
+/* 0x0100 */
+} __packed;
+
+#define SSDFS_INVEXTREE_PAGES_PER_NODE_MAX		(32)
+#define SSDFS_INVEXTREE_BMAP_SIZE \
+	(((SSDFS_INVEXTREE_PAGES_PER_NODE_MAX * PAGE_SIZE) / \
+	  sizeof(struct ssdfs_raw_extent)) / BITS_PER_BYTE)
+
+/*
+ * struct ssdfs_invextree_node_header - invalidated extents btree node's header
+ * @node: generic btree node's header
+ * @extents_count: number of invalidated extents in the node
+ * @lookup_table: table for clustering search in the node
+ *
+ * The @lookup_table clusters the invalidated extents in the node
+ * in order to speed up the search.
+ */
+struct ssdfs_invextree_node_header {
+/* 0x0000 */
+	struct ssdfs_btree_node_header node;
+
+/* 0x0040 */
+	__le32 extents_count;
+	__le8 padding[0x0C];
+
+/* 0x0050 */
+#define SSDFS_INVEXTREE_LOOKUP_TABLE_SIZE		(22)
+	__le64 lookup_table[SSDFS_INVEXTREE_LOOKUP_TABLE_SIZE];
+
+/* 0x0100 */
+} __packed;
+
+#endif /* _LINUX_SSDFS_H */
-- 
2.34.1



* [RFC PATCH 02/76] ssdfs: key file system declarations
  2023-02-25  1:08 [RFC PATCH 00/76] SSDFS: flash-friendly LFS file system for ZNS SSD Viacheslav Dubeyko
  2023-02-25  1:08 ` [RFC PATCH 01/76] ssdfs: introduce SSDFS on-disk layout Viacheslav Dubeyko
@ 2023-02-25  1:08 ` Viacheslav Dubeyko
  2023-02-25  1:08 ` [RFC PATCH 03/76] ssdfs: implement raw device operations Viacheslav Dubeyko
                   ` (74 subsequent siblings)
  76 siblings, 0 replies; 82+ messages in thread
From: Viacheslav Dubeyko @ 2023-02-25  1:08 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: viacheslav.dubeyko, luka.perkov, bruno.banelli, Viacheslav Dubeyko

This patch contains declarations of key constants
and macros, implementations of inline functions, and
function declarations.

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
CC: Viacheslav Dubeyko <viacheslav.dubeyko@bytedance.com>
CC: Luka Perkov <luka.perkov@sartura.hr>
CC: Bruno Banelli <bruno.banelli@sartura.hr>
---
 fs/ssdfs/ssdfs.h              |  411 ++++++++++
 fs/ssdfs/ssdfs_constants.h    |   81 ++
 fs/ssdfs/ssdfs_fs_info.h      |  412 ++++++++++
 fs/ssdfs/ssdfs_inline.h       | 1346 +++++++++++++++++++++++++++++++++
 fs/ssdfs/ssdfs_inode_info.h   |  143 ++++
 fs/ssdfs/ssdfs_thread_info.h  |   42 +
 fs/ssdfs/version.h            |    7 +
 include/trace/events/ssdfs.h  |  255 +++++++
 include/uapi/linux/ssdfs_fs.h |  117 +++
 9 files changed, 2814 insertions(+)
 create mode 100644 fs/ssdfs/ssdfs.h
 create mode 100644 fs/ssdfs/ssdfs_constants.h
 create mode 100644 fs/ssdfs/ssdfs_fs_info.h
 create mode 100644 fs/ssdfs/ssdfs_inline.h
 create mode 100644 fs/ssdfs/ssdfs_inode_info.h
 create mode 100644 fs/ssdfs/ssdfs_thread_info.h
 create mode 100644 fs/ssdfs/version.h
 create mode 100644 include/trace/events/ssdfs.h
 create mode 100644 include/uapi/linux/ssdfs_fs.h

diff --git a/fs/ssdfs/ssdfs.h b/fs/ssdfs/ssdfs.h
new file mode 100644
index 000000000000..c0d5d7ace2eb
--- /dev/null
+++ b/fs/ssdfs/ssdfs.h
@@ -0,0 +1,411 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/ssdfs.h - in-core declarations.
+ *
+ * Copyright (c) 2019-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ */
+
+#ifndef _SSDFS_H
+#define _SSDFS_H
+
+#ifdef pr_fmt
+#undef pr_fmt
+#endif
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/kobject.h>
+#include <linux/sched.h>
+#include <linux/fs.h>
+#include <linux/crc32.h>
+#include <linux/pagemap.h>
+#include <linux/ssdfs_fs.h>
+
+#include "ssdfs_constants.h"
+#include "ssdfs_thread_info.h"
+#include "ssdfs_inode_info.h"
+#include "snapshot.h"
+#include "snapshot_requests_queue.h"
+#include "snapshot_rules.h"
+#include "ssdfs_fs_info.h"
+#include "ssdfs_inline.h"
+
+/*
+ * struct ssdfs_value_pair - value/position pair
+ * @value: some value
+ * @pos: position of value
+ */
+struct ssdfs_value_pair {
+	int value;
+	int pos;
+};
+
+/*
+ * struct ssdfs_min_max_pair - minimum and maximum values pair
+ * @min: minimum value/position pair
+ * @max: maximum value/position pair
+ */
+struct ssdfs_min_max_pair {
+	struct ssdfs_value_pair min;
+	struct ssdfs_value_pair max;
+};
+
+/*
+ * struct ssdfs_block_bmap_range - block bitmap items range
+ * @start: begin item
+ * @len: count of items in the range
+ */
+struct ssdfs_block_bmap_range {
+	u32 start;
+	u32 len;
+};
+
+struct ssdfs_peb_info;
+struct ssdfs_peb_container;
+struct ssdfs_segment_info;
+struct ssdfs_peb_blk_bmap;
+
+/* btree_node.c */
+void ssdfs_zero_btree_node_obj_cache_ptr(void);
+int ssdfs_init_btree_node_obj_cache(void);
+void ssdfs_shrink_btree_node_obj_cache(void);
+void ssdfs_destroy_btree_node_obj_cache(void);
+
+/* btree_search.c */
+void ssdfs_zero_btree_search_obj_cache_ptr(void);
+int ssdfs_init_btree_search_obj_cache(void);
+void ssdfs_shrink_btree_search_obj_cache(void);
+void ssdfs_destroy_btree_search_obj_cache(void);
+
+/* compression.c */
+int ssdfs_compressors_init(void);
+void ssdfs_free_workspaces(void);
+void ssdfs_compressors_exit(void);
+
+/* dev_bdev.c */
+struct bio *ssdfs_bdev_bio_alloc(struct block_device *bdev,
+				 unsigned int nr_iovecs,
+				 unsigned int op,
+				 gfp_t gfp_mask);
+void ssdfs_bdev_bio_put(struct bio *bio);
+int ssdfs_bdev_bio_add_page(struct bio *bio, struct page *page,
+			    unsigned int len, unsigned int offset);
+int ssdfs_bdev_readpage(struct super_block *sb, struct page *page,
+			loff_t offset);
+int ssdfs_bdev_readpages(struct super_block *sb, struct pagevec *pvec,
+			 loff_t offset);
+int ssdfs_bdev_read(struct super_block *sb, loff_t offset,
+		    size_t len, void *buf);
+int ssdfs_bdev_can_write_page(struct super_block *sb, loff_t offset,
+			      bool need_check);
+int ssdfs_bdev_writepage(struct super_block *sb, loff_t to_off,
+			 struct page *page, u32 from_off, size_t len);
+int ssdfs_bdev_writepages(struct super_block *sb, loff_t to_off,
+			  struct pagevec *pvec,
+			  u32 from_off, size_t len);
+
+/* dev_zns.c */
+u64 ssdfs_zns_zone_size(struct super_block *sb, loff_t offset);
+u64 ssdfs_zns_zone_capacity(struct super_block *sb, loff_t offset);
+
+/* dir.c */
+int ssdfs_inode_by_name(struct inode *dir,
+			const struct qstr *child,
+			ino_t *ino);
+int ssdfs_create(struct user_namespace *mnt_userns,
+		 struct inode *dir, struct dentry *dentry,
+		 umode_t mode, bool excl);
+
+/* file.c */
+int ssdfs_allocate_inline_file_buffer(struct inode *inode);
+void ssdfs_destroy_inline_file_buffer(struct inode *inode);
+int ssdfs_fsync(struct file *file, loff_t start, loff_t end, int datasync);
+
+/* fs_error.c */
+extern __printf(5, 6)
+void ssdfs_fs_error(struct super_block *sb, const char *file,
+		    const char *function, unsigned int line,
+		    const char *fmt, ...);
+int ssdfs_set_page_dirty(struct page *page);
+int __ssdfs_clear_dirty_page(struct page *page);
+int ssdfs_clear_dirty_page(struct page *page);
+void ssdfs_clear_dirty_pages(struct address_space *mapping);
+
+/* inode.c */
+bool is_raw_inode_checksum_correct(struct ssdfs_fs_info *fsi,
+				   void *buf, size_t size);
+struct inode *ssdfs_iget(struct super_block *sb, ino_t ino);
+struct inode *ssdfs_new_inode(struct inode *dir, umode_t mode,
+			      const struct qstr *qstr);
+int ssdfs_getattr(struct user_namespace *mnt_userns,
+		  const struct path *path, struct kstat *stat,
+		  u32 request_mask, unsigned int query_flags);
+int ssdfs_setattr(struct user_namespace *mnt_userns,
+		  struct dentry *dentry, struct iattr *attr);
+void ssdfs_evict_inode(struct inode *inode);
+int ssdfs_write_inode(struct inode *inode, struct writeback_control *wbc);
+int ssdfs_statfs(struct dentry *dentry, struct kstatfs *buf);
+void ssdfs_set_inode_flags(struct inode *inode);
+
+/* inodes_tree.c */
+void ssdfs_zero_free_ino_desc_cache_ptr(void);
+int ssdfs_init_free_ino_desc_cache(void);
+void ssdfs_shrink_free_ino_desc_cache(void);
+void ssdfs_destroy_free_ino_desc_cache(void);
+
+/* ioctl.c */
+long ssdfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
+
+/* log_footer.c */
+bool __is_ssdfs_log_footer_magic_valid(struct ssdfs_signature *magic);
+bool is_ssdfs_log_footer_magic_valid(struct ssdfs_log_footer *footer);
+bool is_ssdfs_log_footer_csum_valid(void *buf, size_t buf_size);
+bool is_ssdfs_volume_state_info_consistent(struct ssdfs_fs_info *fsi,
+					   void *buf,
+					   struct ssdfs_log_footer *footer,
+					   u64 dev_size);
+int ssdfs_read_unchecked_log_footer(struct ssdfs_fs_info *fsi,
+				    u64 peb_id, u32 bytes_off,
+				    void *buf, bool silent,
+				    u32 *log_pages);
+int ssdfs_check_log_footer(struct ssdfs_fs_info *fsi,
+			   void *buf,
+			   struct ssdfs_log_footer *footer,
+			   bool silent);
+int ssdfs_read_checked_log_footer(struct ssdfs_fs_info *fsi, void *log_hdr,
+				  u64 peb_id, u32 bytes_off, void *buf,
+				  bool silent);
+int ssdfs_prepare_current_segment_ids(struct ssdfs_fs_info *fsi,
+					__le64 *array,
+					size_t size);
+int ssdfs_prepare_volume_state_info_for_commit(struct ssdfs_fs_info *fsi,
+						u16 fs_state,
+						__le64 *cur_segs,
+						size_t size,
+						u64 last_log_time,
+						u64 last_log_cno,
+						struct ssdfs_volume_state *vs);
+int ssdfs_prepare_log_footer_for_commit(struct ssdfs_fs_info *fsi,
+					u32 log_pages,
+					u32 log_flags,
+					u64 last_log_time,
+					u64 last_log_cno,
+					struct ssdfs_log_footer *footer);
+
+/* offset_translation_table.c */
+void ssdfs_zero_blk2off_frag_obj_cache_ptr(void);
+int ssdfs_init_blk2off_frag_obj_cache(void);
+void ssdfs_shrink_blk2off_frag_obj_cache(void);
+void ssdfs_destroy_blk2off_frag_obj_cache(void);
+
+/* options.c */
+int ssdfs_parse_options(struct ssdfs_fs_info *fs_info, char *data);
+void ssdfs_initialize_fs_errors_option(struct ssdfs_fs_info *fsi);
+int ssdfs_show_options(struct seq_file *seq, struct dentry *root);
+
+/* peb_migration_scheme.c */
+int ssdfs_peb_start_migration(struct ssdfs_peb_container *pebc);
+bool is_peb_under_migration(struct ssdfs_peb_container *pebc);
+bool is_pebs_relation_alive(struct ssdfs_peb_container *pebc);
+bool has_peb_migration_done(struct ssdfs_peb_container *pebc);
+bool should_migration_be_finished(struct ssdfs_peb_container *pebc);
+int ssdfs_peb_finish_migration(struct ssdfs_peb_container *pebc);
+bool has_ssdfs_source_peb_valid_blocks(struct ssdfs_peb_container *pebc);
+int ssdfs_peb_prepare_range_migration(struct ssdfs_peb_container *pebc,
+				      u32 range_len, int blk_type);
+int ssdfs_peb_migrate_valid_blocks_range(struct ssdfs_segment_info *si,
+					 struct ssdfs_peb_container *pebc,
+					 struct ssdfs_peb_blk_bmap *peb_blkbmap,
+					 struct ssdfs_block_bmap_range *range);
+
+/* readwrite.c */
+int ssdfs_read_page_from_volume(struct ssdfs_fs_info *fsi,
+				u64 peb_id, u32 bytes_off,
+				struct page *page);
+int ssdfs_read_pagevec_from_volume(struct ssdfs_fs_info *fsi,
+				   u64 peb_id, u32 bytes_off,
+				   struct pagevec *pvec);
+int ssdfs_aligned_read_buffer(struct ssdfs_fs_info *fsi,
+			      u64 peb_id, u32 bytes_off,
+			      void *buf, size_t size,
+			      size_t *read_bytes);
+int ssdfs_unaligned_read_buffer(struct ssdfs_fs_info *fsi,
+				u64 peb_id, u32 bytes_off,
+				void *buf, size_t size);
+int ssdfs_can_write_sb_log(struct super_block *sb,
+			   struct ssdfs_peb_extent *sb_log);
+int ssdfs_unaligned_read_pagevec(struct pagevec *pvec,
+				 u32 offset, u32 size,
+				 void *buf);
+int ssdfs_unaligned_write_pagevec(struct pagevec *pvec,
+				  u32 offset, u32 size,
+				  void *buf);
+
+/* recovery.c */
+int ssdfs_init_sb_info(struct ssdfs_fs_info *fsi,
+			struct ssdfs_sb_info *sbi);
+void ssdfs_destruct_sb_info(struct ssdfs_sb_info *sbi);
+void ssdfs_backup_sb_info(struct ssdfs_fs_info *fsi);
+void ssdfs_restore_sb_info(struct ssdfs_fs_info *fsi);
+int ssdfs_gather_superblock_info(struct ssdfs_fs_info *fsi, int silent);
+
+/* segment.c */
+void ssdfs_zero_seg_obj_cache_ptr(void);
+int ssdfs_init_seg_obj_cache(void);
+void ssdfs_shrink_seg_obj_cache(void);
+void ssdfs_destroy_seg_obj_cache(void);
+int ssdfs_segment_get_used_data_pages(struct ssdfs_segment_info *si);
+
+/* sysfs.c */
+int ssdfs_sysfs_init(void);
+void ssdfs_sysfs_exit(void);
+int ssdfs_sysfs_create_device_group(struct super_block *sb);
+void ssdfs_sysfs_delete_device_group(struct ssdfs_fs_info *fsi);
+int ssdfs_sysfs_create_seg_group(struct ssdfs_segment_info *si);
+void ssdfs_sysfs_delete_seg_group(struct ssdfs_segment_info *si);
+int ssdfs_sysfs_create_peb_group(struct ssdfs_peb_container *pebc);
+void ssdfs_sysfs_delete_peb_group(struct ssdfs_peb_container *pebc);
+
+/* volume_header.c */
+bool __is_ssdfs_segment_header_magic_valid(struct ssdfs_signature *magic);
+bool is_ssdfs_segment_header_magic_valid(struct ssdfs_segment_header *hdr);
+bool is_ssdfs_partial_log_header_magic_valid(struct ssdfs_signature *magic);
+bool is_ssdfs_volume_header_csum_valid(void *vh_buf, size_t buf_size);
+bool is_ssdfs_partial_log_header_csum_valid(void *plh_buf, size_t buf_size);
+bool is_ssdfs_volume_header_consistent(struct ssdfs_fs_info *fsi,
+					struct ssdfs_volume_header *vh,
+					u64 dev_size);
+int ssdfs_check_segment_header(struct ssdfs_fs_info *fsi,
+				struct ssdfs_segment_header *hdr,
+				bool silent);
+int ssdfs_read_checked_segment_header(struct ssdfs_fs_info *fsi,
+					u64 peb_id, u32 pages_off,
+					void *buf, bool silent);
+int ssdfs_check_partial_log_header(struct ssdfs_fs_info *fsi,
+				   struct ssdfs_partial_log_header *hdr,
+				   bool silent);
+void ssdfs_create_volume_header(struct ssdfs_fs_info *fsi,
+				struct ssdfs_volume_header *vh);
+int ssdfs_prepare_volume_header_for_commit(struct ssdfs_fs_info *fsi,
+					   struct ssdfs_volume_header *vh);
+int ssdfs_prepare_segment_header_for_commit(struct ssdfs_fs_info *fsi,
+					    u32 log_pages,
+					    u16 seg_type,
+					    u32 seg_flags,
+					    u64 last_log_time,
+					    u64 last_log_cno,
+					    struct ssdfs_segment_header *hdr);
+int ssdfs_prepare_partial_log_header_for_commit(struct ssdfs_fs_info *fsi,
+					int sequence_id,
+					u32 log_pages,
+					u16 seg_type,
+					u32 flags,
+					u64 last_log_time,
+					u64 last_log_cno,
+					struct ssdfs_partial_log_header *hdr);
+
+/* memory leaks checker */
+void ssdfs_acl_memory_leaks_init(void);
+void ssdfs_acl_check_memory_leaks(void);
+void ssdfs_block_bmap_memory_leaks_init(void);
+void ssdfs_block_bmap_check_memory_leaks(void);
+void ssdfs_blk2off_memory_leaks_init(void);
+void ssdfs_blk2off_check_memory_leaks(void);
+void ssdfs_btree_memory_leaks_init(void);
+void ssdfs_btree_check_memory_leaks(void);
+void ssdfs_btree_hierarchy_memory_leaks_init(void);
+void ssdfs_btree_hierarchy_check_memory_leaks(void);
+void ssdfs_btree_node_memory_leaks_init(void);
+void ssdfs_btree_node_check_memory_leaks(void);
+void ssdfs_btree_search_memory_leaks_init(void);
+void ssdfs_btree_search_check_memory_leaks(void);
+void ssdfs_lzo_memory_leaks_init(void);
+void ssdfs_lzo_check_memory_leaks(void);
+void ssdfs_zlib_memory_leaks_init(void);
+void ssdfs_zlib_check_memory_leaks(void);
+void ssdfs_compr_memory_leaks_init(void);
+void ssdfs_compr_check_memory_leaks(void);
+void ssdfs_cur_seg_memory_leaks_init(void);
+void ssdfs_cur_seg_check_memory_leaks(void);
+void ssdfs_dentries_memory_leaks_init(void);
+void ssdfs_dentries_check_memory_leaks(void);
+void ssdfs_dev_bdev_memory_leaks_init(void);
+void ssdfs_dev_bdev_check_memory_leaks(void);
+void ssdfs_dev_zns_memory_leaks_init(void);
+void ssdfs_dev_zns_check_memory_leaks(void);
+void ssdfs_dev_mtd_memory_leaks_init(void);
+void ssdfs_dev_mtd_check_memory_leaks(void);
+void ssdfs_dir_memory_leaks_init(void);
+void ssdfs_dir_check_memory_leaks(void);
+void ssdfs_diff_memory_leaks_init(void);
+void ssdfs_diff_check_memory_leaks(void);
+void ssdfs_ext_queue_memory_leaks_init(void);
+void ssdfs_ext_queue_check_memory_leaks(void);
+void ssdfs_ext_tree_memory_leaks_init(void);
+void ssdfs_ext_tree_check_memory_leaks(void);
+void ssdfs_file_memory_leaks_init(void);
+void ssdfs_file_check_memory_leaks(void);
+void ssdfs_fs_error_memory_leaks_init(void);
+void ssdfs_fs_error_check_memory_leaks(void);
+void ssdfs_inode_memory_leaks_init(void);
+void ssdfs_inode_check_memory_leaks(void);
+void ssdfs_ino_tree_memory_leaks_init(void);
+void ssdfs_ino_tree_check_memory_leaks(void);
+void ssdfs_invext_tree_memory_leaks_init(void);
+void ssdfs_invext_tree_check_memory_leaks(void);
+void ssdfs_parray_memory_leaks_init(void);
+void ssdfs_parray_check_memory_leaks(void);
+void ssdfs_page_vector_memory_leaks_init(void);
+void ssdfs_page_vector_check_memory_leaks(void);
+void ssdfs_flush_memory_leaks_init(void);
+void ssdfs_flush_check_memory_leaks(void);
+void ssdfs_gc_memory_leaks_init(void);
+void ssdfs_gc_check_memory_leaks(void);
+void ssdfs_map_queue_memory_leaks_init(void);
+void ssdfs_map_queue_check_memory_leaks(void);
+void ssdfs_map_tbl_memory_leaks_init(void);
+void ssdfs_map_tbl_check_memory_leaks(void);
+void ssdfs_map_cache_memory_leaks_init(void);
+void ssdfs_map_cache_check_memory_leaks(void);
+void ssdfs_map_thread_memory_leaks_init(void);
+void ssdfs_map_thread_check_memory_leaks(void);
+void ssdfs_migration_memory_leaks_init(void);
+void ssdfs_migration_check_memory_leaks(void);
+void ssdfs_peb_memory_leaks_init(void);
+void ssdfs_peb_check_memory_leaks(void);
+void ssdfs_read_memory_leaks_init(void);
+void ssdfs_read_check_memory_leaks(void);
+void ssdfs_recovery_memory_leaks_init(void);
+void ssdfs_recovery_check_memory_leaks(void);
+void ssdfs_req_queue_memory_leaks_init(void);
+void ssdfs_req_queue_check_memory_leaks(void);
+void ssdfs_seg_obj_memory_leaks_init(void);
+void ssdfs_seg_obj_check_memory_leaks(void);
+void ssdfs_seg_bmap_memory_leaks_init(void);
+void ssdfs_seg_bmap_check_memory_leaks(void);
+void ssdfs_seg_blk_memory_leaks_init(void);
+void ssdfs_seg_blk_check_memory_leaks(void);
+void ssdfs_seg_tree_memory_leaks_init(void);
+void ssdfs_seg_tree_check_memory_leaks(void);
+void ssdfs_seq_arr_memory_leaks_init(void);
+void ssdfs_seq_arr_check_memory_leaks(void);
+void ssdfs_dict_memory_leaks_init(void);
+void ssdfs_dict_check_memory_leaks(void);
+void ssdfs_shextree_memory_leaks_init(void);
+void ssdfs_shextree_check_memory_leaks(void);
+void ssdfs_snap_reqs_queue_memory_leaks_init(void);
+void ssdfs_snap_reqs_queue_check_memory_leaks(void);
+void ssdfs_snap_rules_list_memory_leaks_init(void);
+void ssdfs_snap_rules_list_check_memory_leaks(void);
+void ssdfs_snap_tree_memory_leaks_init(void);
+void ssdfs_snap_tree_check_memory_leaks(void);
+void ssdfs_xattr_memory_leaks_init(void);
+void ssdfs_xattr_check_memory_leaks(void);
+
+#endif /* _SSDFS_H */
diff --git a/fs/ssdfs/ssdfs_constants.h b/fs/ssdfs/ssdfs_constants.h
new file mode 100644
index 000000000000..d5ba89d8b272
--- /dev/null
+++ b/fs/ssdfs/ssdfs_constants.h
@@ -0,0 +1,81 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/ssdfs_constants.h - SSDFS constant declarations.
+ *
+ * Copyright (c) 2019-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ */
+
+#ifndef _SSDFS_CONSTANTS_H
+#define _SSDFS_CONSTANTS_H
+
+/*
+ * Thread types
+ */
+enum {
+	SSDFS_PEB_READ_THREAD,
+	SSDFS_PEB_FLUSH_THREAD,
+	SSDFS_PEB_GC_THREAD,
+	SSDFS_PEB_THREAD_TYPE_MAX,
+};
+
+enum {
+	SSDFS_SEG_USING_GC_THREAD,
+	SSDFS_SEG_USED_GC_THREAD,
+	SSDFS_SEG_PRE_DIRTY_GC_THREAD,
+	SSDFS_SEG_DIRTY_GC_THREAD,
+	SSDFS_GC_THREAD_TYPE_MAX,
+};
+
+enum {
+	SSDFS_256B	= 256,
+	SSDFS_512B	= 512,
+	SSDFS_1KB	= 1024,
+	SSDFS_2KB	= 2048,
+	SSDFS_4KB	= 4096,
+	SSDFS_8KB	= 8192,
+	SSDFS_16KB	= 16384,
+	SSDFS_32KB	= 32768,
+	SSDFS_64KB	= 65536,
+	SSDFS_128KB	= 131072,
+	SSDFS_256KB	= 262144,
+	SSDFS_512KB	= 524288,
+	SSDFS_1MB	= 1048576,
+	SSDFS_2MB	= 2097152,
+	SSDFS_8MB	= 8388608,
+	SSDFS_16MB	= 16777216,
+	SSDFS_32MB	= 33554432,
+	SSDFS_64MB	= 67108864,
+	SSDFS_128MB	= 134217728,
+	SSDFS_256MB	= 268435456,
+	SSDFS_512MB	= 536870912,
+	SSDFS_1GB	= 1073741824,
+	SSDFS_2GB	= 2147483648,
+	SSDFS_8GB	= 8589934592,
+	SSDFS_16GB	= 17179869184,
+	SSDFS_32GB	= 34359738368,
+	SSDFS_64GB	= 68719476736,
+};
+
+enum {
+	SSDFS_UNKNOWN_PAGE_TYPE,
+	SSDFS_USER_DATA_PAGES,
+	SSDFS_METADATA_PAGES,
+	SSDFS_PAGES_TYPE_MAX
+};
+
+#define SSDFS_INVALID_CNO	U64_MAX
+#define SSDFS_SECTOR_SHIFT	9
+#define SSDFS_DEFAULT_TIMEOUT	(msecs_to_jiffies(120000))
+#define SSDFS_NANOSECS_PER_SEC	(1000000000)
+#define SSDFS_SECS_PER_HOUR	(60 * 60)
+#define SSDFS_HOURS_PER_DAY	(24)
+#define SSDFS_DAYS_PER_WEEK	(7)
+#define SSDFS_WEEKS_PER_MONTH	(4)
+
+#endif /* _SSDFS_CONSTANTS_H */
diff --git a/fs/ssdfs/ssdfs_fs_info.h b/fs/ssdfs/ssdfs_fs_info.h
new file mode 100644
index 000000000000..18ba9c463af4
--- /dev/null
+++ b/fs/ssdfs/ssdfs_fs_info.h
@@ -0,0 +1,412 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/ssdfs_fs_info.h - in-core fs information.
+ *
+ * Copyright (c) 2019-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ */
+
+#ifndef _SSDFS_FS_INFO_H
+#define _SSDFS_FS_INFO_H
+
+/* Global FS states */
+enum {
+	SSDFS_UNKNOWN_GLOBAL_FS_STATE,
+	SSDFS_REGULAR_FS_OPERATIONS,
+	SSDFS_METADATA_GOING_FLUSHING,
+	SSDFS_METADATA_UNDER_FLUSH,
+	SSDFS_GLOBAL_FS_STATE_MAX
+};
+
+/*
+ * struct ssdfs_volume_block - logical block
+ * @seg_id: segment ID
+ * @blk_index: block index in segment
+ */
+struct ssdfs_volume_block {
+	u64 seg_id;
+	u16 blk_index;
+};
+
+/*
+ * struct ssdfs_volume_extent - logical extent
+ * @start: initial logical block
+ * @len: extent length
+ */
+struct ssdfs_volume_extent {
+	struct ssdfs_volume_block start;
+	u16 len;
+};
+
+/*
+ * struct ssdfs_peb_extent - PEB's extent
+ * @leb_id: LEB ID
+ * @peb_id: PEB ID
+ * @page_offset: offset in pages
+ * @pages_count: pages count
+ */
+struct ssdfs_peb_extent {
+	u64 leb_id;
+	u64 peb_id;
+	u32 page_offset;
+	u32 pages_count;
+};
+
+/*
+ * struct ssdfs_zone_fragment - zone fragment
+ * @ino: inode identification number
+ * @logical_blk_offset: logical offset from file's beginning in blocks
+ * @extent: zone fragment descriptor
+ */
+struct ssdfs_zone_fragment {
+	u64 ino;
+	u64 logical_blk_offset;
+	struct ssdfs_raw_extent extent;
+};
+
+/*
+ * struct ssdfs_metadata_options - metadata options
+ * @blk_bmap.flags: block bitmap's flags
+ * @blk_bmap.compression: compression type
+ *
+ * @blk2off_tbl.flags: offset translation table's flags
+ * @blk2off_tbl.compression: compression type
+ *
+ * @user_data.flags: user data's flags
+ * @user_data.compression: compression type
+ * @user_data.migration_threshold: default number of destination PEBs in migration
+ */
+struct ssdfs_metadata_options {
+	struct {
+		u16 flags;
+		u8 compression;
+	} blk_bmap;
+
+	struct {
+		u16 flags;
+		u8 compression;
+	} blk2off_tbl;
+
+	struct {
+		u16 flags;
+		u8 compression;
+		u16 migration_threshold;
+	} user_data;
+};
+
+/*
+ * struct ssdfs_sb_info - superblock info
+ * @vh_buf: volume header buffer
+ * @vh_buf_size: size of volume header buffer in bytes
+ * @vs_buf: volume state buffer
+ * @vs_buf_size: size of volume state buffer in bytes
+ * @last_log: latest sb log
+ */
+struct ssdfs_sb_info {
+	void *vh_buf;
+	size_t vh_buf_size;
+	void *vs_buf;
+	size_t vs_buf_size;
+	struct ssdfs_peb_extent last_log;
+};
+
+/*
+ * struct ssdfs_device_ops - device operations
+ * @device_name: get device name
+ * @device_size: get device size in bytes
+ * @open_zone: open zone
+ * @reopen_zone: reopen closed zone
+ * @close_zone: close zone
+ * @read: read from device
+ * @readpage: read page
+ * @readpages: read sequence of pages
+ * @can_write_page: can we write into page?
+ * @writepage: write page to device
+ * @writepages: write sequence of pages to device
+ * @erase: erase block
+ * @trim: support of background erase operation
+ * @peb_isbad: check whether a physical erase block is bad
+ * @sync: synchronize page cache with device
+ */
+struct ssdfs_device_ops {
+	const char * (*device_name)(struct super_block *sb);
+	__u64 (*device_size)(struct super_block *sb);
+	int (*open_zone)(struct super_block *sb, loff_t offset);
+	int (*reopen_zone)(struct super_block *sb, loff_t offset);
+	int (*close_zone)(struct super_block *sb, loff_t offset);
+	int (*read)(struct super_block *sb, loff_t offset, size_t len,
+		    void *buf);
+	int (*readpage)(struct super_block *sb, struct page *page,
+			loff_t offset);
+	int (*readpages)(struct super_block *sb, struct pagevec *pvec,
+			 loff_t offset);
+	int (*can_write_page)(struct super_block *sb, loff_t offset,
+				bool need_check);
+	int (*writepage)(struct super_block *sb, loff_t to_off,
+			 struct page *page, u32 from_off, size_t len);
+	int (*writepages)(struct super_block *sb, loff_t to_off,
+			  struct pagevec *pvec, u32 from_off, size_t len);
+	int (*erase)(struct super_block *sb, loff_t offset, size_t len);
+	int (*trim)(struct super_block *sb, loff_t offset, size_t len);
+	int (*peb_isbad)(struct super_block *sb, loff_t offset);
+	int (*mark_peb_bad)(struct super_block *sb, loff_t offset);
+	void (*sync)(struct super_block *sb);
+};
+
+/*
+ * struct ssdfs_snapshot_subsystem - snapshots subsystem
+ * @reqs_queue: snapshot requests queue
+ * @rules_list: snapshot rules list
+ * @tree: snapshots btree
+ */
+struct ssdfs_snapshot_subsystem {
+	struct ssdfs_snapshot_reqs_queue reqs_queue;
+	struct ssdfs_snapshot_rules_list rules_list;
+	struct ssdfs_snapshots_btree_info *tree;
+};
+
+/*
+ * struct ssdfs_fs_info - in-core fs information
+ * @log_pagesize: log2(page size)
+ * @pagesize: page size in bytes
+ * @log_erasesize: log2(erase block size)
+ * @erasesize: physical erase block size in bytes
+ * @log_segsize: log2(segment size)
+ * @segsize: segment size in bytes
+ * @log_pebs_per_seg: log2(erase blocks per segment)
+ * @pebs_per_seg: physical erase blocks per segment
+ * @pages_per_peb: pages per physical erase block
+ * @pages_per_seg: pages per segment
+ * @leb_pages_capacity: maximal number of logical blocks per LEB
+ * @peb_pages_capacity: maximal number of NAND pages that can be written per PEB
+ * @lebs_per_peb_index: difference of LEB IDs between PEB indexes in segment
+ * @fs_ctime: volume create timestamp (mkfs phase)
+ * @fs_cno: volume create checkpoint
+ * @raw_inode_size: raw inode size in bytes
+ * @create_threads_per_seg: number of creation threads per segment
+ * @mount_opts: mount options
+ * @metadata_options: metadata options
+ * @volume_sem: volume semaphore
+ * @last_vh: buffer for last valid volume header
+ * @vh: volume header
+ * @vs: volume state
+ * @sbi: superblock info
+ * @sbi_backup: backup copy of superblock info
+ * @sb_seg_log_pages: full log size in sb segment (pages count)
+ * @segbmap_log_pages: full log size in segbmap segment (pages count)
+ * @maptbl_log_pages: full log size in maptbl segment (pages count)
+ * @lnodes_seg_log_pages: full log size in leaf nodes segment (pages count)
+ * @hnodes_seg_log_pages: full log size in hybrid nodes segment (pages count)
+ * @inodes_seg_log_pages: full log size in index nodes segment (pages count)
+ * @user_data_log_pages: full log size in user data segment (pages count)
+ * @volume_state_lock: lock for mutable volume metadata
+ * @free_pages: free pages count on the volume
+ * @reserved_new_user_data_pages: reserved pages of growing files' content
+ * @updated_user_data_pages: number of updated pages of files' content
+ * @flushing_user_data_requests: number of user data flush requests being processed
+ * @pending_wq: wait queue for flush threads of user data segments
+ * @finish_user_data_flush_wq: wait queue for the end of user data flush
+ * @fs_mount_time: file system mount timestamp
+ * @fs_mod_time: last write timestamp
+ * @fs_mount_cno: mount checkpoint
+ * @boot_vs_mount_timediff: difference between boottime and mounttime
+ * @fs_flags: file system flags
+ * @fs_state: file system state
+ * @fs_errors: behaviour when detecting errors
+ * @fs_feature_compat: compatible feature set
+ * @fs_feature_compat_ro: read-only compatible feature set
+ * @fs_feature_incompat: incompatible feature set
+ * @fs_uuid: 128-bit volume's uuid
+ * @fs_label: volume name
+ * @migration_threshold: default number of destination PEBs in migration
+ * @resize_mutex: resize mutex
+ * @nsegs: number of segments on the volume
+ * @sb_segs_sem: semaphore for superblock's array of LEB/PEB numbers
+ * @sb_lebs: array of LEB ID numbers
+ * @sb_pebs: array of PEB ID numbers
+ * @segbmap: segment bitmap object
+ * @segbmap_inode: segment bitmap inode
+ * @maptbl: PEB mapping table object
+ * @maptbl_cache: maptbl cache
+ * @segs_tree: tree of segment objects
+ * @segs_tree_inode: segment tree inode
+ * @cur_segs: array of current segments
+ * @shextree: shared extents tree
+ * @shdictree: shared dictionary
+ * @inodes_tree: inodes btree
+ * @invextree: invalidated extents btree
+ * @snapshots: snapshots subsystem
+ * @gc_thread: array of GC threads
+ * @gc_wait_queue: array of GC threads' wait queues
+ * @gc_should_act: array of counters that define necessity of GC activity
+ * @flush_reqs: current number of flush requests
+ * @sb: pointer to VFS superblock object
+ * @mtd: MTD info
+ * @devops: device access operations
+ * @pending_bios: count of pending BIOs (dev_bdev.c ONLY)
+ * @erase_page: page with content for erase operation (dev_bdev.c ONLY)
+ * @is_zns_device: file system volume is on ZNS device
+ * @zone_size: zone size in bytes
+ * @zone_capacity: zone capacity in bytes available for write operations
+ * @max_open_zones: open zones limitation (upper bound)
+ * @open_zones: current number of opened zones
+ * @dev_kobj: /sys/fs/ssdfs/<device> kernel object
+ * @dev_kobj_unregister: completion state for <device> kernel object
+ * @maptbl_kobj: /sys/fs/<ssdfs>/<device>/maptbl kernel object
+ * @maptbl_kobj_unregister: completion state for maptbl kernel object
+ * @segbmap_kobj: /sys/fs/<ssdfs>/<device>/segbmap kernel object
+ * @segbmap_kobj_unregister: completion state for segbmap kernel object
+ * @segments_kobj: /sys/fs/<ssdfs>/<device>/segments kernel object
+ * @segments_kobj_unregister: completion state for segments kernel object
+ */
+struct ssdfs_fs_info {
+	u8 log_pagesize;
+	u32 pagesize;
+	u8 log_erasesize;
+	u32 erasesize;
+	u8 log_segsize;
+	u32 segsize;
+	u8 log_pebs_per_seg;
+	u32 pebs_per_seg;
+	u32 pages_per_peb;
+	u32 pages_per_seg;
+	u32 leb_pages_capacity;
+	u32 peb_pages_capacity;
+	u32 lebs_per_peb_index;
+	u64 fs_ctime;
+	u64 fs_cno;
+	u16 raw_inode_size;
+	u16 create_threads_per_seg;
+
+	unsigned long mount_opts;
+	struct ssdfs_metadata_options metadata_options;
+
+	struct rw_semaphore volume_sem;
+	struct ssdfs_volume_header last_vh;
+	struct ssdfs_volume_header *vh;
+	struct ssdfs_volume_state *vs;
+	struct ssdfs_sb_info sbi;
+	struct ssdfs_sb_info sbi_backup;
+	u16 sb_seg_log_pages;
+	u16 segbmap_log_pages;
+	u16 maptbl_log_pages;
+	u16 lnodes_seg_log_pages;
+	u16 hnodes_seg_log_pages;
+	u16 inodes_seg_log_pages;
+	u16 user_data_log_pages;
+
+	atomic_t global_fs_state;
+
+	spinlock_t volume_state_lock;
+	u64 free_pages;
+	u64 reserved_new_user_data_pages;
+	u64 updated_user_data_pages;
+	u64 flushing_user_data_requests;
+	wait_queue_head_t pending_wq;
+	wait_queue_head_t finish_user_data_flush_wq;
+	u64 fs_mount_time;
+	u64 fs_mod_time;
+	u64 fs_mount_cno;
+	u64 boot_vs_mount_timediff;
+	u32 fs_flags;
+	u16 fs_state;
+	u16 fs_errors;
+	u64 fs_feature_compat;
+	u64 fs_feature_compat_ro;
+	u64 fs_feature_incompat;
+	unsigned char fs_uuid[SSDFS_UUID_SIZE];
+	char fs_label[SSDFS_VOLUME_LABEL_MAX];
+	u16 migration_threshold;
+
+	struct mutex resize_mutex;
+	u64 nsegs;
+
+	struct rw_semaphore sb_segs_sem;
+	u64 sb_lebs[SSDFS_SB_CHAIN_MAX][SSDFS_SB_SEG_COPY_MAX];
+	u64 sb_pebs[SSDFS_SB_CHAIN_MAX][SSDFS_SB_SEG_COPY_MAX];
+
+	struct ssdfs_segment_bmap *segbmap;
+	struct inode *segbmap_inode;
+
+	struct ssdfs_peb_mapping_table *maptbl;
+	struct ssdfs_maptbl_cache maptbl_cache;
+
+	struct ssdfs_segment_tree *segs_tree;
+	struct inode *segs_tree_inode;
+
+	struct ssdfs_current_segs_array *cur_segs;
+
+	struct ssdfs_shared_extents_tree *shextree;
+	struct ssdfs_shared_dict_btree_info *shdictree;
+	struct ssdfs_inodes_btree_info *inodes_tree;
+	struct ssdfs_invextree_info *invextree;
+
+	struct ssdfs_snapshot_subsystem snapshots;
+
+	struct ssdfs_thread_info gc_thread[SSDFS_GC_THREAD_TYPE_MAX];
+	wait_queue_head_t gc_wait_queue[SSDFS_GC_THREAD_TYPE_MAX];
+	atomic_t gc_should_act[SSDFS_GC_THREAD_TYPE_MAX];
+	atomic64_t flush_reqs;
+
+	struct super_block *sb;
+
+	struct mtd_info *mtd;
+	const struct ssdfs_device_ops *devops;
+	atomic_t pending_bios;			/* for dev_bdev.c */
+	struct page *erase_page;		/* for dev_bdev.c */
+
+	bool is_zns_device;
+	u64 zone_size;
+	u64 zone_capacity;
+	u32 max_open_zones;
+	atomic_t open_zones;
+
+	/* /sys/fs/ssdfs/<device> */
+	struct kobject dev_kobj;
+	struct completion dev_kobj_unregister;
+
+	/* /sys/fs/<ssdfs>/<device>/maptbl */
+	struct kobject maptbl_kobj;
+	struct completion maptbl_kobj_unregister;
+
+	/* /sys/fs/<ssdfs>/<device>/segbmap */
+	struct kobject segbmap_kobj;
+	struct completion segbmap_kobj_unregister;
+
+	/* /sys/fs/<ssdfs>/<device>/segments */
+	struct kobject segments_kobj;
+	struct completion segments_kobj_unregister;
+
+#ifdef CONFIG_SSDFS_TESTING
+	struct address_space testing_pages;
+	struct inode *testing_inode;
+	bool do_fork_invalidation;
+#endif /* CONFIG_SSDFS_TESTING */
+};
+
+#define SSDFS_FS_I(sb) \
+	((struct ssdfs_fs_info *)(sb->s_fs_info))
+
+/*
+ * GC thread functions
+ */
+int ssdfs_using_seg_gc_thread_func(void *data);
+int ssdfs_used_seg_gc_thread_func(void *data);
+int ssdfs_pre_dirty_seg_gc_thread_func(void *data);
+int ssdfs_dirty_seg_gc_thread_func(void *data);
+int ssdfs_start_gc_thread(struct ssdfs_fs_info *fsi, int type);
+int ssdfs_stop_gc_thread(struct ssdfs_fs_info *fsi, int type);
+
+/*
+ * Device operations
+ */
+extern const struct ssdfs_device_ops ssdfs_mtd_devops;
+extern const struct ssdfs_device_ops ssdfs_bdev_devops;
+extern const struct ssdfs_device_ops ssdfs_zns_devops;
+
+#endif /* _SSDFS_FS_INFO_H */
diff --git a/fs/ssdfs/ssdfs_inline.h b/fs/ssdfs/ssdfs_inline.h
new file mode 100644
index 000000000000..9c416438b291
--- /dev/null
+++ b/fs/ssdfs/ssdfs_inline.h
@@ -0,0 +1,1346 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/ssdfs_inline.h - inline functions and macros.
+ *
+ * Copyright (c) 2019-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ */
+
+#ifndef _SSDFS_INLINE_H
+#define _SSDFS_INLINE_H
+
+#include <linux/slab.h>
+#include <linux/swap.h>
+
+#define SSDFS_CRIT(fmt, ...) \
+	pr_crit("pid %d:%s:%d %s(): " fmt, \
+		 current->pid, __FILE__, __LINE__, __func__, ##__VA_ARGS__)
+
+#define SSDFS_ERR(fmt, ...) \
+	pr_err("pid %d:%s:%d %s(): " fmt, \
+		 current->pid, __FILE__, __LINE__, __func__, ##__VA_ARGS__)
+
+#define SSDFS_WARN(fmt, ...) \
+	do { \
+		pr_warn("pid %d:%s:%d %s(): " fmt, \
+			current->pid, __FILE__, __LINE__, \
+			__func__, ##__VA_ARGS__); \
+		dump_stack(); \
+	} while (0)
+
+#define SSDFS_NOTICE(fmt, ...) \
+	pr_notice(fmt, ##__VA_ARGS__)
+
+#define SSDFS_INFO(fmt, ...) \
+	pr_info(fmt, ##__VA_ARGS__)
+
+#ifdef CONFIG_SSDFS_DEBUG
+
+#define SSDFS_DBG(fmt, ...) \
+	pr_debug("pid %d:%s:%d %s(): " fmt, \
+		 current->pid, __FILE__, __LINE__, __func__, ##__VA_ARGS__)
+
+#else /* CONFIG_SSDFS_DEBUG */
+
+#define SSDFS_DBG(fmt, ...) \
+	no_printk(KERN_DEBUG fmt, ##__VA_ARGS__)
+
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+extern atomic64_t ssdfs_allocated_pages;
+extern atomic64_t ssdfs_memory_leaks;
+
+extern atomic64_t ssdfs_locked_pages;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+static inline
+void ssdfs_memory_leaks_increment(void *kaddr)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_inc(&ssdfs_memory_leaks);
+
+	SSDFS_DBG("memory %p, allocation count %lld\n",
+		  kaddr,
+		  atomic64_read(&ssdfs_memory_leaks));
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+static inline
+void ssdfs_memory_leaks_decrement(void *kaddr)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_dec(&ssdfs_memory_leaks);
+
+	SSDFS_DBG("memory %p, allocation count %lld\n",
+		  kaddr,
+		  atomic64_read(&ssdfs_memory_leaks));
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+static inline
+void *ssdfs_kmalloc(size_t size, gfp_t flags)
+{
+	void *kaddr = kmalloc(size, flags);
+
+	if (kaddr)
+		ssdfs_memory_leaks_increment(kaddr);
+
+	return kaddr;
+}
+
+static inline
+void *ssdfs_kzalloc(size_t size, gfp_t flags)
+{
+	void *kaddr = kzalloc(size, flags);
+
+	if (kaddr)
+		ssdfs_memory_leaks_increment(kaddr);
+
+	return kaddr;
+}
+
+static inline
+void *ssdfs_kvzalloc(size_t size, gfp_t flags)
+{
+	void *kaddr = kvzalloc(size, flags);
+
+	if (kaddr)
+		ssdfs_memory_leaks_increment(kaddr);
+
+	return kaddr;
+}
+
+static inline
+void *ssdfs_kcalloc(size_t n, size_t size, gfp_t flags)
+{
+	void *kaddr = kcalloc(n, size, flags);
+
+	if (kaddr)
+		ssdfs_memory_leaks_increment(kaddr);
+
+	return kaddr;
+}
+
+static inline
+void ssdfs_kfree(void *kaddr)
+{
+	if (kaddr) {
+		ssdfs_memory_leaks_decrement(kaddr);
+		kfree(kaddr);
+	}
+}
+
+static inline
+void ssdfs_kvfree(void *kaddr)
+{
+	if (kaddr) {
+		ssdfs_memory_leaks_decrement(kaddr);
+		kvfree(kaddr);
+	}
+}
+
+static inline
+void ssdfs_get_page(struct page *page)
+{
+	get_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("page %p, count %d, flags %#lx\n",
+		  page, page_ref_count(page), page->flags);
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+static inline
+void ssdfs_put_page(struct page *page)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("page %p, count %d\n",
+		  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	put_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("page %p, count %d\n",
+		  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (page_ref_count(page) < 1) {
+		SSDFS_WARN("page %p, count %d\n",
+			  page, page_ref_count(page));
+	}
+}
+
+static inline
+void ssdfs_lock_page(struct page *page)
+{
+	lock_page(page);
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&ssdfs_locked_pages) < 0) {
+		SSDFS_WARN("ssdfs_locked_pages %lld\n",
+			   atomic64_read(&ssdfs_locked_pages));
+	}
+
+	atomic64_inc(&ssdfs_locked_pages);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+static inline
+void ssdfs_account_locked_page(struct page *page)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (!page)
+		return;
+
+	if (!PageLocked(page)) {
+		SSDFS_WARN("page %p, page_index %llu\n",
+			   page, (u64)page_index(page));
+	}
+
+	if (atomic64_read(&ssdfs_locked_pages) < 0) {
+		SSDFS_WARN("ssdfs_locked_pages %lld\n",
+			   atomic64_read(&ssdfs_locked_pages));
+	}
+
+	atomic64_inc(&ssdfs_locked_pages);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+static inline
+void ssdfs_unlock_page(struct page *page)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (!PageLocked(page)) {
+		SSDFS_WARN("page %p, page_index %llu\n",
+			   page, (u64)page_index(page));
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+	unlock_page(page);
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_dec(&ssdfs_locked_pages);
+
+	if (atomic64_read(&ssdfs_locked_pages) < 0) {
+		SSDFS_WARN("ssdfs_locked_pages %lld\n",
+			   atomic64_read(&ssdfs_locked_pages));
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+static inline
+struct page *ssdfs_alloc_page(gfp_t gfp_mask)
+{
+	struct page *page;
+
+	page = alloc_page(gfp_mask);
+	if (unlikely(!page)) {
+		SSDFS_ERR("unable to allocate memory page\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	ssdfs_get_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("page %p, count %d, "
+		  "flags %#lx, page_index %lu\n",
+		  page, page_ref_count(page),
+		  page->flags, page_index(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_inc(&ssdfs_allocated_pages);
+
+	SSDFS_DBG("page %p, allocated_pages %lld\n",
+		  page, atomic64_read(&ssdfs_allocated_pages));
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+	return page;
+}
+
+static inline
+void ssdfs_account_page(struct page *page)
+{
+	return;
+}
+
+static inline
+void ssdfs_forget_page(struct page *page)
+{
+	return;
+}
+
+/*
+ * ssdfs_add_pagevec_page() - add page into pagevec
+ * @pvec: pagevec
+ *
+ * This function adds an empty page into the pagevec.
+ *
+ * RETURN:
+ * [success] - pointer to the added page.
+ * [failure] - error code:
+ *
+ * %-ENOMEM     - failed to allocate memory.
+ * %-E2BIG      - pagevec is full.
+ */
+static inline
+struct page *ssdfs_add_pagevec_page(struct pagevec *pvec)
+{
+	struct page *page;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pvec);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (pagevec_space(pvec) == 0) {
+		SSDFS_ERR("pagevec has no space\n");
+		return ERR_PTR(-E2BIG);
+	}
+
+	page = ssdfs_alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (IS_ERR_OR_NULL(page)) {
+		err = (page == NULL ? -ENOMEM : PTR_ERR(page));
+		SSDFS_ERR("unable to allocate memory page\n");
+		return ERR_PTR(err);
+	}
+
+	pagevec_add(pvec, page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("pvec %p, pagevec count %u\n",
+		  pvec, pagevec_count(pvec));
+	SSDFS_DBG("page %p, count %d\n",
+		  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return page;
+}
+
+static inline
+void ssdfs_free_page(struct page *page)
+{
+	if (!page)
+		return;
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (PageLocked(page)) {
+		SSDFS_WARN("page %p is still locked\n",
+			   page);
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+	ssdfs_put_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("page %p, count %d\n",
+		  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (page_ref_count(page) != 1) {
+		SSDFS_WARN("page %p, count %d\n",
+			  page, page_ref_count(page));
+	}
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_dec(&ssdfs_allocated_pages);
+
+	SSDFS_DBG("page %p, allocated_pages %lld\n",
+		  page, atomic64_read(&ssdfs_allocated_pages));
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+	__free_pages(page, 0);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("page %p, count %d, "
+		  "flags %#lx, page_index %lu\n",
+		  page, page_ref_count(page),
+		  page->flags, page_index(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+static inline
+void ssdfs_pagevec_release(struct pagevec *pvec)
+{
+	int i;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("pvec %p\n", pvec);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!pvec)
+		return;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("pvec count %u\n", pagevec_count(pvec));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	for (i = 0; i < pagevec_count(pvec); i++) {
+		struct page *page = pvec->pages[i];
+
+		if (!page)
+			continue;
+
+		ssdfs_free_page(page);
+
+		pvec->pages[i] = NULL;
+	}
+
+	pagevec_reinit(pvec);
+}
+
+#define SSDFS_MEMORY_LEAKS_CHECKER_FNS(name)				\
+static inline								\
+void ssdfs_##name##_cache_leaks_increment(void *kaddr)			\
+{									\
+	atomic64_inc(&ssdfs_##name##_cache_leaks);			\
+	SSDFS_DBG("memory %p, allocation count %lld\n",			\
+		  kaddr,						\
+		  atomic64_read(&ssdfs_##name##_cache_leaks));		\
+	ssdfs_memory_leaks_increment(kaddr);				\
+}									\
+static inline								\
+void ssdfs_##name##_cache_leaks_decrement(void *kaddr)			\
+{									\
+	atomic64_dec(&ssdfs_##name##_cache_leaks);			\
+	SSDFS_DBG("memory %p, allocation count %lld\n",			\
+		  kaddr,						\
+		  atomic64_read(&ssdfs_##name##_cache_leaks));		\
+	ssdfs_memory_leaks_decrement(kaddr);				\
+}									\
+static inline								\
+void *ssdfs_##name##_kmalloc(size_t size, gfp_t flags)			\
+{									\
+	void *kaddr = ssdfs_kmalloc(size, flags);			\
+	if (kaddr) {							\
+		atomic64_inc(&ssdfs_##name##_memory_leaks);		\
+		SSDFS_DBG("memory %p, allocation count %lld\n",		\
+			  kaddr,					\
+			  atomic64_read(&ssdfs_##name##_memory_leaks));	\
+	}								\
+	return kaddr;							\
+}									\
+static inline								\
+void *ssdfs_##name##_kzalloc(size_t size, gfp_t flags)			\
+{									\
+	void *kaddr = ssdfs_kzalloc(size, flags);			\
+	if (kaddr) {							\
+		atomic64_inc(&ssdfs_##name##_memory_leaks);		\
+		SSDFS_DBG("memory %p, allocation count %lld\n",		\
+			  kaddr,					\
+			  atomic64_read(&ssdfs_##name##_memory_leaks));	\
+	}								\
+	return kaddr;							\
+}									\
+static inline								\
+void *ssdfs_##name##_kvzalloc(size_t size, gfp_t flags)			\
+{									\
+	void *kaddr = ssdfs_kvzalloc(size, flags);			\
+	if (kaddr) {							\
+		atomic64_inc(&ssdfs_##name##_memory_leaks);		\
+		SSDFS_DBG("memory %p, allocation count %lld\n",		\
+			  kaddr,					\
+			  atomic64_read(&ssdfs_##name##_memory_leaks));	\
+	}								\
+	return kaddr;							\
+}									\
+static inline								\
+void *ssdfs_##name##_kcalloc(size_t n, size_t size, gfp_t flags)	\
+{									\
+	void *kaddr = ssdfs_kcalloc(n, size, flags);			\
+	if (kaddr) {							\
+		atomic64_inc(&ssdfs_##name##_memory_leaks);		\
+		SSDFS_DBG("memory %p, allocation count %lld\n",		\
+			  kaddr,					\
+			  atomic64_read(&ssdfs_##name##_memory_leaks));	\
+	}								\
+	return kaddr;							\
+}									\
+static inline								\
+void ssdfs_##name##_kfree(void *kaddr)					\
+{									\
+	if (kaddr) {							\
+		atomic64_dec(&ssdfs_##name##_memory_leaks);		\
+		SSDFS_DBG("memory %p, allocation count %lld\n",		\
+			  kaddr,					\
+			  atomic64_read(&ssdfs_##name##_memory_leaks));	\
+	}								\
+	ssdfs_kfree(kaddr);						\
+}									\
+static inline								\
+void ssdfs_##name##_kvfree(void *kaddr)					\
+{									\
+	if (kaddr) {							\
+		atomic64_dec(&ssdfs_##name##_memory_leaks);		\
+		SSDFS_DBG("memory %p, allocation count %lld\n",		\
+			  kaddr,					\
+			  atomic64_read(&ssdfs_##name##_memory_leaks));	\
+	}								\
+	ssdfs_kvfree(kaddr);						\
+}									\
+static inline								\
+struct page *ssdfs_##name##_alloc_page(gfp_t gfp_mask)			\
+{									\
+	struct page *page;						\
+	page = ssdfs_alloc_page(gfp_mask);				\
+	if (!IS_ERR_OR_NULL(page)) {					\
+		atomic64_inc(&ssdfs_##name##_page_leaks);		\
+		SSDFS_DBG("page %p, allocated_pages %lld\n",		\
+			  page,						\
+			  atomic64_read(&ssdfs_##name##_page_leaks));	\
+	}								\
+	return page;							\
+}									\
+static inline								\
+void ssdfs_##name##_account_page(struct page *page)			\
+{									\
+	if (page) {							\
+		atomic64_inc(&ssdfs_##name##_page_leaks);		\
+		SSDFS_DBG("page %p, allocated_pages %lld\n",		\
+			  page,						\
+			  atomic64_read(&ssdfs_##name##_page_leaks));	\
+	}								\
+}									\
+static inline								\
+void ssdfs_##name##_forget_page(struct page *page)			\
+{									\
+	if (page) {							\
+		atomic64_dec(&ssdfs_##name##_page_leaks);		\
+		SSDFS_DBG("page %p, allocated_pages %lld\n",		\
+			  page,						\
+			  atomic64_read(&ssdfs_##name##_page_leaks));	\
+	}								\
+}									\
+static inline								\
+struct page *ssdfs_##name##_add_pagevec_page(struct pagevec *pvec)	\
+{									\
+	struct page *page;						\
+	page = ssdfs_add_pagevec_page(pvec);				\
+	if (!IS_ERR_OR_NULL(page)) {					\
+		atomic64_inc(&ssdfs_##name##_page_leaks);		\
+		SSDFS_DBG("page %p, allocated_pages %lld\n",		\
+			  page,						\
+			  atomic64_read(&ssdfs_##name##_page_leaks));	\
+	}								\
+	return page;							\
+}									\
+static inline								\
+void ssdfs_##name##_free_page(struct page *page)			\
+{									\
+	if (page) {							\
+		atomic64_dec(&ssdfs_##name##_page_leaks);		\
+		SSDFS_DBG("page %p, allocated_pages %lld\n",		\
+			  page,						\
+			  atomic64_read(&ssdfs_##name##_page_leaks));	\
+	}								\
+	ssdfs_free_page(page);						\
+}									\
+static inline								\
+void ssdfs_##name##_pagevec_release(struct pagevec *pvec)		\
+{									\
+	int i;								\
+	if (pvec) {							\
+		for (i = 0; i < pagevec_count(pvec); i++) {		\
+			struct page *page = pvec->pages[i];		\
+			if (!page)					\
+				continue;				\
+			atomic64_dec(&ssdfs_##name##_page_leaks);	\
+			SSDFS_DBG("page %p, allocated_pages %lld\n",	\
+			    page,					\
+			    atomic64_read(&ssdfs_##name##_page_leaks));	\
+		}							\
+	}								\
+	ssdfs_pagevec_release(pvec);					\
+}									\
+
+#define SSDFS_MEMORY_ALLOCATOR_FNS(name)				\
+static inline								\
+void ssdfs_##name##_cache_leaks_increment(void *kaddr)			\
+{									\
+	ssdfs_memory_leaks_increment(kaddr);				\
+}									\
+static inline								\
+void ssdfs_##name##_cache_leaks_decrement(void *kaddr)			\
+{									\
+	ssdfs_memory_leaks_decrement(kaddr);				\
+}									\
+static inline								\
+void *ssdfs_##name##_kmalloc(size_t size, gfp_t flags)			\
+{									\
+	return ssdfs_kmalloc(size, flags);				\
+}									\
+static inline								\
+void *ssdfs_##name##_kzalloc(size_t size, gfp_t flags)			\
+{									\
+	return ssdfs_kzalloc(size, flags);				\
+}									\
+static inline								\
+void *ssdfs_##name##_kvzalloc(size_t size, gfp_t flags)			\
+{									\
+	return ssdfs_kvzalloc(size, flags);				\
+}									\
+static inline								\
+void *ssdfs_##name##_kcalloc(size_t n, size_t size, gfp_t flags)	\
+{									\
+	return ssdfs_kcalloc(n, size, flags);				\
+}									\
+static inline								\
+void ssdfs_##name##_kfree(void *kaddr)					\
+{									\
+	ssdfs_kfree(kaddr);						\
+}									\
+static inline								\
+void ssdfs_##name##_kvfree(void *kaddr)					\
+{									\
+	ssdfs_kvfree(kaddr);						\
+}									\
+static inline								\
+struct page *ssdfs_##name##_alloc_page(gfp_t gfp_mask)			\
+{									\
+	return ssdfs_alloc_page(gfp_mask);				\
+}									\
+static inline								\
+void ssdfs_##name##_account_page(struct page *page)			\
+{									\
+	ssdfs_account_page(page);					\
+}									\
+static inline								\
+void ssdfs_##name##_forget_page(struct page *page)			\
+{									\
+	ssdfs_forget_page(page);					\
+}									\
+static inline								\
+struct page *ssdfs_##name##_add_pagevec_page(struct pagevec *pvec)	\
+{									\
+	return ssdfs_add_pagevec_page(pvec);				\
+}									\
+static inline								\
+void ssdfs_##name##_free_page(struct page *page)			\
+{									\
+	ssdfs_free_page(page);						\
+}									\
+static inline								\
+void ssdfs_##name##_pagevec_release(struct pagevec *pvec)		\
+{									\
+	ssdfs_pagevec_release(pvec);					\
+}									\
+
+static inline
+__le32 ssdfs_crc32_le(void *data, size_t len)
+{
+	return cpu_to_le32(crc32(~0, data, len));
+}
+
+static inline
+int ssdfs_calculate_csum(struct ssdfs_metadata_check *check,
+			  void *buf, size_t buf_size)
+{
+	u16 bytes;
+	u16 flags;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!check || !buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	bytes = le16_to_cpu(check->bytes);
+	flags = le16_to_cpu(check->flags);
+
+	if (bytes > buf_size) {
+		SSDFS_ERR("corrupted size %d of checked data\n", bytes);
+		return -EINVAL;
+	}
+
+	if (flags & SSDFS_CRC32) {
+		check->csum = 0;
+		check->csum = ssdfs_crc32_le(buf, bytes);
+	} else {
+		SSDFS_ERR("unknown flags set %#x\n", flags);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static inline
+bool is_csum_valid(struct ssdfs_metadata_check *check,
+		   void *buf, size_t buf_size)
+{
+	__le32 old_csum;
+	__le32 calc_csum;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!check);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	old_csum = check->csum;
+
+	err = ssdfs_calculate_csum(check, buf, buf_size);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to calculate checksum\n");
+		return false;
+	}
+
+	calc_csum = check->csum;
+	check->csum = old_csum;
+
+	if (old_csum != calc_csum) {
+		SSDFS_ERR("old_csum %#x != calc_csum %#x\n",
+			  __le32_to_cpu(old_csum),
+			  __le32_to_cpu(calc_csum));
+		return false;
+	}
+
+	return true;
+}
+
+static inline
+bool is_ssdfs_magic_valid(struct ssdfs_signature *magic)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!magic);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (le32_to_cpu(magic->common) != SSDFS_SUPER_MAGIC)
+		return false;
+	if (magic->version.major > SSDFS_MAJOR_REVISION ||
+	    magic->version.minor > SSDFS_MINOR_REVISION) {
+		SSDFS_INFO("Volume has unsupported %u.%u version. "
+			   "Driver expects %u.%u version.\n",
+			   magic->version.major,
+			   magic->version.minor,
+			   SSDFS_MAJOR_REVISION,
+			   SSDFS_MINOR_REVISION);
+		return false;
+	}
+
+	return true;
+}
+
+#define SSDFS_SEG_HDR(ptr) \
+	((struct ssdfs_segment_header *)(ptr))
+#define SSDFS_LF(ptr) \
+	((struct ssdfs_log_footer *)(ptr))
+#define SSDFS_VH(ptr) \
+	((struct ssdfs_volume_header *)(ptr))
+#define SSDFS_VS(ptr) \
+	((struct ssdfs_volume_state *)(ptr))
+#define SSDFS_PLH(ptr) \
+	((struct ssdfs_partial_log_header *)(ptr))
+
+/*
+ * Flags for mount options.
+ */
+#define SSDFS_MOUNT_COMPR_MODE_NONE		(1 << 0)
+#define SSDFS_MOUNT_COMPR_MODE_ZLIB		(1 << 1)
+#define SSDFS_MOUNT_COMPR_MODE_LZO		(1 << 2)
+#define SSDFS_MOUNT_ERRORS_CONT			(1 << 3)
+#define SSDFS_MOUNT_ERRORS_RO			(1 << 4)
+#define SSDFS_MOUNT_ERRORS_PANIC		(1 << 5)
+#define SSDFS_MOUNT_IGNORE_FS_STATE		(1 << 6)
+
+#define ssdfs_clear_opt(o, opt)		((o) &= ~SSDFS_MOUNT_##opt)
+#define ssdfs_set_opt(o, opt)		((o) |= SSDFS_MOUNT_##opt)
+#define ssdfs_test_opt(o, opt)		((o) & SSDFS_MOUNT_##opt)
+
+#define SSDFS_LOG_FOOTER_OFF(seg_hdr)({ \
+	u32 offset; \
+	int index; \
+	struct ssdfs_metadata_descriptor *desc; \
+	index = SSDFS_LOG_FOOTER_INDEX; \
+	desc = &SSDFS_SEG_HDR(seg_hdr)->desc_array[index]; \
+	offset = le32_to_cpu(desc->offset); \
+	offset; \
+})
+
+#define SSDFS_LOG_PAGES(seg_hdr) \
+	(le16_to_cpu(SSDFS_SEG_HDR(seg_hdr)->log_pages))
+#define SSDFS_SEG_TYPE(seg_hdr) \
+	(le16_to_cpu(SSDFS_SEG_HDR(seg_hdr)->seg_type))
+
+#define SSDFS_MAIN_SB_PEB(vh, type) \
+	(le64_to_cpu(SSDFS_VH(vh)->sb_pebs[type][SSDFS_MAIN_SB_SEG].peb_id))
+#define SSDFS_COPY_SB_PEB(vh, type) \
+	(le64_to_cpu(SSDFS_VH(vh)->sb_pebs[type][SSDFS_COPY_SB_SEG].peb_id))
+#define SSDFS_MAIN_SB_LEB(vh, type) \
+	(le64_to_cpu(SSDFS_VH(vh)->sb_pebs[type][SSDFS_MAIN_SB_SEG].leb_id))
+#define SSDFS_COPY_SB_LEB(vh, type) \
+	(le64_to_cpu(SSDFS_VH(vh)->sb_pebs[type][SSDFS_COPY_SB_SEG].leb_id))
+
+#define SSDFS_SEG_CNO(seg_hdr) \
+	(le64_to_cpu(SSDFS_SEG_HDR(seg_hdr)->cno))
+
+static inline
+u64 ssdfs_current_timestamp(void)
+{
+	struct timespec64 cur_time;
+
+	ktime_get_coarse_real_ts64(&cur_time);
+
+	return (u64)timespec64_to_ns(&cur_time);
+}
+
+static inline
+void ssdfs_init_boot_vs_mount_timediff(struct ssdfs_fs_info *fsi)
+{
+	struct timespec64 uptime;
+
+	ktime_get_boottime_ts64(&uptime);
+	fsi->boot_vs_mount_timediff = timespec64_to_ns(&uptime);
+}
+
+static inline
+u64 ssdfs_current_cno(struct super_block *sb)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	struct timespec64 uptime;
+	u64 boot_vs_mount_timediff;
+	u64 fs_mount_cno;
+
+	spin_lock(&fsi->volume_state_lock);
+	boot_vs_mount_timediff = fsi->boot_vs_mount_timediff;
+	fs_mount_cno = fsi->fs_mount_cno;
+	spin_unlock(&fsi->volume_state_lock);
+
+	ktime_get_boottime_ts64(&uptime);
+	return fs_mount_cno +
+		timespec64_to_ns(&uptime) -
+		boot_vs_mount_timediff;
+}
+
+#define SSDFS_MAPTBL_CACHE_HDR(ptr) \
+	((struct ssdfs_maptbl_cache_header *)(ptr))
+
+#define SSDFS_SEG_HDR_MAGIC(vh) \
+	(le16_to_cpu(SSDFS_VH(vh)->magic.key))
+#define SSDFS_SEG_TIME(seg_hdr) \
+	(le64_to_cpu(SSDFS_SEG_HDR(seg_hdr)->timestamp))
+
+#define SSDFS_VH_CNO(vh) \
+	(le64_to_cpu(SSDFS_VH(vh)->create_cno))
+#define SSDFS_VH_TIME(vh) \
+	(le64_to_cpu(SSDFS_VH(vh)->create_timestamp))
+
+#define SSDFS_VS_CNO(vs) \
+	(le64_to_cpu(SSDFS_VS(vs)->cno))
+#define SSDFS_VS_TIME(vs) \
+	(le64_to_cpu(SSDFS_VS(vs)->timestamp))
+
+#define SSDFS_POFFTH(ptr) \
+	((struct ssdfs_phys_offset_table_header *)(ptr))
+#define SSDFS_PHYSOFFD(ptr) \
+	((struct ssdfs_phys_offset_descriptor *)(ptr))
+
+static inline
+pgoff_t ssdfs_phys_page_to_mem_page(struct ssdfs_fs_info *fsi,
+				    pgoff_t index)
+{
+	if (fsi->log_pagesize == PAGE_SHIFT)
+		return index;
+	else if (fsi->log_pagesize > PAGE_SHIFT)
+		return index << (fsi->log_pagesize - PAGE_SHIFT);
+	else
+		return index >> (PAGE_SHIFT - fsi->log_pagesize);
+}
+
+static inline
+pgoff_t ssdfs_mem_page_to_phys_page(struct ssdfs_fs_info *fsi,
+				    pgoff_t index)
+{
+	if (fsi->log_pagesize == PAGE_SHIFT)
+		return index;
+	else if (fsi->log_pagesize > PAGE_SHIFT)
+		return index >> (fsi->log_pagesize - PAGE_SHIFT);
+	else
+		return index << (PAGE_SHIFT - fsi->log_pagesize);
+}
+
+#define SSDFS_MEMPAGE2BYTES(index) \
+	((pgoff_t)(index) << PAGE_SHIFT)
+#define SSDFS_BYTES2MEMPAGE(offset) \
+	((pgoff_t)(offset) >> PAGE_SHIFT)
+
+/*
+ * ssdfs_write_offset_to_mem_page_index() - convert write offset into mem page
+ * @fsi: pointer to the shared file system object
+ * @start_page: index of the log's starting physical page
+ * @write_offset: offset in bytes from the log's beginning
+ */
+static inline
+pgoff_t ssdfs_write_offset_to_mem_page_index(struct ssdfs_fs_info *fsi,
+					     u16 start_page,
+					     u32 write_offset)
+{
+	u32 page_off;
+
+	page_off = ssdfs_phys_page_to_mem_page(fsi, start_page);
+	page_off = SSDFS_MEMPAGE2BYTES(page_off) + write_offset;
+	return SSDFS_BYTES2MEMPAGE(page_off);
+}
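The conversion helpers above map between device-level ("physical") page indexes of size `1 << log_pagesize` and memory page indexes of size `PAGE_SIZE`. The arithmetic can be sketched standalone in userspace (this is an illustration, not part of the patch; `DEMO_PAGE_SHIFT` and the helper names are invented for the sketch, assuming 4 KiB memory pages):

```c
#include <stdint.h>

/* stand-in for the kernel's PAGE_SHIFT, assuming 4 KiB memory pages */
#define DEMO_PAGE_SHIFT 12

/* mirrors ssdfs_phys_page_to_mem_page(): scale a physical page index
 * up or down depending on whether logical blocks are bigger or
 * smaller than memory pages */
static uint64_t phys_to_mem(unsigned int log_pagesize, uint64_t index)
{
	if (log_pagesize == DEMO_PAGE_SHIFT)
		return index;
	else if (log_pagesize > DEMO_PAGE_SHIFT)
		return index << (log_pagesize - DEMO_PAGE_SHIFT);
	else
		return index >> (DEMO_PAGE_SHIFT - log_pagesize);
}

/* mirrors ssdfs_mem_page_to_phys_page(): the inverse scaling */
static uint64_t mem_to_phys(unsigned int log_pagesize, uint64_t index)
{
	if (log_pagesize == DEMO_PAGE_SHIFT)
		return index;
	else if (log_pagesize > DEMO_PAGE_SHIFT)
		return index >> (log_pagesize - DEMO_PAGE_SHIFT);
	else
		return index << (DEMO_PAGE_SHIFT - log_pagesize);
}
```

With 16 KiB logical blocks (`log_pagesize == 14`), physical page 3 starts at memory page 12, since each logical block spans four memory pages.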
+
+#define SSDFS_BLKBMP_HDR(ptr) \
+	((struct ssdfs_block_bitmap_header *)(ptr))
+#define SSDFS_SBMP_FRAG_HDR(ptr) \
+	((struct ssdfs_segbmap_fragment_header *)(ptr))
+#define SSDFS_BTN(ptr) \
+	((struct ssdfs_btree_node *)(ptr))
+
+static inline
+bool need_add_block(struct page *page)
+{
+	return PageChecked(page);
+}
+
+static inline
+bool is_diff_page(struct page *page)
+{
+	return PageChecked(page);
+}
+
+static inline
+void set_page_new(struct page *page)
+{
+	SetPageChecked(page);
+}
+
+static inline
+void clear_page_new(struct page *page)
+{
+	ClearPageChecked(page);
+}
+
+static inline
+void ssdfs_set_page_private(struct page *page,
+			    unsigned long private)
+{
+	set_page_private(page, private);
+	SetPagePrivate(page);
+}
+
+static inline
+void ssdfs_clear_page_private(struct page *page,
+			      unsigned long private)
+{
+	set_page_private(page, private);
+	ClearPagePrivate(page);
+}
+
+static inline
+bool can_be_merged_into_extent(struct page *page1, struct page *page2)
+{
+	ino_t ino1 = page1->mapping->host->i_ino;
+	ino_t ino2 = page2->mapping->host->i_ino;
+	pgoff_t index1 = page_index(page1);
+	pgoff_t index2 = page_index(page2);
+	pgoff_t diff_index;
+	bool has_identical_type;
+	bool has_identical_ino;
+
+	has_identical_type = (PageChecked(page1) && PageChecked(page2)) ||
+				(!PageChecked(page1) && !PageChecked(page2));
+	has_identical_ino = ino1 == ino2;
+
+	if (index1 >= index2)
+		diff_index = index1 - index2;
+	else
+		diff_index = index2 - index1;
+
+	return has_identical_type && has_identical_ino && (diff_index == 1);
+}
+
+static inline
+int ssdfs_memcpy(void *dst, u32 dst_off, u32 dst_size,
+		 const void *src, u32 src_off, u32 src_size,
+		 u32 copy_size)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	if ((src_off + copy_size) > src_size) {
+		SSDFS_ERR("fail to copy: "
+			  "src_off %u, copy_size %u, src_size %u\n",
+			  src_off, copy_size, src_size);
+		return -ERANGE;
+	}
+
+	if ((dst_off + copy_size) > dst_size) {
+		SSDFS_ERR("fail to copy: "
+			  "dst_off %u, copy_size %u, dst_size %u\n",
+			  dst_off, copy_size, dst_size);
+		return -ERANGE;
+	}
+
+	SSDFS_DBG("dst %p, dst_off %u, dst_size %u, "
+		  "src %p, src_off %u, src_size %u, "
+		  "copy_size %u\n",
+		  dst, dst_off, dst_size,
+		  src, src_off, src_size,
+		  copy_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	memcpy((u8 *)dst + dst_off, (u8 *)src + src_off, copy_size);
+	return 0;
+}
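The family of `ssdfs_memcpy*()` wrappers all follow the same pattern: validate that the offset plus length stays inside both buffers, then perform the raw copy. A standalone userspace sketch of that guarded-copy pattern (an illustration only, not part of the patch; `demo_memcpy` is an invented name):

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* sketch of the ssdfs_memcpy() bounds-check pattern: reject the
 * request when the copy would overrun either the source or the
 * destination buffer, otherwise fall through to memcpy() */
static int demo_memcpy(void *dst, uint32_t dst_off, uint32_t dst_size,
		       const void *src, uint32_t src_off, uint32_t src_size,
		       uint32_t copy_size)
{
	if (src_off + copy_size > src_size ||
	    dst_off + copy_size > dst_size)
		return -ERANGE;

	memcpy((uint8_t *)dst + dst_off,
	       (const uint8_t *)src + src_off, copy_size);
	return 0;
}
```

In the kernel code the checks are compiled in only under `CONFIG_SSDFS_DEBUG`; production builds pay no cost for the validation.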
+
+static inline
+int ssdfs_memcpy_page(struct page *dst_page, u32 dst_off, u32 dst_size,
+		      struct page *src_page, u32 src_off, u32 src_size,
+		      u32 copy_size)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	if ((src_off + copy_size) > src_size) {
+		SSDFS_ERR("fail to copy: "
+			  "src_off %u, copy_size %u, src_size %u\n",
+			  src_off, copy_size, src_size);
+		return -ERANGE;
+	}
+
+	if ((dst_off + copy_size) > dst_size) {
+		SSDFS_ERR("fail to copy: "
+			  "dst_off %u, copy_size %u, dst_size %u\n",
+			  dst_off, copy_size, dst_size);
+		return -ERANGE;
+	}
+
+	SSDFS_DBG("dst_page %p, dst_off %u, dst_size %u, "
+		  "src_page %p, src_off %u, src_size %u, "
+		  "copy_size %u\n",
+		  dst_page, dst_off, dst_size,
+		  src_page, src_off, src_size,
+		  copy_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	memcpy_page(dst_page, dst_off, src_page, src_off, copy_size);
+	return 0;
+}
+
+static inline
+int ssdfs_memcpy_from_page(void *dst, u32 dst_off, u32 dst_size,
+			   struct page *page, u32 src_off, u32 src_size,
+			   u32 copy_size)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	if ((src_off + copy_size) > src_size) {
+		SSDFS_ERR("fail to copy: "
+			  "src_off %u, copy_size %u, src_size %u\n",
+			  src_off, copy_size, src_size);
+		return -ERANGE;
+	}
+
+	if ((dst_off + copy_size) > dst_size) {
+		SSDFS_ERR("fail to copy: "
+			  "dst_off %u, copy_size %u, dst_size %u\n",
+			  dst_off, copy_size, dst_size);
+		return -ERANGE;
+	}
+
+	SSDFS_DBG("dst %p, dst_off %u, dst_size %u, "
+		  "page %p, src_off %u, src_size %u, "
+		  "copy_size %u\n",
+		  dst, dst_off, dst_size,
+		  page, src_off, src_size,
+		  copy_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	memcpy_from_page((u8 *)dst + dst_off, page, src_off, copy_size);
+	return 0;
+}
+
+static inline
+int ssdfs_memcpy_to_page(struct page *page, u32 dst_off, u32 dst_size,
+			 void *src, u32 src_off, u32 src_size,
+			 u32 copy_size)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	if ((src_off + copy_size) > src_size) {
+		SSDFS_ERR("fail to copy: "
+			  "src_off %u, copy_size %u, src_size %u\n",
+			  src_off, copy_size, src_size);
+		return -ERANGE;
+	}
+
+	if ((dst_off + copy_size) > dst_size) {
+		SSDFS_ERR("fail to copy: "
+			  "dst_off %u, copy_size %u, dst_size %u\n",
+			  dst_off, copy_size, dst_size);
+		return -ERANGE;
+	}
+
+	SSDFS_DBG("page %p, dst_off %u, dst_size %u, "
+		  "src %p, src_off %u, src_size %u, "
+		  "copy_size %u\n",
+		  page, dst_off, dst_size,
+		  src, src_off, src_size,
+		  copy_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	memcpy_to_page(page, dst_off, (u8 *)src + src_off, copy_size);
+	return 0;
+}
+
+static inline
+int ssdfs_memmove(void *dst, u32 dst_off, u32 dst_size,
+		  const void *src, u32 src_off, u32 src_size,
+		  u32 move_size)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	if ((src_off + move_size) > src_size) {
+		SSDFS_ERR("fail to move: "
+			  "src_off %u, move_size %u, src_size %u\n",
+			  src_off, move_size, src_size);
+		return -ERANGE;
+	}
+
+	if ((dst_off + move_size) > dst_size) {
+		SSDFS_ERR("fail to move: "
+			  "dst_off %u, move_size %u, dst_size %u\n",
+			  dst_off, move_size, dst_size);
+		return -ERANGE;
+	}
+
+	SSDFS_DBG("dst %p, dst_off %u, dst_size %u, "
+		  "src %p, src_off %u, src_size %u, "
+		  "move_size %u\n",
+		  dst, dst_off, dst_size,
+		  src, src_off, src_size,
+		  move_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	memmove((u8 *)dst + dst_off, (u8 *)src + src_off, move_size);
+	return 0;
+}
+
+static inline
+int ssdfs_memmove_page(struct page *dst_page, u32 dst_off, u32 dst_size,
+			struct page *src_page, u32 src_off, u32 src_size,
+			u32 move_size)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	if ((src_off + move_size) > src_size) {
+		SSDFS_ERR("fail to move: "
+			  "src_off %u, move_size %u, src_size %u\n",
+			  src_off, move_size, src_size);
+		return -ERANGE;
+	}
+
+	if ((dst_off + move_size) > dst_size) {
+		SSDFS_ERR("fail to move: "
+			  "dst_off %u, move_size %u, dst_size %u\n",
+			  dst_off, move_size, dst_size);
+		return -ERANGE;
+	}
+
+	SSDFS_DBG("dst_page %p, dst_off %u, dst_size %u, "
+		  "src_page %p, src_off %u, src_size %u, "
+		  "move_size %u\n",
+		  dst_page, dst_off, dst_size,
+		  src_page, src_off, src_size,
+		  move_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	memcpy_page(dst_page, dst_off, src_page, src_off, move_size);
+	return 0;
+}
+
+static inline
+int ssdfs_memset_page(struct page *page, u32 dst_off, u32 dst_size,
+		      int value, u32 set_size)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	if ((dst_off + set_size) > dst_size) {
+		SSDFS_ERR("fail to copy: "
+			  "dst_off %u, set_size %u, dst_size %u\n",
+			  dst_off, set_size, dst_size);
+		return -ERANGE;
+	}
+
+	SSDFS_DBG("page %p, dst_off %u, dst_size %u, "
+		  "value %#x, set_size %u\n",
+		  page, dst_off, dst_size,
+		  value, set_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	memset_page(page, dst_off, value, set_size);
+	return 0;
+}
+
+static inline
+int ssdfs_memzero_page(struct page *page, u32 dst_off, u32 dst_size,
+		       u32 set_size)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	if ((dst_off + set_size) > dst_size) {
+		SSDFS_ERR("fail to copy: "
+			  "dst_off %u, set_size %u, dst_size %u\n",
+			  dst_off, set_size, dst_size);
+		return -ERANGE;
+	}
+
+	SSDFS_DBG("page %p, dst_off %u, dst_size %u, "
+		  "set_size %u\n",
+		  page, dst_off, dst_size, set_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	memzero_page(page, dst_off, set_size);
+	return 0;
+}
+
+static inline
+bool is_ssdfs_file_inline(struct ssdfs_inode_info *ii)
+{
+	return atomic_read(&ii->private_flags) & SSDFS_INODE_HAS_INLINE_FILE;
+}
+
+static inline
+size_t ssdfs_inode_inline_file_capacity(struct inode *inode)
+{
+	struct ssdfs_inode_info *ii = SSDFS_I(inode);
+	size_t raw_inode_size;
+	size_t metadata_len;
+
+	raw_inode_size = ii->raw_inode_size;
+	metadata_len = offsetof(struct ssdfs_inode, internal);
+
+	if (raw_inode_size <= metadata_len) {
+		SSDFS_ERR("corrupted raw inode: "
+			  "raw_inode_size %zu, metadata_len %zu\n",
+			  raw_inode_size, metadata_len);
+		return 0;
+	}
+
+	return raw_inode_size - metadata_len;
+}
+
+/*
+ * __ssdfs_generate_name_hash() - generate a name's hash
+ * @name: pointer to the name string
+ * @len: length of the name
+ * @inline_name_max_len: max length of an inline name
+ */
+static inline
+u64 __ssdfs_generate_name_hash(const char *name, size_t len,
+				size_t inline_name_max_len)
+{
+	u32 hash32_lo, hash32_hi;
+	size_t copy_len;
+	u64 name_hash;
+	u32 diff = 0;
+	u8 symbol1, symbol2;
+	int i;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!name);
+
+	SSDFS_DBG("name %s, len %zu, inline_name_max_len %zu\n",
+		  name, len, inline_name_max_len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (len == 0) {
+		SSDFS_ERR("invalid len %zu\n", len);
+		return U64_MAX;
+	}
+
+	copy_len = min_t(size_t, len, inline_name_max_len);
+	hash32_lo = full_name_hash(NULL, name, copy_len);
+
+	if (len <= inline_name_max_len) {
+		hash32_hi = len;
+
+		for (i = 1; i < len; i++) {
+			symbol1 = (u8)name[i - 1];
+			symbol2 = (u8)name[i];
+			diff = 0;
+
+			if (symbol1 > symbol2)
+				diff = symbol1 - symbol2;
+			else
+				diff = symbol2 - symbol1;
+
+			hash32_hi += diff * symbol1;
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("hash32_hi %x, symbol1 %x, "
+				  "symbol2 %x, index %d, diff %u\n",
+				  hash32_hi, symbol1, symbol2,
+				  i, diff);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+	} else {
+		hash32_hi = full_name_hash(NULL,
+					   name + inline_name_max_len,
+					   len - copy_len);
+	}
+
+	name_hash = SSDFS_NAME_HASH(hash32_lo, hash32_hi);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("name %s, len %zu, name_hash %llx\n",
+		  name, len, name_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return name_hash;
+}
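For names within the inline limit, the high 32 bits of the hash start from the name length and accumulate a term for every adjacent character pair: the absolute difference of the pair multiplied by the first character. That accumulation can be sketched standalone (illustration only, not part of the patch; `demo_hash32_hi` is an invented name, and the kernel's `full_name_hash()` part is omitted):

```c
#include <stdint.h>
#include <string.h>

/* sketch of the hash32_hi accumulation in __ssdfs_generate_name_hash()
 * for a name that fits the inline limit: seed with the length, then
 * add |name[i-1] - name[i]| * name[i-1] for each adjacent pair */
static uint32_t demo_hash32_hi(const char *name)
{
	size_t len = strlen(name);
	uint32_t hash = (uint32_t)len;
	size_t i;

	for (i = 1; i < len; i++) {
		uint8_t s1 = (uint8_t)name[i - 1];
		uint8_t s2 = (uint8_t)name[i];
		uint32_t diff = s1 > s2 ? s1 - s2 : s2 - s1;

		hash += diff * s1;
	}

	return hash;
}
```

For example, "ab" yields 2 + |'a' - 'b'| * 'a' = 2 + 97 = 99, while "aa" yields just the length, 2, because the difference term vanishes.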
+
+#define SSDFS_WAITED_TOO_LONG_MSECS		(1000)
+
+static inline
+void ssdfs_check_jiffies_left_till_timeout(unsigned long value)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	unsigned int msecs;
+
+	msecs = jiffies_to_msecs(SSDFS_DEFAULT_TIMEOUT - value);
+	if (msecs >= SSDFS_WAITED_TOO_LONG_MSECS)
+		SSDFS_ERR("function waited %u msecs\n", msecs);
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+#define SSDFS_WAIT_COMPLETION(end) ({ \
+	unsigned long res; \
+	int err = 0; \
+	res = wait_for_completion_timeout(end, SSDFS_DEFAULT_TIMEOUT); \
+	if (res == 0) { \
+		err = -ERANGE; \
+	} else { \
+		ssdfs_check_jiffies_left_till_timeout(res); \
+	} \
+	err; \
+})
+
+#define SSDFS_FSI(ptr) \
+	((struct ssdfs_fs_info *)(ptr))
+#define SSDFS_BLKT(ptr) \
+	((struct ssdfs_area_block_table *)(ptr))
+#define SSDFS_FRAGD(ptr) \
+	((struct ssdfs_fragment_desc *)(ptr))
+#define SSDFS_BLKD(ptr) \
+	((struct ssdfs_block_descriptor *)(ptr))
+#define SSDFS_BLKSTOFF(ptr) \
+	((struct ssdfs_blk_state_offset *)(ptr))
+#define SSDFS_STNODE_HDR(ptr) \
+	((struct ssdfs_segment_tree_node_header *)(ptr))
+#define SSDFS_SNRU_HDR(ptr) \
+	((struct ssdfs_snapshot_rules_header *)(ptr))
+#define SSDFS_SNRU_INFO(ptr) \
+	((struct ssdfs_snapshot_rule_info *)(ptr))
+
+#define SSDFS_LEB2SEG(fsi, leb) \
+	((u64)ssdfs_get_seg_id_for_leb_id(fsi, leb))
+
+#endif /* _SSDFS_INLINE_H */
diff --git a/fs/ssdfs/ssdfs_inode_info.h b/fs/ssdfs/ssdfs_inode_info.h
new file mode 100644
index 000000000000..5e98f4fa3672
--- /dev/null
+++ b/fs/ssdfs/ssdfs_inode_info.h
@@ -0,0 +1,143 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/ssdfs_inode_info.h - SSDFS in-core inode.
+ *
+ * Copyright (c) 2019-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ */
+
+#ifndef _SSDFS_INODE_INFO_H
+#define _SSDFS_INODE_INFO_H
+
+/*
+ * Inode flags (GETFLAGS/SETFLAGS)
+ */
+#define	SSDFS_SECRM_FL			FS_SECRM_FL	/* Secure deletion */
+#define	SSDFS_UNRM_FL			FS_UNRM_FL	/* Undelete */
+#define	SSDFS_COMPR_FL			FS_COMPR_FL	/* Compress file */
+#define SSDFS_SYNC_FL			FS_SYNC_FL	/* Synchronous updates */
+#define SSDFS_IMMUTABLE_FL		FS_IMMUTABLE_FL	/* Immutable file */
+#define SSDFS_APPEND_FL			FS_APPEND_FL	/* writes to file may only append */
+#define SSDFS_NODUMP_FL			FS_NODUMP_FL	/* do not dump file */
+#define SSDFS_NOATIME_FL		FS_NOATIME_FL	/* do not update atime */
+/* Reserved for compression usage... */
+#define SSDFS_DIRTY_FL			FS_DIRTY_FL
+#define SSDFS_COMPRBLK_FL		FS_COMPRBLK_FL	/* One or more compressed clusters */
+#define SSDFS_NOCOMP_FL			FS_NOCOMP_FL	/* Don't compress */
+#define SSDFS_ECOMPR_FL			FS_ECOMPR_FL	/* Compression error */
+/* End compression flags --- maybe not all used */
+#define SSDFS_BTREE_FL			FS_BTREE_FL	/* btree format dir */
+#define SSDFS_INDEX_FL			FS_INDEX_FL	/* hash-indexed directory */
+#define SSDFS_IMAGIC_FL			FS_IMAGIC_FL	/* AFS directory */
+#define SSDFS_JOURNAL_DATA_FL		FS_JOURNAL_DATA_FL /* Reserved for ext3 */
+#define SSDFS_NOTAIL_FL			FS_NOTAIL_FL	/* file tail should not be merged */
+#define SSDFS_DIRSYNC_FL		FS_DIRSYNC_FL	/* dirsync behaviour (directories only) */
+#define SSDFS_TOPDIR_FL			FS_TOPDIR_FL	/* Top of directory hierarchies */
+#define SSDFS_RESERVED_FL		FS_RESERVED_FL	/* reserved for ext2 lib */
+
+#define SSDFS_FL_USER_VISIBLE		FS_FL_USER_VISIBLE	/* User visible flags */
+#define SSDFS_FL_USER_MODIFIABLE	FS_FL_USER_MODIFIABLE	/* User modifiable flags */
+
+/* Flags that should be inherited by new inodes from their parent. */
+#define SSDFS_FL_INHERITED (SSDFS_SECRM_FL | SSDFS_UNRM_FL | SSDFS_COMPR_FL |\
+			   SSDFS_SYNC_FL | SSDFS_NODUMP_FL |\
+			   SSDFS_NOATIME_FL | SSDFS_COMPRBLK_FL |\
+			   SSDFS_NOCOMP_FL | SSDFS_JOURNAL_DATA_FL |\
+			   SSDFS_NOTAIL_FL | SSDFS_DIRSYNC_FL)
+
+/* Flags that are appropriate for regular files (all but dir-specific ones). */
+#define SSDFS_REG_FLMASK (~(SSDFS_DIRSYNC_FL | SSDFS_TOPDIR_FL))
+
+/* Flags that are appropriate for non-directories/regular files. */
+#define SSDFS_OTHER_FLMASK (SSDFS_NODUMP_FL | SSDFS_NOATIME_FL)
+
+/* Mask out flags that are inappropriate for the given type of inode. */
+static inline __u32 ssdfs_mask_flags(umode_t mode, __u32 flags)
+{
+	if (S_ISDIR(mode))
+		return flags;
+	else if (S_ISREG(mode))
+		return flags & SSDFS_REG_FLMASK;
+	else
+		return flags & SSDFS_OTHER_FLMASK;
+}
+
+/*
+ * struct ssdfs_inode_info - in-core inode
+ * @vfs_inode: VFS inode object
+ * @birthtime: creation time
+ * @raw_inode_size: raw inode size in bytes
+ * @private_flags: inode's private flags
+ * @lock: inode lock
+ * @parent_ino: parent inode ID
+ * @flags: inode flags
+ * @name_hash: name's hash code
+ * @name_len: name length
+ * @extents_tree: extents btree
+ * @dentries_tree: dentries btree
+ * @xattrs_tree: extended attributes tree
+ * @inline_file: inline file buffer
+ * @raw_inode: raw inode
+ */
+struct ssdfs_inode_info {
+	struct inode vfs_inode;
+	struct timespec64 birthtime;
+	u16 raw_inode_size;
+
+	atomic_t private_flags;
+
+	struct rw_semaphore lock;
+	u64 parent_ino;
+	u32 flags;
+	u64 name_hash;
+	u16 name_len;
+	struct ssdfs_extents_btree_info *extents_tree;
+	struct ssdfs_dentries_btree_info *dentries_tree;
+	struct ssdfs_xattrs_btree_info *xattrs_tree;
+	void *inline_file;
+	struct ssdfs_inode raw_inode;
+};
+
+static inline struct ssdfs_inode_info *SSDFS_I(struct inode *inode)
+{
+	return container_of(inode, struct ssdfs_inode_info, vfs_inode);
+}
+
+static inline
+struct ssdfs_extents_btree_info *SSDFS_EXTREE(struct ssdfs_inode_info *ii)
+{
+	if (S_ISDIR(ii->vfs_inode.i_mode))
+		return NULL;
+	else
+		return ii->extents_tree;
+}
+
+static inline
+struct ssdfs_dentries_btree_info *SSDFS_DTREE(struct ssdfs_inode_info *ii)
+{
+	if (S_ISDIR(ii->vfs_inode.i_mode))
+		return ii->dentries_tree;
+	else
+		return NULL;
+}
+
+static inline
+struct ssdfs_xattrs_btree_info *SSDFS_XATTREE(struct ssdfs_inode_info *ii)
+{
+	return ii->xattrs_tree;
+}
+
+extern const struct file_operations ssdfs_dir_operations;
+extern const struct inode_operations ssdfs_dir_inode_operations;
+extern const struct file_operations ssdfs_file_operations;
+extern const struct inode_operations ssdfs_file_inode_operations;
+extern const struct address_space_operations ssdfs_aops;
+extern const struct inode_operations ssdfs_special_inode_operations;
+extern const struct inode_operations ssdfs_symlink_inode_operations;
+
+#endif /* _SSDFS_INODE_INFO_H */
diff --git a/fs/ssdfs/ssdfs_thread_info.h b/fs/ssdfs/ssdfs_thread_info.h
new file mode 100644
index 000000000000..2816a50e18e4
--- /dev/null
+++ b/fs/ssdfs/ssdfs_thread_info.h
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/ssdfs_thread_info.h - thread declarations.
+ *
+ * Copyright (c) 2019-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ */
+
+#ifndef _SSDFS_THREAD_INFO_H
+#define _SSDFS_THREAD_INFO_H
+
+/*
+ * struct ssdfs_thread_info - thread info
+ * @task: task descriptor
+ * @wait: wait queue
+ * @full_stop: completion signaled when the thread stops
+ */
+struct ssdfs_thread_info {
+	struct task_struct *task;
+	struct wait_queue_entry wait;
+	struct completion full_stop;
+};
+
+/* function prototype */
+typedef int (*ssdfs_threadfn)(void *data);
+
+/*
+ * struct ssdfs_thread_descriptor - thread descriptor
+ * @threadfn: thread's function
+ * @fmt: thread's name format
+ */
+struct ssdfs_thread_descriptor {
+	ssdfs_threadfn threadfn;
+	const char *fmt;
+};
+
+#endif /* _SSDFS_THREAD_INFO_H */
diff --git a/fs/ssdfs/version.h b/fs/ssdfs/version.h
new file mode 100644
index 000000000000..5231f8a1f575
--- /dev/null
+++ b/fs/ssdfs/version.h
@@ -0,0 +1,7 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+#ifndef _SSDFS_VERSION_H
+#define _SSDFS_VERSION_H
+
+#define SSDFS_VERSION "SSDFS v.4.42"
+
+#endif /* _SSDFS_VERSION_H */
diff --git a/include/trace/events/ssdfs.h b/include/trace/events/ssdfs.h
new file mode 100644
index 000000000000..dbf117dccd28
--- /dev/null
+++ b/include/trace/events/ssdfs.h
@@ -0,0 +1,255 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * include/trace/events/ssdfs.h - definition of tracepoints.
+ *
+ * Copyright (c) 2019-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ */
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM ssdfs
+
+#if !defined(_TRACE_SSDFS_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_SSDFS_H
+
+#include <linux/tracepoint.h>
+
+DECLARE_EVENT_CLASS(ssdfs__inode,
+
+	TP_PROTO(struct inode *inode),
+
+	TP_ARGS(inode),
+
+	TP_STRUCT__entry(
+		__field(dev_t,	dev)
+		__field(ino_t,	ino)
+		__field(umode_t, mode)
+		__field(loff_t,	size)
+		__field(unsigned int, nlink)
+		__field(blkcnt_t, blocks)
+	),
+
+	TP_fast_assign(
+		__entry->dev	= inode->i_sb->s_dev;
+		__entry->ino	= inode->i_ino;
+		__entry->mode	= inode->i_mode;
+		__entry->nlink	= inode->i_nlink;
+		__entry->size	= inode->i_size;
+		__entry->blocks	= inode->i_blocks;
+	),
+
+	TP_printk("dev = (%d,%d), ino = %lu, i_mode = 0x%hx, "
+		"i_size = %lld, i_nlink = %u, i_blocks = %llu",
+		MAJOR(__entry->dev),
+		MINOR(__entry->dev),
+		(unsigned long)__entry->ino,
+		__entry->mode,
+		__entry->size,
+		(unsigned int)__entry->nlink,
+		(unsigned long long)__entry->blocks)
+);
+
+DECLARE_EVENT_CLASS(ssdfs__inode_exit,
+
+	TP_PROTO(struct inode *inode, int ret),
+
+	TP_ARGS(inode, ret),
+
+	TP_STRUCT__entry(
+		__field(dev_t,	dev)
+		__field(ino_t,	ino)
+		__field(int,	ret)
+	),
+
+	TP_fast_assign(
+		__entry->dev	= inode->i_sb->s_dev;
+		__entry->ino	= inode->i_ino;
+		__entry->ret	= ret;
+	),
+
+	TP_printk("dev = (%d,%d), ino = %lu, ret = %d",
+		MAJOR(__entry->dev),
+		MINOR(__entry->dev),
+		(unsigned long)__entry->ino,
+		__entry->ret)
+);
+
+DEFINE_EVENT(ssdfs__inode, ssdfs_inode_new,
+
+	TP_PROTO(struct inode *inode),
+
+	TP_ARGS(inode)
+);
+
+DEFINE_EVENT(ssdfs__inode_exit, ssdfs_inode_new_exit,
+
+	TP_PROTO(struct inode *inode, int ret),
+
+	TP_ARGS(inode, ret)
+);
+
+DEFINE_EVENT(ssdfs__inode, ssdfs_inode_request,
+
+	TP_PROTO(struct inode *inode),
+
+	TP_ARGS(inode)
+);
+
+DEFINE_EVENT(ssdfs__inode, ssdfs_inode_evict,
+
+	TP_PROTO(struct inode *inode),
+
+	TP_ARGS(inode)
+);
+
+DEFINE_EVENT(ssdfs__inode, ssdfs_iget,
+
+	TP_PROTO(struct inode *inode),
+
+	TP_ARGS(inode)
+);
+
+DEFINE_EVENT(ssdfs__inode_exit, ssdfs_iget_exit,
+
+	TP_PROTO(struct inode *inode, int ret),
+
+	TP_ARGS(inode, ret)
+);
+
+TRACE_EVENT(ssdfs_sync_fs,
+
+	TP_PROTO(struct super_block *sb, int wait),
+
+	TP_ARGS(sb, wait),
+
+	TP_STRUCT__entry(
+		__field(dev_t,	dev)
+		__field(int,	wait)
+	),
+
+	TP_fast_assign(
+		__entry->dev	= sb->s_dev;
+		__entry->wait	= wait;
+	),
+
+	TP_printk("dev = (%d,%d), wait = %d",
+		MAJOR(__entry->dev),
+		MINOR(__entry->dev),
+		__entry->wait)
+);
+
+TRACE_EVENT(ssdfs_sync_fs_exit,
+
+	TP_PROTO(struct super_block *sb, int wait, int ret),
+
+	TP_ARGS(sb, wait, ret),
+
+	TP_STRUCT__entry(
+		__field(dev_t,	dev)
+		__field(int,	wait)
+		__field(int,	ret)
+	),
+
+	TP_fast_assign(
+		__entry->dev	= sb->s_dev;
+		__entry->wait	= wait;
+		__entry->ret	= ret;
+	),
+
+	TP_printk("dev = (%d,%d), wait = %d, ret = %d",
+		MAJOR(__entry->dev),
+		MINOR(__entry->dev),
+		__entry->wait,
+		__entry->ret)
+);
+
+DEFINE_EVENT(ssdfs__inode, ssdfs_sync_file_enter,
+
+	TP_PROTO(struct inode *inode),
+
+	TP_ARGS(inode)
+);
+
+TRACE_EVENT(ssdfs_sync_file_exit,
+
+	TP_PROTO(struct file *file, int datasync, int ret),
+
+	TP_ARGS(file, datasync, ret),
+
+	TP_STRUCT__entry(
+		__field(dev_t,	dev)
+		__field(ino_t,	ino)
+		__field(ino_t,	parent)
+		__field(int,	datasync)
+		__field(int,	ret)
+	),
+
+	TP_fast_assign(
+		struct dentry *dentry = file->f_path.dentry;
+		struct inode *inode = dentry->d_inode;
+
+		__entry->dev		= inode->i_sb->s_dev;
+		__entry->ino		= inode->i_ino;
+		__entry->parent		= dentry->d_parent->d_inode->i_ino;
+		__entry->datasync	= datasync;
+		__entry->ret		= ret;
+	),
+
+	TP_printk("dev = (%d,%d), ino = %lu, parent = %lu, "
+		"datasync = %d, ret = %d",
+		MAJOR(__entry->dev),
+		MINOR(__entry->dev),
+		(unsigned long)__entry->ino,
+		(unsigned long)__entry->parent,
+		__entry->datasync,
+		__entry->ret)
+);
+
+TRACE_EVENT(ssdfs_unlink_enter,
+
+	TP_PROTO(struct inode *dir, struct dentry *dentry),
+
+	TP_ARGS(dir, dentry),
+
+	TP_STRUCT__entry(
+		__field(dev_t,	dev)
+		__field(ino_t,	ino)
+		__field(loff_t,	size)
+		__field(blkcnt_t, blocks)
+		__string(name,	dentry->d_name.name)
+	),
+
+	TP_fast_assign(
+		__entry->dev	= dir->i_sb->s_dev;
+		__entry->ino	= dir->i_ino;
+		__entry->size	= dir->i_size;
+		__entry->blocks	= dir->i_blocks;
+		__assign_str(name, dentry->d_name.name);
+	),
+
+	TP_printk("dev = (%d,%d), dir ino = %lu, i_size = %lld, "
+		"i_blocks = %llu, name = %s",
+		MAJOR(__entry->dev),
+		MINOR(__entry->dev),
+		(unsigned long)__entry->ino,
+		__entry->size,
+		(unsigned long long)__entry->blocks,
+		__get_str(name))
+);
+
+DEFINE_EVENT(ssdfs__inode_exit, ssdfs_unlink_exit,
+
+	TP_PROTO(struct inode *inode, int ret),
+
+	TP_ARGS(inode, ret)
+);
+
+#endif /* _TRACE_SSDFS_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/uapi/linux/ssdfs_fs.h b/include/uapi/linux/ssdfs_fs.h
new file mode 100644
index 000000000000..50c81751afc9
--- /dev/null
+++ b/include/uapi/linux/ssdfs_fs.h
@@ -0,0 +1,117 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * include/uapi/linux/ssdfs_fs.h - SSDFS common declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _UAPI_LINUX_SSDFS_H
+#define _UAPI_LINUX_SSDFS_H
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+/* SSDFS magic signatures */
+#define SSDFS_SUPER_MAGIC			0x53734466	/* SsDf */
+#define SSDFS_SEGMENT_HDR_MAGIC			0x5348		/* SH */
+#define SSDFS_LOG_FOOTER_MAGIC			0x4C46		/* LF */
+#define SSDFS_PARTIAL_LOG_HDR_MAGIC		0x5048		/* PH */
+#define SSDFS_BLK_BMAP_MAGIC			0x424D		/* BM */
+#define SSDFS_FRAGMENT_DESC_MAGIC		0x66		/* f */
+#define SSDFS_CHAIN_HDR_MAGIC			0x63		/* c */
+#define SSDFS_PHYS_OFF_TABLE_MAGIC		0x504F5448	/* POTH */
+#define SSDFS_BLK2OFF_TABLE_HDR_MAGIC		0x5474		/* Tt */
+#define SSDFS_SEGBMAP_HDR_MAGIC			0x534D		/* SM */
+#define SSDFS_INODE_MAGIC			0x6469		/* di */
+#define SSDFS_PEB_TABLE_MAGIC			0x5074		/* Pt */
+#define SSDFS_LEB_TABLE_MAGIC			0x4C74		/* Lt */
+#define SSDFS_MAPTBL_CACHE_MAGIC		0x4D63		/* Mc */
+#define SSDFS_MAPTBL_CACHE_PEB_STATE_MAGIC	0x4D635053	/* McPS */
+#define SSDFS_INODES_BTREE_MAGIC		0x496E4274	/* InBt */
+#define SSDFS_INODES_BNODE_MAGIC		0x494E		/* IN */
+#define SSDFS_DENTRIES_BTREE_MAGIC		0x44654274	/* DeBt */
+#define SSDFS_DENTRIES_BNODE_MAGIC		0x444E		/* DN */
+#define SSDFS_EXTENTS_BTREE_MAGIC		0x45784274	/* ExBt */
+#define SSDFS_SHARED_EXTENTS_BTREE_MAGIC	0x53454274	/* SEBt */
+#define SSDFS_EXTENTS_BNODE_MAGIC		0x454E		/* EN */
+#define SSDFS_XATTR_BTREE_MAGIC			0x45414274	/* EABt */
+#define SSDFS_SHARED_XATTR_BTREE_MAGIC		0x53454174	/* SEAt */
+#define SSDFS_XATTR_BNODE_MAGIC			0x414E		/* AN */
+#define SSDFS_SHARED_DICT_BTREE_MAGIC		0x53446963	/* SDic */
+#define SSDFS_DICTIONARY_BNODE_MAGIC		0x534E		/* SN */
+#define SSDFS_SNAPSHOTS_BTREE_MAGIC		0x536E4274	/* SnBt */
+#define SSDFS_SNAPSHOTS_BNODE_MAGIC		0x736E		/* sn */
+#define SSDFS_SNAPSHOT_RULES_MAGIC		0x536E5275	/* SnRu */
+#define SSDFS_SNAPSHOT_RECORD_MAGIC		0x5372		/* Sr */
+#define SSDFS_PEB2TIME_RECORD_MAGIC		0x5072		/* Pr */
+#define SSDFS_DIFF_BLOB_MAGIC			0x4466		/* Df */
+#define SSDFS_INVEXT_BTREE_MAGIC		0x49784274	/* IxBt */
+#define SSDFS_INVEXT_BNODE_MAGIC		0x4958		/* IX */
+
+/* SSDFS revision */
+#define SSDFS_MAJOR_REVISION		1
+#define SSDFS_MINOR_REVISION		15
+
+/* SSDFS constants */
+#define SSDFS_MAX_NAME_LEN		255
+#define SSDFS_UUID_SIZE			16
+#define SSDFS_VOLUME_LABEL_MAX		16
+#define SSDFS_MAX_SNAP_RULE_NAME_LEN	16
+#define SSDFS_MAX_SNAPSHOT_NAME_LEN	12
+
+#define SSDFS_RESERVED_VBR_SIZE		1024 /* Volume Boot Record size */
+#define SSDFS_DEFAULT_SEG_SIZE		8388608
+
+/*
+ * File system states
+ */
+#define SSDFS_MOUNTED_FS		0x0000  /* Mounted FS state */
+#define SSDFS_VALID_FS			0x0001  /* Unmounted cleanly */
+#define SSDFS_ERROR_FS			0x0002  /* Errors detected */
+#define SSDFS_RESIZE_FS			0x0004	/* Resize required */
+#define SSDFS_LAST_KNOWN_FS_STATE	SSDFS_RESIZE_FS
+
+/*
+ * Behaviour when detecting errors
+ */
+#define SSDFS_ERRORS_CONTINUE		1	/* Continue execution */
+#define SSDFS_ERRORS_RO			2	/* Remount fs read-only */
+#define SSDFS_ERRORS_PANIC		3	/* Panic */
+#define SSDFS_ERRORS_DEFAULT		SSDFS_ERRORS_CONTINUE
+#define SSDFS_LAST_KNOWN_FS_ERROR	SSDFS_ERRORS_PANIC
+
+/* Reserved inode id */
+#define SSDFS_INVALID_EXTENTS_BTREE_INO		5
+#define SSDFS_SNAPSHOTS_BTREE_INO		6
+#define SSDFS_TESTING_INO			7
+#define SSDFS_SHARED_DICT_BTREE_INO		8
+#define SSDFS_INODES_BTREE_INO			9
+#define SSDFS_SHARED_EXTENTS_BTREE_INO		10
+#define SSDFS_SHARED_XATTR_BTREE_INO		11
+#define SSDFS_MAPTBL_INO			12
+#define SSDFS_SEG_TREE_INO			13
+#define SSDFS_SEG_BMAP_INO			14
+#define SSDFS_PEB_CACHE_INO			15
+#define SSDFS_ROOT_INO				16
+
+#define SSDFS_LINK_MAX		INT_MAX
+
+#define SSDFS_CUR_SEG_DEFAULT_ID	3
+#define SSDFS_LOG_PAGES_DEFAULT		32
+#define SSDFS_CREATE_THREADS_DEFAULT	1
+
+#endif /* _UAPI_LINUX_SSDFS_H */
-- 
2.34.1



* [RFC PATCH 03/76] ssdfs: implement raw device operations
From: Viacheslav Dubeyko @ 2023-02-25  1:08 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: viacheslav.dubeyko, luka.perkov, bruno.banelli, Viacheslav Dubeyko

Implement raw device operations:
(1) device_name: get device name
(2) device_size: get device size in bytes
(3) open_zone: open zone
(4) reopen_zone: reopen closed zone
(5) close_zone: close zone
(6) read: read from device
(7) readpage: read page
(8) readpages: read sequence of pages
(9) can_write_page: check whether a page can be written
(10) writepage: write page to device
(11) writepages: write sequence of pages to device
(12) erase: erase block
(13) trim: support for background erase operations
(14) sync: synchronize page cache with device

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
CC: Viacheslav Dubeyko <viacheslav.dubeyko@bytedance.com>
CC: Luka Perkov <luka.perkov@sartura.hr>
CC: Bruno Banelli <bruno.banelli@sartura.hr>
---
 fs/ssdfs/dev_bdev.c | 1187 +++++++++++++++++++++++++++++++++++++++
 fs/ssdfs/dev_mtd.c  |  641 ++++++++++++++++++++++
 fs/ssdfs/dev_zns.c  | 1281 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 3109 insertions(+)
 create mode 100644 fs/ssdfs/dev_bdev.c
 create mode 100644 fs/ssdfs/dev_mtd.c
 create mode 100644 fs/ssdfs/dev_zns.c

diff --git a/fs/ssdfs/dev_bdev.c b/fs/ssdfs/dev_bdev.c
new file mode 100644
index 000000000000..b6cfb7d79c8c
--- /dev/null
+++ b/fs/ssdfs/dev_bdev.c
@@ -0,0 +1,1187 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/dev_bdev.c - Block device access code.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/highmem.h>
+#include <linux/pagemap.h>
+#include <linux/pagevec.h>
+#include <linux/bio.h>
+#include <linux/blkdev.h>
+#include <linux/backing-dev.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+
+#include <trace/events/ssdfs.h>
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+atomic64_t ssdfs_dev_bdev_page_leaks;
+atomic64_t ssdfs_dev_bdev_memory_leaks;
+atomic64_t ssdfs_dev_bdev_cache_leaks;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+/*
+ * void ssdfs_dev_bdev_cache_leaks_increment(void *kaddr)
+ * void ssdfs_dev_bdev_cache_leaks_decrement(void *kaddr)
+ * void *ssdfs_dev_bdev_kmalloc(size_t size, gfp_t flags)
+ * void *ssdfs_dev_bdev_kzalloc(size_t size, gfp_t flags)
+ * void *ssdfs_dev_bdev_kcalloc(size_t n, size_t size, gfp_t flags)
+ * void ssdfs_dev_bdev_kfree(void *kaddr)
+ * struct page *ssdfs_dev_bdev_alloc_page(gfp_t gfp_mask)
+ * struct page *ssdfs_dev_bdev_add_pagevec_page(struct pagevec *pvec)
+ * void ssdfs_dev_bdev_free_page(struct page *page)
+ * void ssdfs_dev_bdev_pagevec_release(struct pagevec *pvec)
+ */
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	SSDFS_MEMORY_LEAKS_CHECKER_FNS(dev_bdev)
+#else
+	SSDFS_MEMORY_ALLOCATOR_FNS(dev_bdev)
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+void ssdfs_dev_bdev_memory_leaks_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_dev_bdev_page_leaks, 0);
+	atomic64_set(&ssdfs_dev_bdev_memory_leaks, 0);
+	atomic64_set(&ssdfs_dev_bdev_cache_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+void ssdfs_dev_bdev_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&ssdfs_dev_bdev_page_leaks) != 0) {
+		SSDFS_ERR("BLOCK DEV: "
+			  "memory leaks include %lld pages\n",
+			  atomic64_read(&ssdfs_dev_bdev_page_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_dev_bdev_memory_leaks) != 0) {
+		SSDFS_ERR("BLOCK DEV: "
+			  "memory allocator suffers from %lld leaks\n",
+			  atomic64_read(&ssdfs_dev_bdev_memory_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_dev_bdev_cache_leaks) != 0) {
+		SSDFS_ERR("BLOCK DEV: "
+			  "caches suffer from %lld leaks\n",
+			  atomic64_read(&ssdfs_dev_bdev_cache_leaks));
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+static DECLARE_WAIT_QUEUE_HEAD(wq);
+
+/*
+ * ssdfs_bdev_device_name() - get device name
+ * @sb: superblock object
+ */
+static const char *ssdfs_bdev_device_name(struct super_block *sb)
+{
+	return sb->s_id;
+}
+
+/*
+ * ssdfs_bdev_device_size() - get partition size in bytes
+ * @sb: superblock object
+ */
+static __u64 ssdfs_bdev_device_size(struct super_block *sb)
+{
+	return i_size_read(sb->s_bdev->bd_inode);
+}
+
+static int ssdfs_bdev_open_zone(struct super_block *sb, loff_t offset)
+{
+	return -EOPNOTSUPP;
+}
+
+static int ssdfs_bdev_reopen_zone(struct super_block *sb, loff_t offset)
+{
+	return -EOPNOTSUPP;
+}
+
+static int ssdfs_bdev_close_zone(struct super_block *sb, loff_t offset)
+{
+	return -EOPNOTSUPP;
+}
+
+/*
+ * ssdfs_bdev_bio_alloc() - allocate bio object
+ * @bdev: block device
+ * @nr_iovecs: number of items in biovec
+ * @op: direction of I/O
+ * @gfp_mask: mask of creation flags
+ */
+struct bio *ssdfs_bdev_bio_alloc(struct block_device *bdev,
+				 unsigned int nr_iovecs,
+				 unsigned int op,
+				 gfp_t gfp_mask)
+{
+	struct bio *bio;
+
+	bio = bio_alloc(bdev, nr_iovecs, op, gfp_mask);
+	if (!bio) {
+		SSDFS_ERR("fail to allocate bio\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	return bio;
+}
+
+/*
+ * ssdfs_bdev_bio_put() - free bio object
+ */
+void ssdfs_bdev_bio_put(struct bio *bio)
+{
+	if (!bio)
+		return;
+
+	bio_put(bio);
+}
+
+/*
+ * ssdfs_bdev_bio_add_page() - add page into bio
+ * @bio: pointer to bio object
+ * @page: memory page
+ * @len: size of data in the memory page
+ * @offset: vec entry offset
+ */
+int ssdfs_bdev_bio_add_page(struct bio *bio, struct page *page,
+			    unsigned int len, unsigned int offset)
+{
+	int res;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!bio || !page);
+
+	SSDFS_DBG("page %p, count %d\n",
+		  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	res = bio_add_page(bio, page, len, offset);
+	if (res != len) {
+		SSDFS_ERR("res %d != len %u\n",
+			  res, len);
+		return -ERANGE;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_bdev_sync_page_request() - submit page request
+ * @sb: superblock object
+ * @page: memory page
+ * @offset: offset in bytes from the partition's beginning
+ * @op: direction of I/O
+ * @op_flags: request op flags
+ */
+static int ssdfs_bdev_sync_page_request(struct super_block *sb,
+					struct page *page,
+					loff_t offset,
+					unsigned int op, int op_flags)
+{
+	struct bio *bio;
+	pgoff_t index = (pgoff_t)(offset >> PAGE_SHIFT);
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!page);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	bio = ssdfs_bdev_bio_alloc(sb->s_bdev, 1, op, GFP_NOIO);
+	if (IS_ERR_OR_NULL(bio)) {
+		err = !bio ? -ERANGE : PTR_ERR(bio);
+		SSDFS_ERR("fail to allocate bio: err %d\n",
+			  err);
+		return err;
+	}
+
+	bio->bi_iter.bi_sector = index * (PAGE_SIZE >> 9);
+	bio_set_dev(bio, sb->s_bdev);
+	bio->bi_opf = op | op_flags;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("page %p, count %d\n",
+		  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_bdev_bio_add_page(bio, page, PAGE_SIZE, 0);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to add page into bio: "
+			  "err %d\n",
+			  err);
+		goto finish_sync_page_request;
+	}
+
+	err = submit_bio_wait(bio);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to process request: "
+			  "err %d\n",
+			  err);
+		goto finish_sync_page_request;
+	}
+
+finish_sync_page_request:
+	ssdfs_bdev_bio_put(bio);
+
+	return err;
+}
+
+/*
+ * ssdfs_bdev_sync_pvec_request() - submit pagevec request
+ * @sb: superblock object
+ * @pvec: pagevec
+ * @offset: offset in bytes from the partition's beginning
+ * @op: direction of I/O
+ * @op_flags: request op flags
+ */
+static int ssdfs_bdev_sync_pvec_request(struct super_block *sb,
+					struct pagevec *pvec,
+					loff_t offset,
+					unsigned int op, int op_flags)
+{
+	struct bio *bio;
+	pgoff_t index = (pgoff_t)(offset >> PAGE_SHIFT);
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pvec);
+
+	SSDFS_DBG("offset %llu, op %#x, op_flags %#x\n",
+		  offset, op, op_flags);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (pagevec_count(pvec) == 0) {
+		SSDFS_WARN("empty page vector\n");
+		return 0;
+	}
+
+	bio = ssdfs_bdev_bio_alloc(sb->s_bdev, pagevec_count(pvec),
+				   op, GFP_NOIO);
+	if (IS_ERR_OR_NULL(bio)) {
+		err = !bio ? -ERANGE : PTR_ERR(bio);
+		SSDFS_ERR("fail to allocate bio: err %d\n",
+			  err);
+		return err;
+	}
+
+	bio->bi_iter.bi_sector = index * (PAGE_SIZE >> 9);
+	bio_set_dev(bio, sb->s_bdev);
+	bio->bi_opf = op | op_flags;
+
+	for (i = 0; i < pagevec_count(pvec); i++) {
+		struct page *page = pvec->pages[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!page);
+
+		SSDFS_DBG("page %p, count %d\n",
+			  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = ssdfs_bdev_bio_add_page(bio, page,
+					      PAGE_SIZE,
+					      0);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to add page %d into bio: "
+				  "err %d\n",
+				  i, err);
+			goto finish_sync_pvec_request;
+		}
+	}
+
+	err = submit_bio_wait(bio);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to process request: "
+			  "err %d\n",
+			  err);
+		goto finish_sync_pvec_request;
+	}
+
+finish_sync_pvec_request:
+	ssdfs_bdev_bio_put(bio);
+
+	return err;
+}
+
+/*
+ * ssdfs_bdev_readpage() - read page from the volume
+ * @sb: superblock object
+ * @page: memory page
+ * @offset: offset in bytes from the partition's beginning
+ *
+ * This function tries to read the data at @offset from
+ * the partition's beginning into the memory page.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EIO         - I/O error.
+ */
+int ssdfs_bdev_readpage(struct super_block *sb, struct page *page,
+			loff_t offset)
+{
+	int err;
+
+	err = ssdfs_bdev_sync_page_request(sb, page, offset,
+					   REQ_OP_READ, REQ_SYNC);
+	if (err) {
+		ClearPageUptodate(page);
+		ssdfs_clear_page_private(page, 0);
+		SetPageError(page);
+	} else {
+		SetPageUptodate(page);
+		ClearPageError(page);
+		flush_dcache_page(page);
+	}
+
+	ssdfs_unlock_page(page);
+
+	return err;
+}
+
+/*
+ * ssdfs_bdev_readpages() - read pages from the volume
+ * @sb: superblock object
+ * @pvec: pagevec
+ * @offset: offset in bytes from the partition's beginning
+ *
+ * This function tries to read the data at @offset from
+ * the partition's beginning into the memory pages.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EIO         - I/O error.
+ */
+int ssdfs_bdev_readpages(struct super_block *sb, struct pagevec *pvec,
+			 loff_t offset)
+{
+	int i;
+	int err = 0;
+
+	err = ssdfs_bdev_sync_pvec_request(sb, pvec, offset,
+					   REQ_OP_READ, REQ_RAHEAD);
+
+	for (i = 0; i < pagevec_count(pvec); i++) {
+		struct page *page = pvec->pages[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!page);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (err) {
+			ClearPageUptodate(page);
+			ssdfs_clear_page_private(page, 0);
+			SetPageError(page);
+		} else {
+			SetPageUptodate(page);
+			ClearPageError(page);
+			flush_dcache_page(page);
+		}
+
+		ssdfs_unlock_page(page);
+	}
+
+	return err;
+}
+
+/*
+ * ssdfs_bdev_read_pvec() - read from volume into buffer
+ * @sb: superblock object
+ * @offset: offset in bytes from the partition's beginning
+ * @len: size of buffer in bytes
+ * @buf: buffer
+ * @read_bytes: pointer to the number of read bytes [out]
+ *
+ * This function tries to read @len bytes at @offset from
+ * the partition's beginning into @buf.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EIO         - I/O error.
+ */
+static int ssdfs_bdev_read_pvec(struct super_block *sb,
+				loff_t offset, size_t len,
+				void *buf, size_t *read_bytes)
+{
+	struct pagevec pvec;
+	struct page *page;
+	loff_t page_start, page_end;
+	u32 pages_count;
+	u32 read_len;
+	loff_t cur_offset = offset;
+	u32 page_off;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu, len %zu, buf %p\n",
+		  sb, (unsigned long long)offset, len, buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*read_bytes = 0;
+
+	page_start = offset >> PAGE_SHIFT;
+	page_end = (offset + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	pages_count = (u32)(page_end - page_start);
+
+	if (pages_count > PAGEVEC_SIZE) {
+		SSDFS_ERR("pages_count %u > pvec_capacity %u\n",
+			  pages_count, PAGEVEC_SIZE);
+		return -ERANGE;
+	}
+
+	pagevec_init(&pvec);
+
+	for (i = 0; i < pages_count; i++) {
+		page = ssdfs_dev_bdev_alloc_page(GFP_KERNEL | __GFP_ZERO);
+		if (IS_ERR_OR_NULL(page)) {
+			err = (page == NULL ? -ENOMEM : PTR_ERR(page));
+			SSDFS_ERR("unable to allocate memory page\n");
+			goto finish_bdev_read_pvec;
+		}
+
+		ssdfs_get_page(page);
+		ssdfs_lock_page(page);
+		pagevec_add(&pvec, page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("page %p, count %d\n",
+			  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+	err = ssdfs_bdev_sync_pvec_request(sb, &pvec, offset,
+					   REQ_OP_READ, REQ_SYNC);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to read pagevec: err %d\n",
+			  err);
+		goto finish_bdev_read_pvec;
+	}
+
+	for (i = 0; i < pagevec_count(&pvec); i++) {
+		page = pvec.pages[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!page);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (*read_bytes >= len) {
+			err = -ERANGE;
+			SSDFS_ERR("read_bytes %zu >= len %zu\n",
+				  *read_bytes, len);
+			goto finish_bdev_read_pvec;
+		}
+
+		div_u64_rem(cur_offset, PAGE_SIZE, &page_off);
+		read_len = min_t(size_t, (size_t)(PAGE_SIZE - page_off),
+					  (size_t)(len - *read_bytes));
+
+		err = ssdfs_memcpy_from_page(buf, *read_bytes, len,
+					     page, page_off, PAGE_SIZE,
+					     read_len);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to copy: err %d\n", err);
+			goto finish_bdev_read_pvec;
+		}
+
+		*read_bytes += read_len;
+		cur_offset += read_len;
+	}
+
+finish_bdev_read_pvec:
+	for (i = pagevec_count(&pvec) - 1; i >= 0; i--) {
+		page = pvec.pages[i];
+
+		if (page) {
+			ssdfs_unlock_page(page);
+			ssdfs_put_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("page %p, count %d\n",
+				  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			ssdfs_dev_bdev_free_page(page);
+			pvec.pages[i] = NULL;
+		}
+	}
+
+	pagevec_reinit(&pvec);
+
+	if (*read_bytes != len) {
+		err = -EIO;
+		SSDFS_ERR("read_bytes (%zu) != len (%zu)\n",
+			  *read_bytes, len);
+	}
+
+	return err;
+}
+
+/*
+ * ssdfs_bdev_read() - read from volume into buffer
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's begin
+ * @len: size of buffer in bytes
+ * @buf: buffer
+ *
+ * This function tries to read @len bytes from the volume
+ * at @offset bytes from the beginning of the partition
+ * into @buf.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EIO         - I/O error.
+ */
+int ssdfs_bdev_read(struct super_block *sb, loff_t offset,
+		    size_t len, void *buf)
+{
+	size_t read_bytes = 0;
+	loff_t cur_offset = offset;
+	u8 *ptr = (u8 *)buf;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu, len %zu, buf %p\n",
+		  sb, (unsigned long long)offset, len, buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (len == 0) {
+		SSDFS_WARN("len is zero\n");
+		return 0;
+	}
+
+	while (read_bytes < len) {
+		size_t iter_read;
+
+		err = ssdfs_bdev_read_pvec(sb, cur_offset,
+					   len - read_bytes,
+					   ptr,
+					   &iter_read);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to read pvec: "
+				  "cur_offset %llu, read_bytes %zu, "
+				  "err %d\n",
+				  cur_offset, read_bytes, err);
+			return err;
+		}
+
+		cur_offset += iter_read;
+		ptr += iter_read;
+		read_bytes += iter_read;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_bdev_can_write_page() - check that page can be written
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's begin
+ * @need_check: make check or not?
+ *
+ * This function checks whether the page can be written.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EROFS       - file system in RO mode.
+ * %-ENOMEM      - fail to allocate memory.
+ * %-EIO         - I/O error.
+ */
+int ssdfs_bdev_can_write_page(struct super_block *sb, loff_t offset,
+			      bool need_check)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	void *buf;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu, need_check %d\n",
+		  sb, (unsigned long long)offset, (int)need_check);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!need_check)
+		return 0;
+
+	buf = ssdfs_dev_bdev_kzalloc(fsi->pagesize, GFP_KERNEL);
+	if (!buf) {
+		SSDFS_ERR("unable to allocate %d bytes\n", fsi->pagesize);
+		return -ENOMEM;
+	}
+
+	err = ssdfs_bdev_read(sb, offset, fsi->pagesize, buf);
+	if (err)
+		goto free_buf;
+
+	if (memchr_inv(buf, 0xff, fsi->pagesize)) {
+		if (memchr_inv(buf, 0x00, fsi->pagesize)) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("area with offset %llu contains data\n",
+				  (unsigned long long)offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+			err = -EIO;
+		}
+	}
+
+free_buf:
+	ssdfs_dev_bdev_kfree(buf);
+	return err;
+}
+
+/*
+ * ssdfs_bdev_writepage() - write memory page on volume
+ * @sb: superblock object
+ * @to_off: offset in bytes from partition's begin
+ * @page: memory page
+ * @from_off: offset in bytes from page's begin
+ * @len: size of data in bytes
+ *
+ * This function tries to write @len bytes of data from @page
+ * at @to_off bytes from the beginning of the partition.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EROFS       - file system in RO mode.
+ * %-EIO         - I/O error.
+ */
+int ssdfs_bdev_writepage(struct super_block *sb, loff_t to_off,
+			 struct page *page, u32 from_off, size_t len)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+#ifdef CONFIG_SSDFS_DEBUG
+	u32 remainder;
+#endif /* CONFIG_SSDFS_DEBUG */
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, to_off %llu, page %p, from_off %u, len %zu\n",
+		  sb, to_off, page, from_off, len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (sb->s_flags & SB_RDONLY) {
+		SSDFS_WARN("unable to write on RO file system\n");
+		return -EROFS;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!page);
+	BUG_ON((to_off >= ssdfs_bdev_device_size(sb)) ||
+		(len > (ssdfs_bdev_device_size(sb) - to_off)));
+	BUG_ON(len == 0);
+	div_u64_rem((u64)to_off, (u64)fsi->pagesize, &remainder);
+	BUG_ON(remainder);
+	BUG_ON((from_off + len) > PAGE_SIZE);
+	BUG_ON(!PageDirty(page));
+	BUG_ON(PageLocked(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_lock_page(page);
+	atomic_inc(&fsi->pending_bios);
+
+	err = ssdfs_bdev_sync_page_request(sb, page, to_off,
+					   REQ_OP_WRITE, REQ_SYNC);
+	if (err) {
+		SetPageError(page);
+		SSDFS_ERR("failed to write (err %d): offset %llu\n",
+			  err, (unsigned long long)to_off);
+	} else {
+		ssdfs_clear_dirty_page(page);
+		SetPageUptodate(page);
+		ClearPageError(page);
+	}
+
+	ssdfs_unlock_page(page);
+	ssdfs_put_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("page %p, count %d\n",
+		  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (atomic_dec_and_test(&fsi->pending_bios))
+		wake_up_all(&wq);
+
+	return err;
+}
+
+/*
+ * ssdfs_bdev_writepages() - write pagevec on volume
+ * @sb: superblock object
+ * @to_off: offset in bytes from partition's begin
+ * @pvec: memory pages vector
+ * @from_off: offset in bytes from page's begin
+ * @len: size of data in bytes
+ *
+ * This function tries to write @len bytes of data from @pvec
+ * at @to_off bytes from the beginning of the partition.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EROFS       - file system in RO mode.
+ * %-EIO         - I/O error.
+ */
+int ssdfs_bdev_writepages(struct super_block *sb, loff_t to_off,
+			  struct pagevec *pvec,
+			  u32 from_off, size_t len)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	struct page *page;
+	int i;
+#ifdef CONFIG_SSDFS_DEBUG
+	u32 remainder;
+#endif /* CONFIG_SSDFS_DEBUG */
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, to_off %llu, pvec %p, from_off %u, len %zu\n",
+		  sb, to_off, pvec, from_off, len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (sb->s_flags & SB_RDONLY) {
+		SSDFS_WARN("unable to write on RO file system\n");
+		return -EROFS;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pvec);
+	BUG_ON((to_off >= ssdfs_bdev_device_size(sb)) ||
+		(len > (ssdfs_bdev_device_size(sb) - to_off)));
+	BUG_ON(len == 0);
+	div_u64_rem((u64)to_off, (u64)fsi->pagesize, &remainder);
+	BUG_ON(remainder);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (pagevec_count(pvec) == 0) {
+		SSDFS_WARN("empty pagevec\n");
+		return 0;
+	}
+
+	for (i = 0; i < pagevec_count(pvec); i++) {
+		page = pvec->pages[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!page);
+		BUG_ON(!PageDirty(page));
+		BUG_ON(PageLocked(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		ssdfs_lock_page(page);
+	}
+
+	atomic_inc(&fsi->pending_bios);
+
+	err = ssdfs_bdev_sync_pvec_request(sb, pvec, to_off,
+					   REQ_OP_WRITE, REQ_SYNC);
+
+	for (i = 0; i < pagevec_count(pvec); i++) {
+		page = pvec->pages[i];
+
+		if (err) {
+			SetPageError(page);
+			SSDFS_ERR("failed to write (err %d): "
+				  "page_index %llu\n",
+				  err,
+				  (unsigned long long)page_index(page));
+		} else {
+			ssdfs_clear_dirty_page(page);
+			SetPageUptodate(page);
+			ClearPageError(page);
+		}
+
+		ssdfs_unlock_page(page);
+		ssdfs_put_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("page %p, count %d\n",
+			  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+	if (atomic_dec_and_test(&fsi->pending_bios))
+		wake_up_all(&wq);
+
+	return err;
+}
+
+/*
+ * ssdfs_bdev_erase_end_io() - callback for erase operation end
+ */
+static void ssdfs_bdev_erase_end_io(struct bio *bio)
+{
+	struct super_block *sb = bio->bi_private;
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+
+	BUG_ON(bio->bi_vcnt == 0);
+
+	ssdfs_bdev_bio_put(bio);
+	if (atomic_dec_and_test(&fsi->pending_bios))
+		wake_up_all(&wq);
+}
+
+/*
+ * ssdfs_bdev_support_discard() - check that block device supports discard
+ */
+static inline bool ssdfs_bdev_support_discard(struct block_device *bdev)
+{
+	return bdev_max_discard_sectors(bdev) ||
+		bdev_is_zoned(bdev);
+}
+
+/*
+ * ssdfs_bdev_erase_request() - initiate erase request
+ * @sb: superblock object
+ * @nr_iovecs: number of pages for erase
+ * @offset: offset in bytes from partition's begin
+ *
+ * This function tries to issue a discard request for the given range.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EFAULT      - erase operation error.
+ */
+static int ssdfs_bdev_erase_request(struct super_block *sb,
+				    unsigned int nr_iovecs,
+				    loff_t offset)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	struct page *erase_page = fsi->erase_page;
+	struct bio *bio;
+	unsigned int max_pages;
+	pgoff_t index = (pgoff_t)(offset >> PAGE_SHIFT);
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!erase_page);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (nr_iovecs == 0) {
+		SSDFS_WARN("empty vector\n");
+		return 0;
+	}
+
+	max_pages = min_t(unsigned int, nr_iovecs, BIO_MAX_VECS);
+
+	bio = ssdfs_bdev_bio_alloc(sb->s_bdev, max_pages,
+				   REQ_OP_DISCARD, GFP_NOFS);
+	if (IS_ERR_OR_NULL(bio)) {
+		err = !bio ? -ERANGE : PTR_ERR(bio);
+		SSDFS_ERR("fail to allocate bio: err %d\n",
+			  err);
+		return err;
+	}
+
+	for (i = 0; i < nr_iovecs; i++) {
+		if (i >= max_pages) {
+			bio_set_dev(bio, sb->s_bdev);
+			bio->bi_opf = REQ_OP_DISCARD | REQ_BACKGROUND;
+			bio->bi_iter.bi_sector = index * (PAGE_SIZE >> 9);
+			bio->bi_private = sb;
+			bio->bi_end_io = ssdfs_bdev_erase_end_io;
+			atomic_inc(&fsi->pending_bios);
+			submit_bio(bio);
+
+			index += i;
+			nr_iovecs -= i;
+			i = 0;
+
+			bio = ssdfs_bdev_bio_alloc(sb->s_bdev, max_pages,
+						   REQ_OP_DISCARD, GFP_NOFS);
+			if (IS_ERR_OR_NULL(bio)) {
+				err = !bio ? -ERANGE : PTR_ERR(bio);
+				SSDFS_ERR("fail to allocate bio: err %d\n",
+					  err);
+				return err;
+			}
+		}
+
+		err = ssdfs_bdev_bio_add_page(bio, erase_page,
+					      PAGE_SIZE,
+					      0);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to add page %d into bio: "
+				  "err %d\n",
+				  i, err);
+			goto finish_erase_request;
+		}
+	}
+
+	bio_set_dev(bio, sb->s_bdev);
+	bio->bi_opf = REQ_OP_DISCARD | REQ_BACKGROUND;
+	bio->bi_iter.bi_sector = index * (PAGE_SIZE >> 9);
+	bio->bi_private = sb;
+	bio->bi_end_io = ssdfs_bdev_erase_end_io;
+	atomic_inc(&fsi->pending_bios);
+	submit_bio(bio);
+
+	return 0;
+
+finish_erase_request:
+	ssdfs_bdev_bio_put(bio);
+
+	return err;
+}
+
+/*
+ * ssdfs_bdev_erase() - make erase operation
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's begin
+ * @len: size in bytes
+ *
+ * This function tries to erase the range via discard, falling back
+ * to zeroing out when discard is unsupported or fails.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EROFS       - file system in RO mode.
+ * %-EFAULT      - erase operation error.
+ */
+static int ssdfs_bdev_erase(struct super_block *sb, loff_t offset, size_t len)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	u32 erase_size = fsi->erasesize;
+	loff_t page_start, page_end;
+	u32 pages_count;
+	sector_t start_sector;
+	sector_t sectors_count;
+	u32 remainder;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu, len %zu\n",
+		  sb, (unsigned long long)offset, len);
+
+	div_u64_rem((u64)len, (u64)erase_size, &remainder);
+	BUG_ON(remainder);
+	div_u64_rem((u64)offset, (u64)erase_size, &remainder);
+	BUG_ON(remainder);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (sb->s_flags & SB_RDONLY)
+		return -EROFS;
+
+	div_u64_rem((u64)len, (u64)erase_size, &remainder);
+	if (remainder) {
+		SSDFS_WARN("len %llu, erase_size %u, remainder %u\n",
+			   (unsigned long long)len,
+			   erase_size, remainder);
+		return -ERANGE;
+	}
+
+	page_start = offset >> PAGE_SHIFT;
+	page_end = (offset + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	pages_count = (u32)(page_end - page_start);
+
+	if (pages_count == 0) {
+		SSDFS_WARN("pages_count equals zero\n");
+		return -ERANGE;
+	}
+
+	if (ssdfs_bdev_support_discard(sb->s_bdev)) {
+		err = ssdfs_bdev_erase_request(sb, pages_count, offset);
+		if (unlikely(err))
+			goto try_zeroout;
+	} else {
+try_zeroout:
+		start_sector = page_start <<
+					(PAGE_SHIFT - SSDFS_SECTOR_SHIFT);
+		sectors_count = pages_count <<
+					(PAGE_SHIFT - SSDFS_SECTOR_SHIFT);
+
+		err = blkdev_issue_zeroout(sb->s_bdev,
+					   start_sector, sectors_count,
+					   GFP_NOFS, 0);
+	}
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to erase: "
+			  "offset %llu, len %zu, err %d\n",
+			  (unsigned long long)offset,
+			  len, err);
+		return err;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_bdev_trim() - initiate background erase operation
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's begin
+ * @len: size in bytes
+ *
+ * This function tries to initiate background erase operation.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EROFS       - file system in RO mode.
+ * %-EFAULT      - erase operation error.
+ */
+static int ssdfs_bdev_trim(struct super_block *sb, loff_t offset, size_t len)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	u32 erase_size = fsi->erasesize;
+	loff_t page_start, page_end;
+	u32 pages_count;
+	u32 remainder;
+	sector_t start_sector;
+	sector_t sectors_count;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu, len %zu\n",
+		  sb, (unsigned long long)offset, len);
+
+	div_u64_rem((u64)len, (u64)erase_size, &remainder);
+	BUG_ON(remainder);
+	div_u64_rem((u64)offset, (u64)erase_size, &remainder);
+	BUG_ON(remainder);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (sb->s_flags & SB_RDONLY)
+		return -EROFS;
+
+	div_u64_rem((u64)len, (u64)erase_size, &remainder);
+	if (remainder) {
+		SSDFS_WARN("len %llu, erase_size %u, remainder %u\n",
+			   (unsigned long long)len,
+			   erase_size, remainder);
+		return -ERANGE;
+	}
+
+	page_start = offset >> PAGE_SHIFT;
+	page_end = (offset + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	pages_count = (u32)(page_end - page_start);
+
+	if (pages_count == 0) {
+		SSDFS_WARN("pages_count equals zero\n");
+		return -ERANGE;
+	}
+
+	start_sector = page_start << (PAGE_SHIFT - SSDFS_SECTOR_SHIFT);
+	sectors_count = pages_count << (PAGE_SHIFT - SSDFS_SECTOR_SHIFT);
+
+	if (ssdfs_bdev_support_discard(sb->s_bdev)) {
+		err = blkdev_issue_discard(sb->s_bdev,
+					   start_sector, sectors_count,
+					   GFP_NOFS);
+		if (unlikely(err))
+			goto try_zeroout;
+	} else {
+try_zeroout:
+		err = blkdev_issue_zeroout(sb->s_bdev,
+					   start_sector, sectors_count,
+					   GFP_NOFS, 0);
+	}
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to discard: "
+			  "start_sector %llu, sectors_count %llu, "
+			  "err %d\n",
+			  start_sector, sectors_count, err);
+		return err;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_bdev_peb_isbad() - check that PEB is bad
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's begin
+ *
+ * This function checks whether the PEB is bad. Block devices expose
+ * no bad block information, so the PEB is always reported as good.
+ */
+static int ssdfs_bdev_peb_isbad(struct super_block *sb, loff_t offset)
+{
+	/* do nothing */
+	return 0;
+}
+
+/*
+ * ssdfs_bdev_mark_peb_bad() - mark PEB as bad
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's begin
+ *
+ * This function tries to mark the PEB as bad. It is a no-op for
+ * block devices.
+ */
+int ssdfs_bdev_mark_peb_bad(struct super_block *sb, loff_t offset)
+{
+	/* do nothing */
+	return 0;
+}
+
+/*
+ * ssdfs_bdev_sync() - make sync operation
+ * @sb: superblock object
+ */
+static void ssdfs_bdev_sync(struct super_block *sb)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("device %s\n", sb->s_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	wait_event(wq, atomic_read(&fsi->pending_bios) == 0);
+}
+
+const struct ssdfs_device_ops ssdfs_bdev_devops = {
+	.device_name		= ssdfs_bdev_device_name,
+	.device_size		= ssdfs_bdev_device_size,
+	.open_zone		= ssdfs_bdev_open_zone,
+	.reopen_zone		= ssdfs_bdev_reopen_zone,
+	.close_zone		= ssdfs_bdev_close_zone,
+	.read			= ssdfs_bdev_read,
+	.readpage		= ssdfs_bdev_readpage,
+	.readpages		= ssdfs_bdev_readpages,
+	.can_write_page		= ssdfs_bdev_can_write_page,
+	.writepage		= ssdfs_bdev_writepage,
+	.writepages		= ssdfs_bdev_writepages,
+	.erase			= ssdfs_bdev_erase,
+	.trim			= ssdfs_bdev_trim,
+	.peb_isbad		= ssdfs_bdev_peb_isbad,
+	.mark_peb_bad		= ssdfs_bdev_mark_peb_bad,
+	.sync			= ssdfs_bdev_sync,
+};
diff --git a/fs/ssdfs/dev_mtd.c b/fs/ssdfs/dev_mtd.c
new file mode 100644
index 000000000000..6c092ea863bd
--- /dev/null
+++ b/fs/ssdfs/dev_mtd.c
@@ -0,0 +1,641 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/dev_mtd.c - MTD device access code.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/highmem.h>
+#include <linux/pagemap.h>
+#include <linux/mtd/mtd.h>
+#include <linux/mtd/super.h>
+#include <linux/pagevec.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+
+#include <trace/events/ssdfs.h>
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+atomic64_t ssdfs_dev_mtd_page_leaks;
+atomic64_t ssdfs_dev_mtd_memory_leaks;
+atomic64_t ssdfs_dev_mtd_cache_leaks;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+/*
+ * void ssdfs_dev_mtd_cache_leaks_increment(void *kaddr)
+ * void ssdfs_dev_mtd_cache_leaks_decrement(void *kaddr)
+ * void *ssdfs_dev_mtd_kmalloc(size_t size, gfp_t flags)
+ * void *ssdfs_dev_mtd_kzalloc(size_t size, gfp_t flags)
+ * void *ssdfs_dev_mtd_kcalloc(size_t n, size_t size, gfp_t flags)
+ * void ssdfs_dev_mtd_kfree(void *kaddr)
+ * struct page *ssdfs_dev_mtd_alloc_page(gfp_t gfp_mask)
+ * struct page *ssdfs_dev_mtd_add_pagevec_page(struct pagevec *pvec)
+ * void ssdfs_dev_mtd_free_page(struct page *page)
+ * void ssdfs_dev_mtd_pagevec_release(struct pagevec *pvec)
+ */
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	SSDFS_MEMORY_LEAKS_CHECKER_FNS(dev_mtd)
+#else
+	SSDFS_MEMORY_ALLOCATOR_FNS(dev_mtd)
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+void ssdfs_dev_mtd_memory_leaks_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_dev_mtd_page_leaks, 0);
+	atomic64_set(&ssdfs_dev_mtd_memory_leaks, 0);
+	atomic64_set(&ssdfs_dev_mtd_cache_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+void ssdfs_dev_mtd_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&ssdfs_dev_mtd_page_leaks) != 0) {
+		SSDFS_ERR("MTD DEV: "
+			  "memory leaks include %lld pages\n",
+			  atomic64_read(&ssdfs_dev_mtd_page_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_dev_mtd_memory_leaks) != 0) {
+		SSDFS_ERR("MTD DEV: "
+			  "memory allocator suffers from %lld leaks\n",
+			  atomic64_read(&ssdfs_dev_mtd_memory_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_dev_mtd_cache_leaks) != 0) {
+		SSDFS_ERR("MTD DEV: "
+			  "caches suffer from %lld leaks\n",
+			  atomic64_read(&ssdfs_dev_mtd_cache_leaks));
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+/*
+ * ssdfs_mtd_device_name() - get device name
+ * @sb: superblock object
+ */
+static const char *ssdfs_mtd_device_name(struct super_block *sb)
+{
+	return sb->s_mtd->name;
+}
+
+/*
+ * ssdfs_mtd_device_size() - get partition size in bytes
+ * @sb: superblock object
+ */
+static __u64 ssdfs_mtd_device_size(struct super_block *sb)
+{
+	return SSDFS_FS_I(sb)->mtd->size;
+}
+
+static int ssdfs_mtd_open_zone(struct super_block *sb, loff_t offset)
+{
+	return -EOPNOTSUPP;
+}
+
+static int ssdfs_mtd_reopen_zone(struct super_block *sb, loff_t offset)
+{
+	return -EOPNOTSUPP;
+}
+
+static int ssdfs_mtd_close_zone(struct super_block *sb, loff_t offset)
+{
+	return -EOPNOTSUPP;
+}
+
+/*
+ * ssdfs_mtd_read() - read from volume into buffer
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's begin
+ * @len: size of buffer in bytes
+ * @buf: buffer
+ *
+ * This function tries to read @len bytes from the volume
+ * at @offset bytes from the beginning of the partition
+ * into @buf.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EIO         - I/O error.
+ */
+static int ssdfs_mtd_read(struct super_block *sb, loff_t offset, size_t len,
+			  void *buf)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	struct mtd_info *mtd = fsi->mtd;
+	size_t retlen;
+	int ret;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu, len %zu, buf %p\n",
+		  sb, (unsigned long long)offset, len, buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ret = mtd_read(mtd, offset, len, &retlen, buf);
+	if (ret) {
+		SSDFS_ERR("failed to read (err %d): offset %llu, len %zu\n",
+			  ret, (unsigned long long)offset, len);
+		return ret;
+	}
+
+	if (retlen != len) {
+		SSDFS_ERR("retlen (%zu) != len (%zu)\n", retlen, len);
+		return -EIO;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_mtd_readpage() - read page from the volume
+ * @sb: superblock object
+ * @page: memory page
+ * @offset: offset in bytes from partition's begin
+ *
+ * This function tries to read a PAGE_SIZE chunk at @offset
+ * bytes from the beginning of the partition into @page.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EIO         - I/O error.
+ */
+static int ssdfs_mtd_readpage(struct super_block *sb, struct page *page,
+				loff_t offset)
+{
+	void *kaddr;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu, page %p, page_index %llu\n",
+		  sb, (unsigned long long)offset, page,
+		  (unsigned long long)page_index(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	kaddr = kmap_local_page(page);
+	err = ssdfs_mtd_read(sb, offset, PAGE_SIZE, kaddr);
+	flush_dcache_page(page);
+	kunmap_local(kaddr);
+
+	if (err) {
+		ClearPageUptodate(page);
+		ssdfs_clear_page_private(page, 0);
+		SetPageError(page);
+	} else {
+		SetPageUptodate(page);
+		ClearPageError(page);
+		flush_dcache_page(page);
+	}
+
+	ssdfs_unlock_page(page);
+
+	return err;
+}
+
+/*
+ * ssdfs_mtd_readpages() - read pages from the volume
+ * @sb: superblock object
+ * @pvec: vector of memory pages
+ * @offset: offset in bytes from partition's begin
+ *
+ * This function tries to read data at @offset bytes from the
+ * beginning of the partition into the memory pages of @pvec.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EIO         - I/O error.
+ */
+static int ssdfs_mtd_readpages(struct super_block *sb, struct pagevec *pvec,
+				loff_t offset)
+{
+	struct page *page;
+	loff_t cur_offset = offset;
+	u32 page_off;
+	u32 read_bytes = 0;
+	int i;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu, pvec %p\n",
+		  sb, (unsigned long long)offset, pvec);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (pagevec_count(pvec) == 0) {
+		SSDFS_WARN("empty page vector\n");
+		return 0;
+	}
+
+	for (i = 0; i < pagevec_count(pvec); i++) {
+		page = pvec->pages[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!page);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = ssdfs_mtd_readpage(sb, page, cur_offset);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to read page: "
+				  "cur_offset %llu, err %d\n",
+				  cur_offset, err);
+			return err;
+		}
+
+		div_u64_rem(cur_offset, PAGE_SIZE, &page_off);
+		read_bytes = PAGE_SIZE - page_off;
+		cur_offset += read_bytes;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_mtd_can_write_page() - check that page can be written
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's begin
+ * @need_check: make check or not?
+ *
+ * This function checks whether the page can be written.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EROFS       - file system in RO mode.
+ * %-ENOMEM      - fail to allocate memory.
+ * %-EIO         - I/O error.
+ */
+static int ssdfs_mtd_can_write_page(struct super_block *sb, loff_t offset,
+				    bool need_check)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	void *buf;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu, need_check %d\n",
+		  sb, (unsigned long long)offset, (int)need_check);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!need_check)
+		return 0;
+
+	buf = ssdfs_dev_mtd_kzalloc(fsi->pagesize, GFP_KERNEL);
+	if (!buf) {
+		SSDFS_ERR("unable to allocate %d bytes\n", fsi->pagesize);
+		return -ENOMEM;
+	}
+
+	err = ssdfs_mtd_read(sb, offset, fsi->pagesize, buf);
+	if (err)
+		goto free_buf;
+
+	if (memchr_inv(buf, 0xff, fsi->pagesize)) {
+		SSDFS_ERR("area with offset %llu is not in erased state\n",
+			  (unsigned long long)offset);
+		err = -EIO;
+	}
+
+free_buf:
+	ssdfs_dev_mtd_kfree(buf);
+	return err;
+}
+
+/*
+ * ssdfs_mtd_writepage() - write memory page on volume
+ * @sb: superblock object
+ * @to_off: offset in bytes from partition's begin
+ * @page: memory page
+ * @from_off: offset in bytes from page's begin
+ * @len: size of data in bytes
+ *
+ * This function tries to write @len bytes of data from @page
+ * at @to_off bytes from the beginning of the partition.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EROFS       - file system in RO mode.
+ * %-EIO         - I/O error.
+ */
+static int ssdfs_mtd_writepage(struct super_block *sb, loff_t to_off,
+				struct page *page, u32 from_off, size_t len)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	struct mtd_info *mtd = fsi->mtd;
+	size_t retlen;
+	unsigned char *kaddr;
+	int ret;
+#ifdef CONFIG_SSDFS_DEBUG
+	u32 remainder;
+#endif /* CONFIG_SSDFS_DEBUG */
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, to_off %llu, page %p, from_off %u, len %zu\n",
+		  sb, to_off, page, from_off, len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (sb->s_flags & SB_RDONLY) {
+		SSDFS_WARN("unable to write on RO file system\n");
+		return -EROFS;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!page);
+	BUG_ON((to_off >= mtd->size) || (len > (mtd->size - to_off)));
+	BUG_ON(len == 0);
+	div_u64_rem((u64)to_off, (u64)fsi->pagesize, &remainder);
+	BUG_ON(remainder);
+	BUG_ON((from_off + len) > PAGE_SIZE);
+	BUG_ON(!PageDirty(page));
+	BUG_ON(PageLocked(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_lock_page(page);
+	kaddr = kmap_local_page(page);
+	ret = mtd_write(mtd, to_off, len, &retlen, kaddr + from_off);
+	kunmap_local(kaddr);
+
+	if (ret || (retlen != len)) {
+		SetPageError(page);
+		SSDFS_ERR("failed to write (err %d): offset %llu, "
+			  "len %zu, retlen %zu\n",
+			  ret, (unsigned long long)to_off, len, retlen);
+		err = -EIO;
+	} else {
+		ssdfs_clear_dirty_page(page);
+		SetPageUptodate(page);
+		ClearPageError(page);
+	}
+
+	ssdfs_unlock_page(page);
+	ssdfs_put_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("page %p, count %d\n",
+		  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+/*
+ * ssdfs_mtd_writepages() - write memory pages on volume
+ * @sb: superblock object
+ * @to_off: offset in bytes from partition's begin
+ * @pvec: vector of memory pages
+ * @from_off: offset in bytes from page's begin
+ * @len: size of data in bytes
+ *
+ * This function tries to write @len bytes of data from @pvec
+ * at @to_off bytes from the beginning of the partition.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EROFS       - file system in RO mode.
+ * %-EIO         - I/O error.
+ */
+static int ssdfs_mtd_writepages(struct super_block *sb, loff_t to_off,
+				struct pagevec *pvec, u32 from_off, size_t len)
+{
+	struct page *page;
+	loff_t cur_to_off = to_off;
+	u32 page_off = from_off;
+	u32 written_bytes = 0;
+	size_t write_len;
+	int i;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, to_off %llu, pvec %p, from_off %u, len %zu\n",
+		  sb, to_off, pvec, from_off, len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (sb->s_flags & SB_RDONLY) {
+		SSDFS_WARN("unable to write on RO file system\n");
+		return -EROFS;
+	}
+
+	if (pagevec_count(pvec) == 0) {
+		SSDFS_WARN("empty page vector\n");
+		return 0;
+	}
+
+	for (i = 0; i < pagevec_count(pvec); i++) {
+		page = pvec->pages[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!page);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (written_bytes >= len) {
+			SSDFS_ERR("written_bytes %u >= len %zu\n",
+				  written_bytes, len);
+			return -ERANGE;
+		}
+
+		write_len = min_t(size_t, (size_t)(PAGE_SIZE - page_off),
+					  (size_t)(len - written_bytes));
+
+		err = ssdfs_mtd_writepage(sb, cur_to_off, page, page_off, write_len);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to write page: "
+				  "cur_to_off %llu, page_off %u, "
+				  "write_len %zu, err %d\n",
+				  cur_to_off, page_off, write_len, err);
+			return err;
+		}
+
+		div_u64_rem(cur_to_off, PAGE_SIZE, &page_off);
+		written_bytes += write_len;
+		cur_to_off += write_len;
+	}
+
+	return 0;
+}
+
+static void ssdfs_erase_callback(struct erase_info *ei)
+{
+	complete((struct completion *)ei->priv);
+}
+
+/*
+ * ssdfs_mtd_erase() - make erase operation
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's begin
+ * @len: size in bytes
+ *
+ * This function tries to make erase operation.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EROFS       - file system in RO mode.
+ * %-EFAULT      - erase operation error.
+ */
+static int ssdfs_mtd_erase(struct super_block *sb, loff_t offset, size_t len)
+{
+	struct mtd_info *mtd = SSDFS_FS_I(sb)->mtd;
+	struct erase_info ei;
+	DECLARE_COMPLETION_ONSTACK(complete);
+	u32 remainder;
+	int ret;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu, len %zu\n",
+		  sb, (unsigned long long)offset, len);
+
+	div_u64_rem((u64)len, (u64)mtd->erasesize, &remainder);
+	BUG_ON(remainder);
+	div_u64_rem((u64)offset, (u64)mtd->erasesize, &remainder);
+	BUG_ON(remainder);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (sb->s_flags & SB_RDONLY)
+		return -EROFS;
+
+	div_u64_rem((u64)len, (u64)mtd->erasesize, &remainder);
+	if (remainder) {
+		SSDFS_WARN("len %llu, erase_size %u, remainder %u\n",
+			   (unsigned long long)len,
+			   mtd->erasesize, remainder);
+		return -ERANGE;
+	}
+
+	memset(&ei, 0, sizeof(ei));
+	ei.mtd = mtd;
+	ei.addr = offset;
+	ei.len = len;
+	ei.callback = ssdfs_erase_callback;
+	ei.priv = (long)&complete;
+
+	ret = mtd_erase(mtd, &ei);
+	if (ret) {
+		SSDFS_ERR("failed to erase (err %d): offset %llu, len %zu\n",
+			  ret, (unsigned long long)offset, len);
+		return ret;
+	}
+
+	err = SSDFS_WAIT_COMPLETION(&complete);
+	if (unlikely(err)) {
+		SSDFS_ERR("timeout is out: "
+			  "err %d\n", err);
+		return err;
+	}
+
+	if (ei.state != MTD_ERASE_DONE) {
+		SSDFS_ERR("ei.state %#x, offset %llu, len %zu\n",
+			  ei.state, (unsigned long long)offset, len);
+		return -EFAULT;
+	}
+
+	return 0;
+}
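[Reviewer note: the alignment requirement that ssdfs_mtd_erase() enforces with div_u64_rem() can be sketched in isolation. This is a userspace illustration, not part of the patch; the function name is illustrative.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace sketch of the check done by ssdfs_mtd_erase():
 * both the offset and the length of an erase request must be
 * multiples of the erase block size, otherwise -ERANGE is
 * returned in the patch above.
 */
static bool erase_range_is_aligned(uint64_t offset, uint64_t len,
				   uint32_t erasesize)
{
	if (erasesize == 0)
		return false;
	return (offset % erasesize == 0) && (len % erasesize == 0);
}
```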
+
+/*
+ * ssdfs_mtd_trim() - initiate background erase operation
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's begin
+ * @len: size in bytes
+ *
+ * This function tries to initiate background erase operation.
+ * Currently, it is the same operation as foreground erase.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EROFS       - file system in RO mode.
+ * %-EFAULT      - erase operation error.
+ */
+static int ssdfs_mtd_trim(struct super_block *sb, loff_t offset, size_t len)
+{
+	return ssdfs_mtd_erase(sb, offset, len);
+}
+
+/*
+ * ssdfs_mtd_peb_isbad() - check that PEB is bad
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's begin
+ *
+ * This function tries to detect whether the PEB is bad.
+ */
+static int ssdfs_mtd_peb_isbad(struct super_block *sb, loff_t offset)
+{
+	return mtd_block_isbad(SSDFS_FS_I(sb)->mtd, offset);
+}
+
+/*
+ * ssdfs_mtd_mark_peb_bad() - mark PEB as bad
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's begin
+ *
+ * This function tries to mark the PEB as bad.
+ */
+int ssdfs_mtd_mark_peb_bad(struct super_block *sb, loff_t offset)
+{
+	return mtd_block_markbad(SSDFS_FS_I(sb)->mtd, offset);
+}
+
+/*
+ * ssdfs_mtd_sync() - make sync operation
+ * @sb: superblock object
+ */
+static void ssdfs_mtd_sync(struct super_block *sb)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("device %d (\"%s\")\n",
+		  fsi->mtd->index, fsi->mtd->name);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	mtd_sync(fsi->mtd);
+}
+
+const struct ssdfs_device_ops ssdfs_mtd_devops = {
+	.device_name		= ssdfs_mtd_device_name,
+	.device_size		= ssdfs_mtd_device_size,
+	.open_zone		= ssdfs_mtd_open_zone,
+	.reopen_zone		= ssdfs_mtd_reopen_zone,
+	.close_zone		= ssdfs_mtd_close_zone,
+	.read			= ssdfs_mtd_read,
+	.readpage		= ssdfs_mtd_readpage,
+	.readpages		= ssdfs_mtd_readpages,
+	.can_write_page		= ssdfs_mtd_can_write_page,
+	.writepage		= ssdfs_mtd_writepage,
+	.writepages		= ssdfs_mtd_writepages,
+	.erase			= ssdfs_mtd_erase,
+	.trim			= ssdfs_mtd_trim,
+	.peb_isbad		= ssdfs_mtd_peb_isbad,
+	.mark_peb_bad		= ssdfs_mtd_mark_peb_bad,
+	.sync			= ssdfs_mtd_sync,
+};
diff --git a/fs/ssdfs/dev_zns.c b/fs/ssdfs/dev_zns.c
new file mode 100644
index 000000000000..2b45f3b1632c
--- /dev/null
+++ b/fs/ssdfs/dev_zns.c
@@ -0,0 +1,1281 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/dev_zns.c - ZNS SSD support.
+ *
+ * Copyright (c) 2022-2023 Bytedance Ltd. and/or its affiliates.
+ *              https://www.bytedance.com/
+ * Copyright (c) 2022-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cong Wang
+ */
+
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/highmem.h>
+#include <linux/pagemap.h>
+#include <linux/pagevec.h>
+#include <linux/bio.h>
+#include <linux/blkdev.h>
+#include <linux/backing-dev.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+
+#include <trace/events/ssdfs.h>
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+atomic64_t ssdfs_dev_zns_page_leaks;
+atomic64_t ssdfs_dev_zns_memory_leaks;
+atomic64_t ssdfs_dev_zns_cache_leaks;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+/*
+ * void ssdfs_dev_zns_cache_leaks_increment(void *kaddr)
+ * void ssdfs_dev_zns_cache_leaks_decrement(void *kaddr)
+ * void *ssdfs_dev_zns_kmalloc(size_t size, gfp_t flags)
+ * void *ssdfs_dev_zns_kzalloc(size_t size, gfp_t flags)
+ * void *ssdfs_dev_zns_kcalloc(size_t n, size_t size, gfp_t flags)
+ * void ssdfs_dev_zns_kfree(void *kaddr)
+ * struct page *ssdfs_dev_zns_alloc_page(gfp_t gfp_mask)
+ * struct page *ssdfs_dev_zns_add_pagevec_page(struct pagevec *pvec)
+ * void ssdfs_dev_zns_free_page(struct page *page)
+ * void ssdfs_dev_zns_pagevec_release(struct pagevec *pvec)
+ */
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	SSDFS_MEMORY_LEAKS_CHECKER_FNS(dev_zns)
+#else
+	SSDFS_MEMORY_ALLOCATOR_FNS(dev_zns)
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+void ssdfs_dev_zns_memory_leaks_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_dev_zns_page_leaks, 0);
+	atomic64_set(&ssdfs_dev_zns_memory_leaks, 0);
+	atomic64_set(&ssdfs_dev_zns_cache_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+void ssdfs_dev_zns_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&ssdfs_dev_zns_page_leaks) != 0) {
+		SSDFS_ERR("ZNS DEV: "
+			  "memory leaks include %lld pages\n",
+			  atomic64_read(&ssdfs_dev_zns_page_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_dev_zns_memory_leaks) != 0) {
+		SSDFS_ERR("ZNS DEV: "
+			  "memory allocator suffers from %lld leaks\n",
+			  atomic64_read(&ssdfs_dev_zns_memory_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_dev_zns_cache_leaks) != 0) {
+		SSDFS_ERR("ZNS DEV: "
+			  "caches suffer from %lld leaks\n",
+			  atomic64_read(&ssdfs_dev_zns_cache_leaks));
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+static DECLARE_WAIT_QUEUE_HEAD(zns_wq);
+
+/*
+ * ssdfs_zns_device_name() - get device name
+ * @sb: superblock object
+ */
+static const char *ssdfs_zns_device_name(struct super_block *sb)
+{
+	return sb->s_id;
+}
+
+/*
+ * ssdfs_zns_device_size() - get partition size in bytes
+ * @sb: superblock object
+ */
+static __u64 ssdfs_zns_device_size(struct super_block *sb)
+{
+	return i_size_read(sb->s_bdev->bd_inode);
+}
+
+static int ssdfs_report_zone(struct blk_zone *zone,
+			     unsigned int index, void *data)
+{
+	ssdfs_memcpy(data, 0, sizeof(struct blk_zone),
+		     zone, 0, sizeof(struct blk_zone),
+		     sizeof(struct blk_zone));
+	return 0;
+}
+
+/*
+ * ssdfs_zns_open_zone() - open zone
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's begin
+ */
+static int ssdfs_zns_open_zone(struct super_block *sb, loff_t offset)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	sector_t zone_sector = offset >> SECTOR_SHIFT;
+	sector_t zone_size = fsi->erasesize >> SECTOR_SHIFT;
+	u32 open_zones;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu\n",
+		  sb, (unsigned long long)offset);
+	SSDFS_DBG("BEFORE: open_zones %d\n",
+		  atomic_read(&fsi->open_zones));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	open_zones = atomic_inc_return(&fsi->open_zones);
+	if (open_zones > fsi->max_open_zones) {
+		atomic_dec(&fsi->open_zones);
+
+		SSDFS_WARN("open zones limit reached: "
+			   "open_zones %u\n", open_zones);
+		return -EBUSY;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("AFTER: open_zones %d\n",
+		   atomic_read(&fsi->open_zones));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = blkdev_zone_mgmt(sb->s_bdev, REQ_OP_ZONE_OPEN,
+				zone_sector, zone_size, GFP_NOFS);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to open zone: "
+			  "zone_sector %llu, zone_size %llu, "
+			  "open_zones %u, max_open_zones %u, "
+			  "err %d\n",
+			  zone_sector, zone_size,
+			  open_zones, fsi->max_open_zones,
+			  err);
+		return err;
+	}
+
+	return 0;
+}
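[Reviewer note: the open-zone accounting above (optimistic increment, rollback on limit overflow) pairs with the decrement in ssdfs_zns_close_zone(). A minimal userspace model, with illustrative names and EBUSY_SIM standing in for -EBUSY:]

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Userspace model of the open-zone accounting in
 * ssdfs_zns_open_zone()/ssdfs_zns_close_zone(): the counter is
 * bumped first and rolled back if the device's limit on
 * simultaneously open zones would be exceeded.
 */
#define EBUSY_SIM (-16)

struct zone_acct {
	atomic_int open_zones;
	int max_open_zones;
};

static int zone_acct_open(struct zone_acct *a)
{
	int opened = atomic_fetch_add(&a->open_zones, 1) + 1;

	if (opened > a->max_open_zones) {
		/* roll back the optimistic increment */
		atomic_fetch_sub(&a->open_zones, 1);
		return EBUSY_SIM;
	}
	return 0;
}

static void zone_acct_close(struct zone_acct *a)
{
	atomic_fetch_sub(&a->open_zones, 1);
}
```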
+
+/*
+ * ssdfs_zns_reopen_zone() - reopen closed zone
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's begin
+ */
+static int ssdfs_zns_reopen_zone(struct super_block *sb, loff_t offset)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	struct blk_zone zone;
+	sector_t zone_sector = offset >> SECTOR_SHIFT;
+	sector_t zone_size = fsi->erasesize >> SECTOR_SHIFT;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu\n",
+		  sb, (unsigned long long)offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = blkdev_report_zones(sb->s_bdev, zone_sector, 1,
+				  ssdfs_report_zone, &zone);
+	if (err != 1) {
+		SSDFS_ERR("fail to take report zone: "
+			  "zone_sector %llu, err %d\n",
+			  zone_sector, err);
+		return err < 0 ? err : -EIO;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("zone before: start %llu, len %llu, wp %llu, "
+		  "type %#x, cond %#x, non_seq %#x, "
+		  "reset %#x, capacity %llu\n",
+		  zone.start, zone.len, zone.wp,
+		  zone.type, zone.cond, zone.non_seq,
+		  zone.reset, zone.capacity);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (zone.cond) {
+	case BLK_ZONE_COND_CLOSED:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("zone is closed: offset %llu\n",
+			  offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+		/* continue logic */
+		break;
+
+	case BLK_ZONE_COND_READONLY:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("zone is READ-ONLY: offset %llu\n",
+			  offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EIO;
+
+	case BLK_ZONE_COND_FULL:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("zone is full: offset %llu\n",
+			  offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EIO;
+
+	case BLK_ZONE_COND_OFFLINE:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("zone is offline: offset %llu\n",
+			  offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EIO;
+
+	default:
+		/* continue logic */
+		break;
+	}
+
+	err = blkdev_zone_mgmt(sb->s_bdev, REQ_OP_ZONE_OPEN,
+				zone_sector, zone_size, GFP_NOFS);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to open zone: "
+			  "zone_sector %llu, zone_size %llu, "
+			  "err %d\n",
+			  zone_sector, zone_size,
+			  err);
+		return err;
+	}
+
+	err = blkdev_report_zones(sb->s_bdev, zone_sector, 1,
+				  ssdfs_report_zone, &zone);
+	if (err != 1) {
+		SSDFS_ERR("fail to take report zone: "
+			  "zone_sector %llu, err %d\n",
+			  zone_sector, err);
+		return err < 0 ? err : -EIO;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("zone after: start %llu, len %llu, wp %llu, "
+		  "type %#x, cond %#x, non_seq %#x, "
+		  "reset %#x, capacity %llu\n",
+		  zone.start, zone.len, zone.wp,
+		  zone.type, zone.cond, zone.non_seq,
+		  zone.reset, zone.capacity);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (zone.cond) {
+	case BLK_ZONE_COND_CLOSED:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("zone is closed: offset %llu\n",
+			  offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EIO;
+
+	case BLK_ZONE_COND_READONLY:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("zone is READ-ONLY: offset %llu\n",
+			  offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EIO;
+
+	case BLK_ZONE_COND_FULL:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("zone is full: offset %llu\n",
+			  offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EIO;
+
+	case BLK_ZONE_COND_OFFLINE:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("zone is offline: offset %llu\n",
+			  offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EIO;
+
+	default:
+		/* continue logic */
+		break;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_zns_close_zone() - close zone
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's begin
+ */
+static int ssdfs_zns_close_zone(struct super_block *sb, loff_t offset)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	sector_t zone_sector = offset >> SECTOR_SHIFT;
+	sector_t zone_size = fsi->erasesize >> SECTOR_SHIFT;
+	u32 open_zones;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu\n",
+		  sb, (unsigned long long)offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = blkdev_zone_mgmt(sb->s_bdev, REQ_OP_ZONE_FINISH,
+				zone_sector, zone_size, GFP_NOFS);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to finish zone: "
+			  "zone_sector %llu, zone_size %llu, err %d\n",
+			  zone_sector, zone_size, err);
+		return err;
+	}
+
+	open_zones = atomic_dec_return(&fsi->open_zones);
+	if (open_zones > fsi->max_open_zones) {
+		SSDFS_WARN("open_zones counter is inconsistent: "
+			   "open_zones %u\n", open_zones);
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_zns_zone_size() - retrieve zone size
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's begin
+ *
+ * This function tries to retrieve zone size.
+ */
+u64 ssdfs_zns_zone_size(struct super_block *sb, loff_t offset)
+{
+	struct blk_zone zone;
+	sector_t zone_sector = offset >> SECTOR_SHIFT;
+	int res;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu\n",
+		  sb, (unsigned long long)offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	res = blkdev_report_zones(sb->s_bdev, zone_sector, 1,
+				  ssdfs_report_zone, &zone);
+	if (res != 1) {
+		SSDFS_ERR("fail to take report zone: "
+			  "zone_sector %llu, err %d\n",
+			  zone_sector, res);
+		return U64_MAX;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("zone: start %llu, len %llu, wp %llu, "
+		  "type %#x, cond %#x, non_seq %#x, "
+		  "reset %#x, capacity %llu\n",
+		  zone.start, zone.len, zone.wp,
+		  zone.type, zone.cond, zone.non_seq,
+		  zone.reset, zone.capacity);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return (u64)zone.len << SECTOR_SHIFT;
+}
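[Reviewer note: the zone-size helpers shuttle between byte offsets (SSDFS's view) and 512-byte sectors (the block layer's view). The conversions can be sketched standalone; SECTOR_SHIFT is 9 in the Linux block layer.]

```c
#include <assert.h>
#include <stdint.h>

/* The Linux block layer uses 512-byte sectors: SECTOR_SHIFT == 9. */
#define SECTOR_SHIFT 9

/*
 * Sketch of the unit conversions used by ssdfs_zns_zone_size() and
 * ssdfs_zns_zone_capacity(): byte offsets are converted to sectors
 * before calling blkdev_report_zones(), and the reported zone.len /
 * zone.capacity (in sectors) are converted back to bytes.
 */
static inline uint64_t bytes_to_sectors(uint64_t bytes)
{
	return bytes >> SECTOR_SHIFT;
}

static inline uint64_t sectors_to_bytes(uint64_t sectors)
{
	return sectors << SECTOR_SHIFT;
}
```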
+
+/*
+ * ssdfs_zns_zone_capacity() - retrieve zone capacity
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's begin
+ *
+ * This function tries to retrieve zone capacity.
+ */
+u64 ssdfs_zns_zone_capacity(struct super_block *sb, loff_t offset)
+{
+	struct blk_zone zone;
+	sector_t zone_sector = offset >> SECTOR_SHIFT;
+	int res;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu\n",
+		  sb, (unsigned long long)offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	res = blkdev_report_zones(sb->s_bdev, zone_sector, 1,
+				  ssdfs_report_zone, &zone);
+	if (res != 1) {
+		SSDFS_ERR("fail to take report zone: "
+			  "zone_sector %llu, err %d\n",
+			  zone_sector, res);
+		return U64_MAX;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("zone: start %llu, len %llu, wp %llu, "
+		  "type %#x, cond %#x, non_seq %#x, "
+		  "reset %#x, capacity %llu\n",
+		  zone.start, zone.len, zone.wp,
+		  zone.type, zone.cond, zone.non_seq,
+		  zone.reset, zone.capacity);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return (u64)zone.capacity << SECTOR_SHIFT;
+}
+
+/*
+ * ssdfs_zns_sync_page_request() - submit page request
+ * @sb: superblock object
+ * @page: memory page
+ * @zone_start: first sector of zone
+ * @offset: offset in bytes from partition's begin
+ * @op: direction of I/O
+ * @op_flags: request op flags
+ */
+static int ssdfs_zns_sync_page_request(struct super_block *sb,
+					struct page *page,
+					sector_t zone_start,
+					loff_t offset,
+					unsigned int op, int op_flags)
+{
+	struct bio *bio;
+#ifdef CONFIG_SSDFS_DEBUG
+	sector_t zone_sector = offset >> SECTOR_SHIFT;
+	struct blk_zone zone;
+	int res;
+#endif /* CONFIG_SSDFS_DEBUG */
+	int err = 0;
+
+	op |= REQ_OP_ZONE_APPEND | REQ_IDLE;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!page);
+
+	SSDFS_DBG("offset %llu, zone_start %llu, "
+		  "op %#x, op_flags %#x\n",
+		  offset, zone_start, op, op_flags);
+
+	res = blkdev_report_zones(sb->s_bdev, zone_sector, 1,
+				  ssdfs_report_zone, &zone);
+	if (res != 1) {
+		SSDFS_ERR("fail to take report zone: "
+			  "zone_sector %llu, err %d\n",
+			  zone_sector, res);
+	} else {
+		SSDFS_DBG("zone: start %llu, len %llu, wp %llu, "
+			  "type %#x, cond %#x, non_seq %#x, "
+			  "reset %#x, capacity %llu\n",
+			  zone.start, zone.len, zone.wp,
+			  zone.type, zone.cond, zone.non_seq,
+			  zone.reset, zone.capacity);
+	}
+
+	BUG_ON(zone_start != zone.start);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	bio = ssdfs_bdev_bio_alloc(sb->s_bdev, 1, op, GFP_NOFS);
+	if (IS_ERR_OR_NULL(bio)) {
+		err = !bio ? -ERANGE : PTR_ERR(bio);
+		SSDFS_ERR("fail to allocate bio: err %d\n",
+			  err);
+		return err;
+	}
+
+	bio->bi_iter.bi_sector = zone_start;
+	bio_set_dev(bio, sb->s_bdev);
+	bio->bi_opf = op | op_flags;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("page %p, count %d\n",
+		  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_bdev_bio_add_page(bio, page, PAGE_SIZE, 0);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to add page into bio: "
+			  "err %d\n",
+			  err);
+		goto finish_sync_page_request;
+	}
+
+	err = submit_bio_wait(bio);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to process request: "
+			  "err %d\n",
+			  err);
+		goto finish_sync_page_request;
+	}
+
+finish_sync_page_request:
+	ssdfs_bdev_bio_put(bio);
+
+	return err;
+}
+
+/*
+ * ssdfs_zns_sync_pvec_request() - submit pagevec request
+ * @sb: superblock object
+ * @pvec: pagevec
+ * @zone_start: first sector of zone
+ * @offset: offset in bytes from partition's begin
+ * @op: direction of I/O
+ * @op_flags: request op flags
+ */
+static int ssdfs_zns_sync_pvec_request(struct super_block *sb,
+					struct pagevec *pvec,
+					sector_t zone_start,
+					loff_t offset,
+					unsigned int op, int op_flags)
+{
+	struct bio *bio;
+	int i;
+#ifdef CONFIG_SSDFS_DEBUG
+	sector_t zone_sector = offset >> SECTOR_SHIFT;
+	struct blk_zone zone;
+	int res;
+#endif /* CONFIG_SSDFS_DEBUG */
+	int err = 0;
+
+	op |= REQ_OP_ZONE_APPEND | REQ_IDLE;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pvec);
+
+	SSDFS_DBG("offset %llu, zone_start %llu, "
+		  "op %#x, op_flags %#x\n",
+		  offset, zone_start, op, op_flags);
+
+	res = blkdev_report_zones(sb->s_bdev, zone_sector, 1,
+				  ssdfs_report_zone, &zone);
+	if (res != 1) {
+		SSDFS_ERR("fail to take report zone: "
+			  "zone_sector %llu, err %d\n",
+			  zone_sector, res);
+	} else {
+		SSDFS_DBG("zone: start %llu, len %llu, wp %llu, "
+			  "type %#x, cond %#x, non_seq %#x, "
+			  "reset %#x, capacity %llu\n",
+			  zone.start, zone.len, zone.wp,
+			  zone.type, zone.cond, zone.non_seq,
+			  zone.reset, zone.capacity);
+	}
+
+	BUG_ON(zone_start != zone.start);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (pagevec_count(pvec) == 0) {
+		SSDFS_WARN("empty page vector\n");
+		return 0;
+	}
+
+	bio = ssdfs_bdev_bio_alloc(sb->s_bdev, pagevec_count(pvec),
+				   op, GFP_NOFS);
+	if (IS_ERR_OR_NULL(bio)) {
+		err = !bio ? -ERANGE : PTR_ERR(bio);
+		SSDFS_ERR("fail to allocate bio: err %d\n",
+			  err);
+		return err;
+	}
+
+	bio->bi_iter.bi_sector = zone_start;
+	bio_set_dev(bio, sb->s_bdev);
+	bio->bi_opf = op | op_flags;
+
+	for (i = 0; i < pagevec_count(pvec); i++) {
+		struct page *page = pvec->pages[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!page);
+
+		SSDFS_DBG("page %p, count %d\n",
+			  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = ssdfs_bdev_bio_add_page(bio, page,
+					      PAGE_SIZE,
+					      0);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to add page %d into bio: "
+				  "err %d\n",
+				  i, err);
+			goto finish_sync_pvec_request;
+		}
+	}
+
+	err = submit_bio_wait(bio);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to process request: "
+			  "err %d\n",
+			  err);
+		goto finish_sync_pvec_request;
+	}
+
+finish_sync_pvec_request:
+	ssdfs_bdev_bio_put(bio);
+
+	return err;
+}
+
+/*
+ * ssdfs_zns_readpage() - read page from the volume
+ * @sb: superblock object
+ * @page: memory page
+ * @offset: offset in bytes from partition's begin
+ *
+ * This function tries to read data at @offset
+ * from the partition's beginning into the memory page.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EIO         - I/O error.
+ */
+int ssdfs_zns_readpage(struct super_block *sb, struct page *page,
+			loff_t offset)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	struct blk_zone zone;
+	sector_t zone_sector = offset >> SECTOR_SHIFT;
+	int res;
+#endif /* CONFIG_SSDFS_DEBUG */
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu\n",
+		  sb, (unsigned long long)offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_bdev_readpage(sb, page, offset);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	res = blkdev_report_zones(sb->s_bdev, zone_sector, 1,
+				  ssdfs_report_zone, &zone);
+	if (res != 1) {
+		SSDFS_ERR("fail to take report zone: "
+			  "zone_sector %llu, err %d\n",
+			  zone_sector, res);
+	} else {
+		SSDFS_DBG("zone: start %llu, len %llu, wp %llu, "
+			  "type %#x, cond %#x, non_seq %#x, "
+			  "reset %#x, capacity %llu\n",
+			  zone.start, zone.len, zone.wp,
+			  zone.type, zone.cond, zone.non_seq,
+			  zone.reset, zone.capacity);
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+/*
+ * ssdfs_zns_readpages() - read pages from the volume
+ * @sb: superblock object
+ * @pvec: pagevec
+ * @offset: offset in bytes from partition's begin
+ *
+ * This function tries to read data at @offset
+ * from the partition's beginning into the pagevec.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EIO         - I/O error.
+ */
+int ssdfs_zns_readpages(struct super_block *sb, struct pagevec *pvec,
+			 loff_t offset)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	struct blk_zone zone;
+	sector_t zone_sector = offset >> SECTOR_SHIFT;
+	int res;
+#endif /* CONFIG_SSDFS_DEBUG */
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu\n",
+		  sb, (unsigned long long)offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_bdev_readpages(sb, pvec, offset);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	res = blkdev_report_zones(sb->s_bdev, zone_sector, 1,
+				  ssdfs_report_zone, &zone);
+	if (res != 1) {
+		SSDFS_ERR("fail to take report zone: "
+			  "zone_sector %llu, err %d\n",
+			  zone_sector, res);
+	} else {
+		SSDFS_DBG("zone: start %llu, len %llu, wp %llu, "
+			  "type %#x, cond %#x, non_seq %#x, "
+			  "reset %#x, capacity %llu\n",
+			  zone.start, zone.len, zone.wp,
+			  zone.type, zone.cond, zone.non_seq,
+			  zone.reset, zone.capacity);
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+/*
+ * ssdfs_zns_read() - read from volume into buffer
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's begin
+ * @len: size of buffer in bytes
+ * @buf: buffer
+ *
+ * This function tries to read @len bytes at @offset
+ * from the partition's beginning into @buf.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EIO         - I/O error.
+ */
+int ssdfs_zns_read(struct super_block *sb, loff_t offset,
+		   size_t len, void *buf)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	struct blk_zone zone;
+	sector_t zone_sector = offset >> SECTOR_SHIFT;
+	int res;
+#endif /* CONFIG_SSDFS_DEBUG */
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu, len %zu, buf %p\n",
+		  sb, (unsigned long long)offset, len, buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_bdev_read(sb, offset, len, buf);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	res = blkdev_report_zones(sb->s_bdev, zone_sector, 1,
+				  ssdfs_report_zone, &zone);
+	if (res != 1) {
+		SSDFS_ERR("fail to take report zone: "
+			  "zone_sector %llu, err %d\n",
+			  zone_sector, res);
+	} else {
+		SSDFS_DBG("zone: start %llu, len %llu, wp %llu, "
+			  "type %#x, cond %#x, non_seq %#x, "
+			  "reset %#x, capacity %llu\n",
+			  zone.start, zone.len, zone.wp,
+			  zone.type, zone.cond, zone.non_seq,
+			  zone.reset, zone.capacity);
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+/*
+ * ssdfs_zns_can_write_page() - check that page can be written
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's begin
+ * @need_check: whether to perform the check
+ *
+ * This function checks whether the page can be written.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EROFS       - file system in RO mode.
+ * %-ENOMEM      - fail to allocate memory.
+ * %-EIO         - I/O error.
+ */
+static int ssdfs_zns_can_write_page(struct super_block *sb, loff_t offset,
+				    bool need_check)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	struct blk_zone zone;
+	sector_t zone_sector = offset >> SECTOR_SHIFT;
+	sector_t zone_size = fsi->erasesize >> SECTOR_SHIFT;
+	u64 peb_id;
+	loff_t zone_offset;
+	int res;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu, need_check %d\n",
+		  sb, (unsigned long long)offset, (int)need_check);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!need_check)
+		return 0;
+
+	res = blkdev_report_zones(sb->s_bdev, zone_sector, 1,
+				  ssdfs_report_zone, &zone);
+	if (res != 1) {
+		SSDFS_ERR("fail to take report zone: "
+			  "zone_sector %llu, err %d\n",
+			  zone_sector, res);
+		return res < 0 ? res : -EIO;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("zone before: start %llu, len %llu, wp %llu, "
+		  "type %#x, cond %#x, non_seq %#x, "
+		  "reset %#x, capacity %llu\n",
+		  zone.start, zone.len, zone.wp,
+		  zone.type, zone.cond, zone.non_seq,
+		  zone.reset, zone.capacity);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (zone.type) {
+	case BLK_ZONE_TYPE_CONVENTIONAL:
+		return ssdfs_bdev_can_write_page(sb, offset, need_check);
+
+	default:
+		/*
+		 * BLK_ZONE_TYPE_SEQWRITE_REQ
+		 * BLK_ZONE_TYPE_SEQWRITE_PREF
+		 *
+		 * continue logic
+		 */
+		break;
+	}
+
+	switch (zone.cond) {
+	case BLK_ZONE_COND_NOT_WP:
+		return ssdfs_bdev_can_write_page(sb, offset, need_check);
+
+	case BLK_ZONE_COND_EMPTY:
+		/* can write */
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("zone is empty: offset %llu\n",
+			  offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return 0;
+
+	case BLK_ZONE_COND_CLOSED:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("zone is closed: offset %llu\n",
+			  offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		peb_id = offset / fsi->erasesize;
+		zone_offset = peb_id * fsi->erasesize;
+
+		err = ssdfs_zns_reopen_zone(sb, zone_offset);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to reopen zone: "
+				  "zone_offset %llu, zone_size %llu, "
+				  "err %d\n",
+				  zone_offset, zone_size, err);
+			return err;
+		}
+
+		return 0;
+
+	case BLK_ZONE_COND_READONLY:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("zone is READ-ONLY: offset %llu\n",
+			  offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EIO;
+
+	case BLK_ZONE_COND_FULL:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("zone is full: offset %llu\n",
+			  offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EIO;
+
+	case BLK_ZONE_COND_OFFLINE:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("zone is offline: offset %llu\n",
+			  offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EIO;
+
+	default:
+		/* continue logic */
+		break;
+	}
+
+	if (zone_sector < zone.wp) {
+		err = -EIO;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("cannot be written: "
+			  "zone_sector %llu, zone.wp %llu\n",
+			  zone_sector, zone.wp);
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	res = blkdev_report_zones(sb->s_bdev, zone_sector, 1,
+				  ssdfs_report_zone, &zone);
+	if (res != 1) {
+		SSDFS_ERR("fail to take report zone: "
+			  "zone_sector %llu, err %d\n",
+			  zone_sector, res);
+	} else {
+		SSDFS_DBG("zone after: start %llu, len %llu, wp %llu, "
+			  "type %#x, cond %#x, non_seq %#x, "
+			  "reset %#x, capacity %llu\n",
+			  zone.start, zone.len, zone.wp,
+			  zone.type, zone.cond, zone.non_seq,
+			  zone.reset, zone.capacity);
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
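[Reviewer note: after the zone-condition switch, the final check in ssdfs_zns_can_write_page() reduces to comparing the target sector against the zone's write pointer. A standalone sketch of that invariant (function name is illustrative):]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace sketch of the write-pointer check at the end of
 * ssdfs_zns_can_write_page(): in a sequential-write-required zone,
 * a sector can only be written at or beyond the current write
 * pointer; sectors below it were already written and would need a
 * zone reset first, so the patch returns -EIO for them.
 */
static bool sector_is_writable(uint64_t sector, uint64_t write_pointer)
{
	return sector >= write_pointer;
}
```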
+
+/*
+ * ssdfs_zns_writepage() - write memory page on volume
+ * @sb: superblock object
+ * @to_off: offset in bytes from partition's begin
+ * @page: memory page
+ * @from_off: offset in bytes from page's begin
+ * @len: size of data in bytes
+ *
+ * This function tries to write @len bytes of data from @page
+ * (starting at @from_off) to the volume at @to_off.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EROFS       - file system in RO mode.
+ * %-EIO         - I/O error.
+ */
+int ssdfs_zns_writepage(struct super_block *sb, loff_t to_off,
+			struct page *page, u32 from_off, size_t len)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	loff_t zone_start;
+#ifdef CONFIG_SSDFS_DEBUG
+	struct blk_zone zone;
+	sector_t zone_sector = to_off >> SECTOR_SHIFT;
+	u32 remainder;
+	int res;
+#endif /* CONFIG_SSDFS_DEBUG */
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, to_off %llu, page %p, from_off %u, len %zu\n",
+		  sb, to_off, page, from_off, len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (sb->s_flags & SB_RDONLY) {
+		SSDFS_WARN("unable to write on RO file system\n");
+		return -EROFS;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!page);
+	BUG_ON((to_off >= ssdfs_zns_device_size(sb)) ||
+		(len > (ssdfs_zns_device_size(sb) - to_off)));
+	BUG_ON(len == 0);
+	div_u64_rem((u64)to_off, (u64)fsi->pagesize, &remainder);
+	BUG_ON(remainder);
+	BUG_ON((from_off + len) > PAGE_SIZE);
+	BUG_ON(!PageDirty(page));
+	BUG_ON(PageLocked(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_lock_page(page);
+	atomic_inc(&fsi->pending_bios);
+
+	zone_start = (to_off / fsi->erasesize) * fsi->erasesize;
+	zone_start >>= SECTOR_SHIFT;
+
+	err = ssdfs_zns_sync_page_request(sb, page, zone_start, to_off,
+					  REQ_OP_WRITE, REQ_SYNC);
+	if (err) {
+		SetPageError(page);
+		SSDFS_ERR("failed to write (err %d): offset %llu\n",
+			  err, (unsigned long long)to_off);
+	} else {
+		ssdfs_clear_dirty_page(page);
+		SetPageUptodate(page);
+		ClearPageError(page);
+	}
+
+	ssdfs_unlock_page(page);
+	ssdfs_put_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("page %p, count %d\n",
+		  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (atomic_dec_and_test(&fsi->pending_bios))
+		wake_up_all(&zns_wq);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	res = blkdev_report_zones(sb->s_bdev, zone_sector, 1,
+				  ssdfs_report_zone, &zone);
+	if (res != 1) {
+		SSDFS_ERR("fail to take report zone: "
+			  "zone_sector %llu, err %d\n",
+			  zone_sector, res);
+	} else {
+		SSDFS_DBG("zone: start %llu, len %llu, wp %llu, "
+			  "type %#x, cond %#x, non_seq %#x, "
+			  "reset %#x, capacity %llu\n",
+			  zone.start, zone.len, zone.wp,
+			  zone.type, zone.cond, zone.non_seq,
+			  zone.reset, zone.capacity);
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
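[Reviewer note: ssdfs_zns_writepage() derives the zone-append target from the byte offset in two steps: round down to the zone (erase block) boundary, then convert to sectors. A standalone sketch, with an illustrative function name and a 2 MiB zone in the test:]

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9

/*
 * Sketch of the zone_start computation in ssdfs_zns_writepage():
 * the byte offset is rounded down to the start of its zone
 * (integer division truncates), then shifted into 512-byte
 * sectors for bio->bi_iter.bi_sector of the zone-append request.
 */
static uint64_t zone_start_sector(uint64_t to_off, uint32_t erasesize)
{
	uint64_t zone_start = (to_off / erasesize) * (uint64_t)erasesize;

	return zone_start >> SECTOR_SHIFT;
}
```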
+
+/*
+ * ssdfs_zns_writepages() - write pagevec on volume
+ * @sb: superblock object
+ * @to_off: offset in bytes from partition's begin
+ * @pvec: memory pages vector
+ * @from_off: offset in bytes from page's begin
+ * @len: size of data in bytes
+ *
+ * This function tries to write @len bytes of data from @pvec
+ * (starting at @from_off) to the volume at @to_off.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EROFS       - file system in RO mode.
+ * %-EIO         - I/O error.
+ */
+int ssdfs_zns_writepages(struct super_block *sb, loff_t to_off,
+			 struct pagevec *pvec,
+			 u32 from_off, size_t len)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	struct page *page;
+	loff_t zone_start;
+	int i;
+#ifdef CONFIG_SSDFS_DEBUG
+	struct blk_zone zone;
+	sector_t zone_sector = to_off >> SECTOR_SHIFT;
+	u32 remainder;
+	int res;
+#endif /* CONFIG_SSDFS_DEBUG */
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, to_off %llu, pvec %p, from_off %u, len %zu\n",
+		  sb, to_off, pvec, from_off, len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (sb->s_flags & SB_RDONLY) {
+		SSDFS_WARN("unable to write on RO file system\n");
+		return -EROFS;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pvec);
+	BUG_ON((to_off >= ssdfs_zns_device_size(sb)) ||
+		(len > (ssdfs_zns_device_size(sb) - to_off)));
+	BUG_ON(len == 0);
+	div_u64_rem((u64)to_off, (u64)fsi->pagesize, &remainder);
+	BUG_ON(remainder);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (pagevec_count(pvec) == 0) {
+		SSDFS_WARN("empty pagevec\n");
+		return 0;
+	}
+
+	for (i = 0; i < pagevec_count(pvec); i++) {
+		page = pvec->pages[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!page);
+		BUG_ON(!PageDirty(page));
+		BUG_ON(PageLocked(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		ssdfs_lock_page(page);
+	}
+
+	atomic_inc(&fsi->pending_bios);
+
+	zone_start = (to_off / fsi->erasesize) * fsi->erasesize;
+	zone_start >>= SECTOR_SHIFT;
+
+	err = ssdfs_zns_sync_pvec_request(sb, pvec, zone_start, to_off,
+					  REQ_OP_WRITE, REQ_SYNC);
+
+	for (i = 0; i < pagevec_count(pvec); i++) {
+		page = pvec->pages[i];
+
+		if (err) {
+			SetPageError(page);
+			SSDFS_ERR("failed to write (err %d): "
+				  "page_index %llu\n",
+				  err,
+				  (unsigned long long)page_index(page));
+		} else {
+			ssdfs_clear_dirty_page(page);
+			SetPageUptodate(page);
+			ClearPageError(page);
+		}
+
+		ssdfs_unlock_page(page);
+		ssdfs_put_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("page %p, count %d\n",
+			  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+	if (atomic_dec_and_test(&fsi->pending_bios))
+		wake_up_all(&zns_wq);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	res = blkdev_report_zones(sb->s_bdev, zone_sector, 1,
+				  ssdfs_report_zone, &zone);
+	if (res != 1) {
+		SSDFS_ERR("fail to take report zone: "
+			  "zone_sector %llu, err %d\n",
+			  zone_sector, res);
+	} else {
+		SSDFS_DBG("zone: start %llu, len %llu, wp %llu, "
+			  "type %#x, cond %#x, non_seq %#x, "
+			  "reset %#x, capacity %llu\n",
+			  zone.start, zone.len, zone.wp,
+			  zone.type, zone.cond, zone.non_seq,
+			  zone.reset, zone.capacity);
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+/*
+ * ssdfs_zns_trim() - initiate background erase operation
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's beginning
+ * @len: size in bytes
+ *
+ * This function tries to initiate the background erase operation.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EROFS       - file system in RO mode.
+ * %-EFAULT      - erase operation error.
+ */
+static int ssdfs_zns_trim(struct super_block *sb, loff_t offset, size_t len)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	u32 erase_size = fsi->erasesize;
+	loff_t page_start, page_end;
+	u32 pages_count;
+	u32 remainder;
+	sector_t start_sector;
+	sector_t sectors_count;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu, len %zu\n",
+		  sb, (unsigned long long)offset, len);
+
+	div_u64_rem((u64)len, (u64)erase_size, &remainder);
+	BUG_ON(remainder);
+	div_u64_rem((u64)offset, (u64)erase_size, &remainder);
+	BUG_ON(remainder);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (sb->s_flags & SB_RDONLY)
+		return -EROFS;
+
+	div_u64_rem((u64)len, (u64)erase_size, &remainder);
+	if (remainder) {
+		SSDFS_WARN("len %llu, erase_size %u, remainder %u\n",
+			   (unsigned long long)len,
+			   erase_size, remainder);
+		return -ERANGE;
+	}
+
+	page_start = offset >> PAGE_SHIFT;
+	page_end = (offset + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	pages_count = (u32)(page_end - page_start);
+
+	if (pages_count == 0) {
+		SSDFS_WARN("pages_count equals to zero\n");
+		return -ERANGE;
+	}
+
+	start_sector = offset >> SECTOR_SHIFT;
+	sectors_count = fsi->erasesize >> SECTOR_SHIFT;
+
+	err = blkdev_zone_mgmt(sb->s_bdev, REQ_OP_ZONE_RESET,
+				start_sector, sectors_count, GFP_NOFS);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to reset zone: "
+			  "zone_sector %llu, zone_size %llu, err %d\n",
+			  start_sector, sectors_count, err);
+		return err;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_zns_peb_isbad() - check whether PEB is bad
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's beginning
+ *
+ * This function tries to detect whether the PEB is bad.
+ */
+static int ssdfs_zns_peb_isbad(struct super_block *sb, loff_t offset)
+{
+	/* do nothing */
+	return 0;
+}
+
+/*
+ * ssdfs_zns_mark_peb_bad() - mark PEB as bad
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's beginning
+ *
+ * This function tries to mark PEB as bad.
+ */
+int ssdfs_zns_mark_peb_bad(struct super_block *sb, loff_t offset)
+{
+	/* do nothing */
+	return 0;
+}
+
+/*
+ * ssdfs_zns_sync() - make sync operation
+ * @sb: superblock object
+ */
+static void ssdfs_zns_sync(struct super_block *sb)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("device %s\n", sb->s_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	wait_event(zns_wq, atomic_read(&fsi->pending_bios) == 0);
+}
+
+const struct ssdfs_device_ops ssdfs_zns_devops = {
+	.device_name		= ssdfs_zns_device_name,
+	.device_size		= ssdfs_zns_device_size,
+	.open_zone		= ssdfs_zns_open_zone,
+	.reopen_zone		= ssdfs_zns_reopen_zone,
+	.close_zone		= ssdfs_zns_close_zone,
+	.read			= ssdfs_zns_read,
+	.readpage		= ssdfs_zns_readpage,
+	.readpages		= ssdfs_zns_readpages,
+	.can_write_page		= ssdfs_zns_can_write_page,
+	.writepage		= ssdfs_zns_writepage,
+	.writepages		= ssdfs_zns_writepages,
+	.erase			= ssdfs_zns_trim,
+	.trim			= ssdfs_zns_trim,
+	.peb_isbad		= ssdfs_zns_peb_isbad,
+	.mark_peb_bad		= ssdfs_zns_mark_peb_bad,
+	.sync			= ssdfs_zns_sync,
+};
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [RFC PATCH 04/76] ssdfs: implement super operations
  2023-02-25  1:08 [RFC PATCH 00/76] SSDFS: flash-friendly LFS file system for ZNS SSD Viacheslav Dubeyko
                   ` (2 preceding siblings ...)
  2023-02-25  1:08 ` [RFC PATCH 03/76] ssdfs: implement raw device operations Viacheslav Dubeyko
@ 2023-02-25  1:08 ` Viacheslav Dubeyko
  2023-02-25  1:08 ` [RFC PATCH 05/76] ssdfs: implement commit superblock operation Viacheslav Dubeyko
                   ` (72 subsequent siblings)
  76 siblings, 0 replies; 82+ messages in thread
From: Viacheslav Dubeyko @ 2023-02-25  1:08 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: viacheslav.dubeyko, luka.perkov, bruno.banelli, Viacheslav Dubeyko

This patch implements the file system register/unregister logic.
The register logic creates and initializes caches, initializes
compression support, and sets up the sysfs subsystem. Conversely,
the unregister logic destroys the caches, the compression
subsystem, and the sysfs entries.

Also, the patch implements basic mount/unmount logic.
The ssdfs_fill_super() implements the mount logic, which includes:
(1) parse mount options,
(2) extract superblock info,
(3) create key in-core metadata structures (mapping table,
    segment bitmap, b-trees),
(4) create root inode,
(5) start metadata structures' threads,
(6) commit superblock on finishing the mount operation.

The ssdfs_put_super() implements the unmount logic:
(1) stop metadata threads,
(2) wait for unfinished user data requests,
(3) flush dirty metadata structures,
(4) commit superblock,
(5) destroy in-core metadata structures.

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
CC: Viacheslav Dubeyko <viacheslav.dubeyko@bytedance.com>
CC: Luka Perkov <luka.perkov@sartura.hr>
CC: Bruno Banelli <bruno.banelli@sartura.hr>
---
 fs/ssdfs/fs_error.c  |  257 ++++++
 fs/ssdfs/options.c   |  190 +++++
 fs/ssdfs/readwrite.c |  651 +++++++++++++++
 fs/ssdfs/super.c     | 1844 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 2942 insertions(+)
 create mode 100644 fs/ssdfs/fs_error.c
 create mode 100644 fs/ssdfs/options.c
 create mode 100644 fs/ssdfs/readwrite.c
 create mode 100644 fs/ssdfs/super.c

diff --git a/fs/ssdfs/fs_error.c b/fs/ssdfs/fs_error.c
new file mode 100644
index 000000000000..452ace18272d
--- /dev/null
+++ b/fs/ssdfs/fs_error.c
@@ -0,0 +1,257 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/fs_error.c - logic for the case of file system errors detection.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include <linux/page-flags.h>
+#include <linux/pagevec.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+atomic64_t ssdfs_fs_error_page_leaks;
+atomic64_t ssdfs_fs_error_memory_leaks;
+atomic64_t ssdfs_fs_error_cache_leaks;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+/*
+ * void ssdfs_fs_error_cache_leaks_increment(void *kaddr)
+ * void ssdfs_fs_error_cache_leaks_decrement(void *kaddr)
+ * void *ssdfs_fs_error_kmalloc(size_t size, gfp_t flags)
+ * void *ssdfs_fs_error_kzalloc(size_t size, gfp_t flags)
+ * void *ssdfs_fs_error_kcalloc(size_t n, size_t size, gfp_t flags)
+ * void ssdfs_fs_error_kfree(void *kaddr)
+ * struct page *ssdfs_fs_error_alloc_page(gfp_t gfp_mask)
+ * struct page *ssdfs_fs_error_add_pagevec_page(struct pagevec *pvec)
+ * void ssdfs_fs_error_free_page(struct page *page)
+ * void ssdfs_fs_error_pagevec_release(struct pagevec *pvec)
+ */
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	SSDFS_MEMORY_LEAKS_CHECKER_FNS(fs_error)
+#else
+	SSDFS_MEMORY_ALLOCATOR_FNS(fs_error)
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+void ssdfs_fs_error_memory_leaks_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_fs_error_page_leaks, 0);
+	atomic64_set(&ssdfs_fs_error_memory_leaks, 0);
+	atomic64_set(&ssdfs_fs_error_cache_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+void ssdfs_fs_error_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&ssdfs_fs_error_page_leaks) != 0) {
+		SSDFS_ERR("FS ERROR: "
+			  "memory leaks include %lld pages\n",
+			  atomic64_read(&ssdfs_fs_error_page_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_fs_error_memory_leaks) != 0) {
+		SSDFS_ERR("FS ERROR: "
+			  "memory allocator suffers from %lld leaks\n",
+			  atomic64_read(&ssdfs_fs_error_memory_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_fs_error_cache_leaks) != 0) {
+		SSDFS_ERR("FS ERROR: "
+			  "caches suffer from %lld leaks\n",
+			  atomic64_read(&ssdfs_fs_error_cache_leaks));
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+static void ssdfs_handle_error(struct super_block *sb)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+
+	if (sb->s_flags & SB_RDONLY)
+		return;
+
+	spin_lock(&fsi->volume_state_lock);
+	fsi->fs_state = SSDFS_ERROR_FS;
+	spin_unlock(&fsi->volume_state_lock);
+
+	if (ssdfs_test_opt(fsi->mount_opts, ERRORS_PANIC)) {
+		panic("SSDFS (device %s): panic forced after error\n",
+			fsi->devops->device_name(sb));
+	} else if (ssdfs_test_opt(fsi->mount_opts, ERRORS_RO)) {
+		SSDFS_CRIT("Remounting filesystem read-only\n");
+		/*
+		 * Make sure updated value of ->s_mount_flags will be visible
+		 * before ->s_flags update
+		 */
+		smp_wmb();
+		sb->s_flags |= SB_RDONLY;
+	}
+}
+
+void ssdfs_fs_error(struct super_block *sb, const char *file,
+		    const char *function, unsigned int line,
+		    const char *fmt, ...)
+{
+	struct va_format vaf;
+	va_list args;
+
+	va_start(args, fmt);
+	vaf.fmt = fmt;
+	vaf.va = &args;
+	pr_crit("SSDFS error (device %s): pid %d:%s:%d %s(): comm %s: %pV",
+		SSDFS_FS_I(sb)->devops->device_name(sb), current->pid,
+		file, line, function, current->comm, &vaf);
+	va_end(args);
+
+	ssdfs_handle_error(sb);
+}
+
+int ssdfs_set_page_dirty(struct page *page)
+{
+	struct address_space *mapping = page->mapping;
+	unsigned long flags;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("page_index: %llu, mapping %p\n",
+		  (u64)page_index(page), mapping);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!PageLocked(page)) {
+		SSDFS_WARN("page isn't locked: "
+			   "page_index %llu, mapping %p\n",
+			   (u64)page_index(page), mapping);
+		return -ERANGE;
+	}
+
+	SetPageDirty(page);
+
+	if (mapping) {
+		xa_lock_irqsave(&mapping->i_pages, flags);
+		__xa_set_mark(&mapping->i_pages, page_index(page),
+				PAGECACHE_TAG_DIRTY);
+		xa_unlock_irqrestore(&mapping->i_pages, flags);
+	}
+
+	return 0;
+}
+
+int __ssdfs_clear_dirty_page(struct page *page)
+{
+	struct address_space *mapping = page->mapping;
+	unsigned long flags;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("page_index: %llu, mapping %p\n",
+		  (u64)page_index(page), mapping);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!PageLocked(page)) {
+		SSDFS_WARN("page isn't locked: "
+			   "page_index %llu, mapping %p\n",
+			   (u64)page_index(page), mapping);
+		return -ERANGE;
+	}
+
+	if (mapping) {
+		xa_lock_irqsave(&mapping->i_pages, flags);
+		if (test_bit(PG_dirty, &page->flags)) {
+			__xa_clear_mark(&mapping->i_pages,
+					page_index(page),
+					PAGECACHE_TAG_DIRTY);
+		}
+		xa_unlock_irqrestore(&mapping->i_pages, flags);
+	}
+
+	TestClearPageDirty(page);
+
+	return 0;
+}
+
+int ssdfs_clear_dirty_page(struct page *page)
+{
+	struct address_space *mapping = page->mapping;
+	unsigned long flags;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("page_index: %llu, mapping %p\n",
+		  (u64)page_index(page), mapping);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!PageLocked(page)) {
+		SSDFS_WARN("page isn't locked: "
+			   "page_index %llu, mapping %p\n",
+			   (u64)page_index(page), mapping);
+		return -ERANGE;
+	}
+
+	if (mapping) {
+		xa_lock_irqsave(&mapping->i_pages, flags);
+		if (test_bit(PG_dirty, &page->flags)) {
+			__xa_clear_mark(&mapping->i_pages,
+					page_index(page),
+					PAGECACHE_TAG_DIRTY);
+			xa_unlock_irqrestore(&mapping->i_pages, flags);
+			return clear_page_dirty_for_io(page);
+		}
+		xa_unlock_irqrestore(&mapping->i_pages, flags);
+		return 0;
+	}
+
+	TestClearPageDirty(page);
+
+	return 0;
+}
+
+/*
+ * ssdfs_clear_dirty_pages - discard dirty pages in address space
+ * @mapping: address space with dirty pages for discarding
+ */
+void ssdfs_clear_dirty_pages(struct address_space *mapping)
+{
+	struct pagevec pvec;
+	unsigned int i;
+	pgoff_t index = 0;
+	int err;
+
+	pagevec_init(&pvec);
+
+	while (pagevec_lookup_tag(&pvec, mapping, &index,
+				  PAGECACHE_TAG_DIRTY)) {
+		for (i = 0; i < pagevec_count(&pvec); i++) {
+			struct page *page = pvec.pages[i];
+
+			ssdfs_lock_page(page);
+			err = ssdfs_clear_dirty_page(page);
+			ssdfs_unlock_page(page);
+
+			if (unlikely(err)) {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("fail clear page dirty: "
+					  "page_index %llu\n",
+					  (u64)page_index(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+			}
+		}
+		ssdfs_fs_error_pagevec_release(&pvec);
+		cond_resched();
+	}
+}
diff --git a/fs/ssdfs/options.c b/fs/ssdfs/options.c
new file mode 100644
index 000000000000..e36870868c08
--- /dev/null
+++ b/fs/ssdfs/options.c
@@ -0,0 +1,190 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/options.c - mount options parsing.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include <linux/string.h>
+#include <linux/kernel.h>
+#include <linux/parser.h>
+#include <linux/mount.h>
+#include <linux/slab.h>
+#include <linux/seq_file.h>
+#include <linux/pagevec.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+#include "segment_bitmap.h"
+
+/*
+ * SSDFS mount options.
+ *
+ * Opt_compr: change default compressor
+ * Opt_fs_err_panic: panic if fs error is detected
+ * Opt_fs_err_ro: remount in RO state if fs error is detected
+ * Opt_fs_err_cont: continue execution if fs error is detected
+ * Opt_ignore_fs_state: ignore on-disk file system state during mount
+ * Opt_err: just end of array marker
+ */
+enum {
+	Opt_compr,
+	Opt_fs_err_panic,
+	Opt_fs_err_ro,
+	Opt_fs_err_cont,
+	Opt_ignore_fs_state,
+	Opt_err,
+};
+
+static const match_table_t tokens = {
+	{Opt_compr, "compr=%s"},
+	{Opt_fs_err_panic, "errors=panic"},
+	{Opt_fs_err_ro, "errors=remount-ro"},
+	{Opt_fs_err_cont, "errors=continue"},
+	{Opt_ignore_fs_state, "fs_state=ignore"},
+	{Opt_err, NULL},
+};
+
+int ssdfs_parse_options(struct ssdfs_fs_info *fs_info, char *data)
+{
+	substring_t args[MAX_OPT_ARGS];
+	char *p, *name;
+
+	if (!data)
+		return 0;
+
+	while ((p = strsep(&data, ","))) {
+		int token;
+
+		if (!*p)
+			continue;
+
+		token = match_token(p, tokens, args);
+		switch (token) {
+		case Opt_compr:
+			name = match_strdup(&args[0]);
+
+			if (!name)
+				return -ENOMEM;
+			if (!strcmp(name, "none"))
+				ssdfs_set_opt(fs_info->mount_opts,
+						COMPR_MODE_NONE);
+#ifdef CONFIG_SSDFS_ZLIB
+			else if (!strcmp(name, "zlib"))
+				ssdfs_set_opt(fs_info->mount_opts,
+						COMPR_MODE_ZLIB);
+#endif
+#ifdef CONFIG_SSDFS_LZO
+			else if (!strcmp(name, "lzo"))
+				ssdfs_set_opt(fs_info->mount_opts,
+						COMPR_MODE_LZO);
+#endif
+			else {
+				SSDFS_ERR("unknown compressor %s\n", name);
+				ssdfs_kfree(name);
+				return -EINVAL;
+			}
+			ssdfs_kfree(name);
+			break;
+
+		case Opt_fs_err_panic:
+			/* Clear possible default initialization */
+			ssdfs_clear_opt(fs_info->mount_opts, ERRORS_RO);
+			ssdfs_clear_opt(fs_info->mount_opts, ERRORS_CONT);
+			ssdfs_set_opt(fs_info->mount_opts, ERRORS_PANIC);
+			break;
+
+		case Opt_fs_err_ro:
+			/* Clear possible default initialization */
+			ssdfs_clear_opt(fs_info->mount_opts, ERRORS_PANIC);
+			ssdfs_clear_opt(fs_info->mount_opts, ERRORS_CONT);
+			ssdfs_set_opt(fs_info->mount_opts, ERRORS_RO);
+			break;
+
+		case Opt_fs_err_cont:
+			/* Clear possible default initialization */
+			ssdfs_clear_opt(fs_info->mount_opts, ERRORS_PANIC);
+			ssdfs_clear_opt(fs_info->mount_opts, ERRORS_RO);
+			ssdfs_set_opt(fs_info->mount_opts, ERRORS_CONT);
+			break;
+
+		case Opt_ignore_fs_state:
+			ssdfs_set_opt(fs_info->mount_opts, IGNORE_FS_STATE);
+			break;
+
+		default:
+			SSDFS_ERR("unrecognized mount option '%s'\n", p);
+			return -EINVAL;
+		}
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("DONE: parse options\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+void ssdfs_initialize_fs_errors_option(struct ssdfs_fs_info *fsi)
+{
+	if (fsi->fs_errors == SSDFS_ERRORS_PANIC)
+		ssdfs_set_opt(fsi->mount_opts, ERRORS_PANIC);
+	else if (fsi->fs_errors == SSDFS_ERRORS_RO)
+		ssdfs_set_opt(fsi->mount_opts, ERRORS_RO);
+	else if (fsi->fs_errors == SSDFS_ERRORS_CONTINUE)
+		ssdfs_set_opt(fsi->mount_opts, ERRORS_CONT);
+	else {
+		u16 def_behaviour = SSDFS_ERRORS_DEFAULT;
+
+		switch (def_behaviour) {
+		case SSDFS_ERRORS_PANIC:
+			ssdfs_set_opt(fsi->mount_opts, ERRORS_PANIC);
+			break;
+
+		case SSDFS_ERRORS_RO:
+			ssdfs_set_opt(fsi->mount_opts, ERRORS_RO);
+			break;
+		}
+	}
+}
+
+int ssdfs_show_options(struct seq_file *seq, struct dentry *root)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(root->d_sb);
+	char *compress_type;
+
+	if (ssdfs_test_opt(fsi->mount_opts, COMPR_MODE_ZLIB)) {
+		compress_type = "zlib";
+		seq_printf(seq, ",compress=%s", compress_type);
+	} else if (ssdfs_test_opt(fsi->mount_opts, COMPR_MODE_LZO)) {
+		compress_type = "lzo";
+		seq_printf(seq, ",compress=%s", compress_type);
+	}
+
+	if (ssdfs_test_opt(fsi->mount_opts, ERRORS_PANIC))
+		seq_puts(seq, ",errors=panic");
+	else if (ssdfs_test_opt(fsi->mount_opts, ERRORS_RO))
+		seq_puts(seq, ",errors=remount-ro");
+	else if (ssdfs_test_opt(fsi->mount_opts, ERRORS_CONT))
+		seq_puts(seq, ",errors=continue");
+
+	if (ssdfs_test_opt(fsi->mount_opts, IGNORE_FS_STATE))
+		seq_puts(seq, ",fs_state=ignore");
+
+	return 0;
+}
diff --git a/fs/ssdfs/readwrite.c b/fs/ssdfs/readwrite.c
new file mode 100644
index 000000000000..b47cef995e4b
--- /dev/null
+++ b/fs/ssdfs/readwrite.c
@@ -0,0 +1,651 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/readwrite.c - read/write primitive operations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include <linux/kernel.h>
+#include <linux/rwsem.h>
+#include <linux/pagevec.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+
+#include <trace/events/ssdfs.h>
+
+/*
+ * ssdfs_read_page_from_volume() - read page from volume
+ * @fsi: pointer to shared file system object
+ * @peb_id: PEB identification number
+ * @bytes_off: offset from PEB's beginning in bytes
+ * @page: memory page
+ *
+ * This function tries to read a page from the volume.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input.
+ * %-EIO        - I/O error.
+ */
+int ssdfs_read_page_from_volume(struct ssdfs_fs_info *fsi,
+				u64 peb_id, u32 bytes_off,
+				struct page *page)
+{
+	struct super_block *sb;
+	loff_t offset;
+	u32 peb_size;
+	u32 pagesize;
+	u32 pages_per_peb;
+	u32 pages_off;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !page);
+	BUG_ON(!fsi->devops->readpage);
+
+	SSDFS_DBG("fsi %p, peb_id %llu, bytes_off %u, page %p\n",
+		  fsi, peb_id, bytes_off, page);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	sb = fsi->sb;
+	pagesize = fsi->pagesize;
+	pages_per_peb = fsi->pages_per_peb;
+	pages_off = bytes_off / pagesize;
+
+	if (pages_per_peb >= (U32_MAX / pagesize)) {
+		SSDFS_ERR("pages_per_peb %u >= U32_MAX / pagesize %u\n",
+			  pages_per_peb, pagesize);
+		return -EINVAL;
+	}
+
+	peb_size = pages_per_peb * pagesize;
+
+	if (peb_id >= div_u64(ULLONG_MAX, peb_size)) {
+		SSDFS_ERR("peb_id %llu >= ULLONG_MAX / peb_size %u\n",
+			  peb_id, peb_size);
+		return -EINVAL;
+	}
+
+	offset = peb_id * peb_size;
+
+	if (pages_off >= pages_per_peb) {
+		SSDFS_ERR("pages_off %u >= pages_per_peb %u\n",
+			  pages_off, pages_per_peb);
+		return -EINVAL;
+	}
+
+	if (pages_off >= (U32_MAX / pagesize)) {
+		SSDFS_ERR("pages_off %u >= U32_MAX / pagesize %u\n",
+			  pages_off, fsi->pagesize);
+		return -EINVAL;
+	}
+
+	offset += bytes_off;
+
+	if (fsi->devops->peb_isbad) {
+		err = fsi->devops->peb_isbad(sb, offset);
+		if (err) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("offset %llu is in bad PEB: err %d\n",
+				  (unsigned long long)offset, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return -EIO;
+		}
+	}
+
+	err = fsi->devops->readpage(sb, page, offset);
+	if (unlikely(err)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("fail to read page: offset %llu, err %d\n",
+			  (unsigned long long)offset, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EIO;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_read_pagevec_from_volume() - read pagevec from volume
+ * @fsi: pointer to shared file system object
+ * @peb_id: PEB identification number
+ * @bytes_off: offset from PEB's beginning in bytes
+ * @pvec: pagevec [in|out]
+ *
+ * This function tries to read pages from the volume.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input.
+ * %-EIO        - I/O error.
+ */
+int ssdfs_read_pagevec_from_volume(struct ssdfs_fs_info *fsi,
+				   u64 peb_id, u32 bytes_off,
+				   struct pagevec *pvec)
+{
+	struct super_block *sb;
+	loff_t offset;
+	u32 peb_size;
+	u32 pagesize;
+	u32 pages_per_peb;
+	u32 pages_off;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !pvec);
+	BUG_ON(!fsi->devops->readpages);
+
+	SSDFS_DBG("fsi %p, peb_id %llu, bytes_off %u, pvec %p\n",
+		  fsi, peb_id, bytes_off, pvec);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	sb = fsi->sb;
+	pagesize = fsi->pagesize;
+	pages_per_peb = fsi->pages_per_peb;
+	pages_off = bytes_off / pagesize;
+
+	if (pages_per_peb >= (U32_MAX / pagesize)) {
+		SSDFS_ERR("pages_per_peb %u >= U32_MAX / pagesize %u\n",
+			  pages_per_peb, pagesize);
+		return -EINVAL;
+	}
+
+	peb_size = pages_per_peb * pagesize;
+
+	if (peb_id >= div_u64(ULLONG_MAX, peb_size)) {
+		SSDFS_ERR("peb_id %llu >= ULLONG_MAX / peb_size %u\n",
+			  peb_id, peb_size);
+		return -EINVAL;
+	}
+
+	offset = peb_id * peb_size;
+
+	if (pages_off >= pages_per_peb) {
+		SSDFS_ERR("pages_off %u >= pages_per_peb %u\n",
+			  pages_off, pages_per_peb);
+		return -EINVAL;
+	}
+
+	if (pages_off >= (U32_MAX / pagesize)) {
+		SSDFS_ERR("pages_off %u >= U32_MAX / pagesize %u\n",
+			  pages_off, fsi->pagesize);
+		return -EINVAL;
+	}
+
+	offset += bytes_off;
+
+	if (fsi->devops->peb_isbad) {
+		err = fsi->devops->peb_isbad(sb, offset);
+		if (err) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("offset %llu is in bad PEB: err %d\n",
+				  (unsigned long long)offset, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return -EIO;
+		}
+	}
+
+	err = fsi->devops->readpages(sb, pvec, offset);
+	if (unlikely(err)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("fail to read pvec: offset %llu, err %d\n",
+			  (unsigned long long)offset, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EIO;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_aligned_read_buffer() - aligned read from volume into buffer
+ * @fsi: pointer to shared file system object
+ * @peb_id: PEB identification number
+ * @bytes_off: offset from PEB's beginning in bytes
+ * @buf: buffer
+ * @size: buffer size
+ * @read_bytes: number of bytes actually read
+ *
+ * This function tries to read into the buffer by means of a
+ * page-aligned request. In the case of an unaligned request, it reads
+ * only part of the requested data. The @read_bytes value returns the
+ * number of bytes actually read.
+ *
+ * RETURN:
+ * [success] - buffer contains data of @read_bytes in size.
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input.
+ * %-EIO        - I/O error.
+ */
+int ssdfs_aligned_read_buffer(struct ssdfs_fs_info *fsi,
+			      u64 peb_id, u32 bytes_off,
+			      void *buf, size_t size,
+			      size_t *read_bytes)
+{
+	struct super_block *sb;
+	loff_t offset;
+	u32 peb_size;
+	u32 pagesize;
+	u32 pages_per_peb;
+	u32 pages_off;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !buf);
+	BUG_ON(!fsi->devops->read);
+
+	SSDFS_DBG("fsi %p, peb_id %llu, bytes_off %u, buf %p, size %zu\n",
+		  fsi, peb_id, bytes_off, buf, size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	sb = fsi->sb;
+	pagesize = fsi->pagesize;
+	pages_per_peb = fsi->pages_per_peb;
+	pages_off = bytes_off / pagesize;
+
+	if (pages_per_peb >= (U32_MAX / pagesize)) {
+		SSDFS_ERR("pages_per_peb %u >= U32_MAX / pagesize %u\n",
+			  pages_per_peb, pagesize);
+		return -EINVAL;
+	}
+
+	peb_size = pages_per_peb * pagesize;
+
+	if (peb_id >= div_u64(ULLONG_MAX, peb_size)) {
+		SSDFS_ERR("peb_id %llu >= ULLONG_MAX / peb_size %u\n",
+			  peb_id, peb_size);
+		return -EINVAL;
+	}
+
+	offset = peb_id * peb_size;
+
+	if (pages_off >= pages_per_peb) {
+		SSDFS_ERR("pages_off %u >= pages_per_peb %u\n",
+			  pages_off, pages_per_peb);
+		return -EINVAL;
+	}
+
+	if (pages_off >= (U32_MAX / pagesize)) {
+		SSDFS_ERR("pages_off %u >= U32_MAX / pagesize %u\n",
+			  pages_off, fsi->pagesize);
+		return -EINVAL;
+	}
+
+	if (size > pagesize) {
+		SSDFS_ERR("size %zu > pagesize %u\n",
+			  size, fsi->pagesize);
+		return -EINVAL;
+	}
+
+	offset += bytes_off;
+
+	*read_bytes = ((pages_off + 1) * pagesize) - bytes_off;
+	*read_bytes = min_t(size_t, *read_bytes, size);
+
+	if (fsi->devops->peb_isbad) {
+		err = fsi->devops->peb_isbad(sb, offset);
+		if (err) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("offset %llu is in bad PEB: err %d\n",
+				  (unsigned long long)offset, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return -EIO;
+		}
+	}
+
+	err = fsi->devops->read(sb, offset, *read_bytes, buf);
+	if (unlikely(err)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("fail to read from offset %llu, size %zu, err %d\n",
+			  (unsigned long long)offset, *read_bytes, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EIO;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_unaligned_read_buffer() - unaligned read from volume into buffer
+ * @fsi: pointer to shared file system object
+ * @peb_id: PEB identification number
+ * @bytes_off: offset from PEB's beginning in bytes
+ * @buf: buffer
+ * @size: buffer size
+ *
+ * This function tries to read into the buffer by means of a
+ * page-unaligned request.
+ *
+ * RETURN:
+ * [success] - buffer contains data of @size in bytes.
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input.
+ * %-EIO        - I/O error.
+ */
+int ssdfs_unaligned_read_buffer(struct ssdfs_fs_info *fsi,
+				u64 peb_id, u32 bytes_off,
+				void *buf, size_t size)
+{
+	size_t read_bytes = 0;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !buf);
+	BUG_ON(!fsi->devops->read);
+
+	SSDFS_DBG("fsi %p, peb_id %llu, bytes_off %u, buf %p, size %zu\n",
+		  fsi, peb_id, bytes_off, buf, size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	do {
+		size_t iter_size = size - read_bytes;
+		size_t iter_read_bytes;
+
+		err = ssdfs_aligned_read_buffer(fsi, peb_id,
+						bytes_off + read_bytes,
+						buf + read_bytes,
+						iter_size,
+						&iter_read_bytes);
+		if (err) {
+			SSDFS_ERR("fail to read from peb_id %llu, offset %zu, "
+				  "size %zu, err %d\n",
+				  peb_id, (size_t)(bytes_off + read_bytes),
+				  iter_size, err);
+			return err;
+		}
+
+		read_bytes += iter_read_bytes;
+	} while (read_bytes < size);
+
+	return 0;
+}
+
+/*
+ * ssdfs_can_write_sb_log() - check that superblock log can be written
+ * @sb: pointer to superblock object
+ * @sb_log: superblock log's extent
+ *
+ * This function checks that superblock log can be written
+ * successfully.
+ *
+ * RETURN:
+ * [success] - superblock log can be written successfully.
+ * [failure] - error code:
+ *
+ * %-ERANGE     - invalid extent.
+ */
+int ssdfs_can_write_sb_log(struct super_block *sb,
+			   struct ssdfs_peb_extent *sb_log)
+{
+	struct ssdfs_fs_info *fsi;
+	u64 cur_peb;
+	u32 page_offset;
+	u32 log_size;
+	loff_t byte_off;
+	int i;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!sb || !sb_log);
+
+	SSDFS_DBG("leb_id %llu, peb_id %llu, "
+		  "page_offset %u, pages_count %u\n",
+		  sb_log->leb_id, sb_log->peb_id,
+		  sb_log->page_offset, sb_log->pages_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = SSDFS_FS_I(sb);
+
+	if (!fsi->devops->can_write_page)
+		return 0;
+
+	cur_peb = sb_log->peb_id;
+	page_offset = sb_log->page_offset;
+	log_size = sb_log->pages_count;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("cur_peb %llu, page_offset %u, "
+		  "log_size %u, pages_per_peb %u\n",
+		  cur_peb, page_offset,
+		  log_size, fsi->pages_per_peb);
+
+	if (log_size > fsi->pages_per_seg) {
+		SSDFS_ERR("log_size value %u is too big\n",
+			  log_size);
+		return -ERANGE;
+	}
+
+	if (cur_peb > div_u64(ULLONG_MAX, fsi->pages_per_seg)) {
+		SSDFS_ERR("cur_peb value %llu is too big\n",
+			  cur_peb);
+		return -ERANGE;
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	byte_off = cur_peb * fsi->pages_per_peb;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	if (byte_off > div_u64(ULLONG_MAX, fsi->pagesize)) {
+		SSDFS_ERR("byte_off value %llu is too big\n",
+			  byte_off);
+		return -ERANGE;
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	byte_off *= fsi->pagesize;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	if ((u64)page_offset > div_u64(ULLONG_MAX, fsi->pagesize)) {
+		SSDFS_ERR("page_offset value %u is too big\n",
+			  page_offset);
+		return -ERANGE;
+	}
+
+	if (byte_off > (ULLONG_MAX - ((u64)page_offset * fsi->pagesize))) {
+		SSDFS_ERR("byte_off value %llu is too big\n",
+			  byte_off);
+		return -ERANGE;
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	byte_off += (u64)page_offset * fsi->pagesize;
+
+	for (i = 0; i < log_size; i++) {
+#ifdef CONFIG_SSDFS_DEBUG
+		if (byte_off > (ULLONG_MAX - (i * fsi->pagesize))) {
+			SSDFS_ERR("offset value %llu is too big\n",
+				  byte_off);
+			return -ERANGE;
+		}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = fsi->devops->can_write_page(sb, byte_off, true);
+		if (err) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("page can't be written: err %d\n", err);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return err;
+		}
+
+		byte_off += fsi->pagesize;
+	}
+
+	return 0;
+}
+
+int ssdfs_unaligned_read_pagevec(struct pagevec *pvec,
+				 u32 offset, u32 size,
+				 void *buf)
+{
+	struct page *page;
+	u32 page_off;
+	u32 bytes_off;
+	size_t read_bytes = 0;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pvec || !buf);
+
+	SSDFS_DBG("pvec %p, offset %u, size %u, buf %p\n",
+		  pvec, offset, size, buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	do {
+		size_t iter_read_bytes;
+		size_t cur_off;
+
+		bytes_off = offset + read_bytes;
+		page_off = bytes_off / PAGE_SIZE;
+		cur_off = bytes_off % PAGE_SIZE;
+
+		iter_read_bytes = min_t(size_t,
+					(size_t)(size - read_bytes),
+					(size_t)(PAGE_SIZE - cur_off));
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("page_off %u, cur_off %zu, "
+			  "iter_read_bytes %zu\n",
+			  page_off, cur_off,
+			  iter_read_bytes);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (page_off >= pagevec_count(pvec)) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("page out of range: index %u: "
+				  "offset %zu, pagevec_count %u\n",
+				  page_off, cur_off,
+				  pagevec_count(pvec));
+#endif /* CONFIG_SSDFS_DEBUG */
+			return -E2BIG;
+		}
+
+		page = pvec->pages[page_off];
+
+		ssdfs_lock_page(page);
+		err = ssdfs_memcpy_from_page(buf, read_bytes, size,
+					     page, cur_off, PAGE_SIZE,
+					     iter_read_bytes);
+		ssdfs_unlock_page(page);
+
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to copy: "
+				  "read_bytes %zu, offset %zu, "
+				  "iter_read_bytes %zu, err %d\n",
+				  read_bytes, cur_off,
+				  iter_read_bytes, err);
+			return err;
+		}
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("page %p, count %d\n",
+			  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		read_bytes += iter_read_bytes;
+	} while (read_bytes < size);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("BUF DUMP\n");
+	print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+			     buf, size);
+	SSDFS_DBG("\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
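The loop above decomposes a linear byte offset into a page index, an in-page offset, and the number of bytes copyable before the page boundary. A self-contained sketch of just that arithmetic (hypothetical struct and function names):

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u

/* One iteration of the unaligned-copy loop: where to copy from and
 * how much, given the vector-relative offset and progress so far. */
struct demo_iter {
	uint32_t page_index;	/* index into the page vector */
	uint32_t in_page_off;	/* offset inside that page */
	uint32_t copy_bytes;	/* bytes until page boundary or end */
};

static struct demo_iter demo_split_offset(uint32_t offset, uint32_t done,
					  uint32_t total)
{
	struct demo_iter it;
	uint32_t byte = offset + done;

	it.page_index = byte / DEMO_PAGE_SIZE;
	it.in_page_off = byte % DEMO_PAGE_SIZE;
	it.copy_bytes = total - done;
	if (it.copy_bytes > DEMO_PAGE_SIZE - it.in_page_off)
		it.copy_bytes = DEMO_PAGE_SIZE - it.in_page_off;
	return it;
}
```

A 200-byte read starting at offset 4000 therefore splits into two iterations: 96 bytes from the tail of page 0, then 104 bytes from the head of page 1.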
+
+int ssdfs_unaligned_write_pagevec(struct pagevec *pvec,
+				  u32 offset, u32 size,
+				  void *buf)
+{
+	struct page *page;
+	u32 page_off;
+	u32 bytes_off;
+	size_t written_bytes = 0;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pvec || !buf);
+
+	SSDFS_DBG("pvec %p, offset %u, size %u, buf %p\n",
+		  pvec, offset, size, buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	do {
+		size_t iter_write_bytes;
+		size_t cur_off;
+
+		bytes_off = offset + written_bytes;
+		page_off = bytes_off / PAGE_SIZE;
+		cur_off = bytes_off % PAGE_SIZE;
+
+		iter_write_bytes = min_t(size_t,
+					(size_t)(size - written_bytes),
+					(size_t)(PAGE_SIZE - cur_off));
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("bytes_off %u, page_off %u, "
+			  "cur_off %zu, written_bytes %zu, "
+			  "iter_write_bytes %zu\n",
+			  bytes_off, page_off, cur_off,
+			  written_bytes, iter_write_bytes);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (page_off >= pagevec_count(pvec)) {
+			SSDFS_ERR("invalid page index %u: "
+				  "offset %zu, pagevec_count %u\n",
+				  page_off, cur_off,
+				  pagevec_count(pvec));
+			return -EINVAL;
+		}
+
+		page = pvec->pages[page_off];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!page);
+		WARN_ON(!PageLocked(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = ssdfs_memcpy_to_page(page, cur_off, PAGE_SIZE,
+					   buf, written_bytes, size,
+					   iter_write_bytes);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to copy: "
+				  "written_bytes %zu, offset %zu, "
+				  "iter_write_bytes %zu, err %d\n",
+				  written_bytes, cur_off,
+				  iter_write_bytes, err);
+			return err;
+		}
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("page %p, count %d\n",
+			  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		written_bytes += iter_write_bytes;
+	} while (written_bytes < size);
+
+	return 0;
+}
diff --git a/fs/ssdfs/super.c b/fs/ssdfs/super.c
new file mode 100644
index 000000000000..a3b144e6eafb
--- /dev/null
+++ b/fs/ssdfs/super.c
@@ -0,0 +1,1844 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/super.c - module and superblock management.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/mtd/mtd.h>
+#include <linux/mtd/super.h>
+#include <linux/exportfs.h>
+#include <linux/pagevec.h>
+#include <linux/blkdev.h>
+#include <linux/backing-dev.h>
+#include <linux/delay.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+#include "version.h"
+#include "segment_bitmap.h"
+#include "offset_translation_table.h"
+#include "page_array.h"
+#include "page_vector.h"
+#include "peb_container.h"
+#include "segment.h"
+#include "segment_tree.h"
+#include "current_segment.h"
+#include "peb_mapping_table.h"
+#include "extents_queue.h"
+#include "btree_search.h"
+#include "btree_node.h"
+#include "btree.h"
+#include "inodes_tree.h"
+#include "shared_extents_tree.h"
+#include "shared_dictionary.h"
+#include "extents_tree.h"
+#include "dentries_tree.h"
+#include "xattr_tree.h"
+#include "xattr.h"
+#include "acl.h"
+#include "snapshots_tree.h"
+#include "invalidated_extents_tree.h"
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/ssdfs.h>
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+atomic64_t ssdfs_allocated_pages;
+atomic64_t ssdfs_memory_leaks;
+atomic64_t ssdfs_super_page_leaks;
+atomic64_t ssdfs_super_memory_leaks;
+atomic64_t ssdfs_super_cache_leaks;
+
+atomic64_t ssdfs_locked_pages;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+/*
+ * void ssdfs_super_cache_leaks_increment(void *kaddr)
+ * void ssdfs_super_cache_leaks_decrement(void *kaddr)
+ * void *ssdfs_super_kmalloc(size_t size, gfp_t flags)
+ * void *ssdfs_super_kzalloc(size_t size, gfp_t flags)
+ * void *ssdfs_super_kcalloc(size_t n, size_t size, gfp_t flags)
+ * void ssdfs_super_kfree(void *kaddr)
+ * struct page *ssdfs_super_alloc_page(gfp_t gfp_mask)
+ * struct page *ssdfs_super_add_pagevec_page(struct pagevec *pvec)
+ * void ssdfs_super_free_page(struct page *page)
+ * void ssdfs_super_pagevec_release(struct pagevec *pvec)
+ */
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	SSDFS_MEMORY_LEAKS_CHECKER_FNS(super)
+#else
+	SSDFS_MEMORY_ALLOCATOR_FNS(super)
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+void ssdfs_super_memory_leaks_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_super_page_leaks, 0);
+	atomic64_set(&ssdfs_super_memory_leaks, 0);
+	atomic64_set(&ssdfs_super_cache_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+void ssdfs_super_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&ssdfs_super_page_leaks) != 0) {
+		SSDFS_ERR("SUPER: "
+			  "memory leaks include %lld pages\n",
+			  atomic64_read(&ssdfs_super_page_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_super_memory_leaks) != 0) {
+		SSDFS_ERR("SUPER: "
+			  "memory allocator suffers from %lld leaks\n",
+			  atomic64_read(&ssdfs_super_memory_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_super_cache_leaks) != 0) {
+		SSDFS_ERR("SUPER: "
+			  "caches suffer from %lld leaks\n",
+			  atomic64_read(&ssdfs_super_cache_leaks));
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
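The leak checkers rely on paired atomic counters: every allocation wrapper increments, every free wrapper decrements, and a non-zero counter at teardown flags a leak. A userspace sketch of the same discipline using C11 atomics (hypothetical `demo_` names, not the SSDFS wrappers):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Mirrors the atomic64_t leak counters above: one shared counter,
 * bumped on allocation, dropped on free. */
static atomic_llong demo_memory_leaks;

static void *demo_kmalloc(size_t size)
{
	void *p = malloc(size);

	if (p)
		atomic_fetch_add(&demo_memory_leaks, 1);
	return p;
}

static void demo_kfree(void *p)
{
	if (p) {
		atomic_fetch_sub(&demo_memory_leaks, 1);
		free(p);
	}
}

/* Teardown check: non-zero means some allocation was never freed. */
static long long demo_leak_count(void)
{
	return atomic_load(&demo_memory_leaks);
}
```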
+
+static void init_once(void *foo)
+{
+	struct ssdfs_inode_info *ii = (struct ssdfs_inode_info *)foo;
+
+	inode_init_once(&ii->vfs_inode);
+}
+
+/*
+ * This method is called by alloc_inode() to allocate memory
+ * for struct inode and initialize it.
+ */
+struct inode *ssdfs_alloc_inode(struct super_block *sb)
+{
+	struct ssdfs_inode_info *ii;
+
+	ii = alloc_inode_sb(sb, ssdfs_inode_cachep, GFP_KERNEL);
+	if (!ii)
+		return NULL;
+
+	ssdfs_super_cache_leaks_increment(ii);
+
+	init_once((void *)ii);
+
+	atomic_set(&ii->private_flags, 0);
+	init_rwsem(&ii->lock);
+	ii->parent_ino = U64_MAX;
+	ii->flags = 0;
+	ii->name_hash = 0;
+	ii->name_len = 0;
+	ii->extents_tree = NULL;
+	ii->dentries_tree = NULL;
+	ii->xattrs_tree = NULL;
+	ii->inline_file = NULL;
+	memset(&ii->raw_inode, 0, sizeof(struct ssdfs_inode));
+
+	return &ii->vfs_inode;
+}
+
+static void ssdfs_i_callback(struct rcu_head *head)
+{
+	struct inode *inode = container_of(head, struct inode, i_rcu);
+	struct ssdfs_inode_info *ii = SSDFS_I(inode);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu\n", inode->i_ino);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (ii->extents_tree)
+		ssdfs_extents_tree_destroy(ii);
+
+	if (ii->dentries_tree)
+		ssdfs_dentries_tree_destroy(ii);
+
+	if (ii->xattrs_tree)
+		ssdfs_xattrs_tree_destroy(ii);
+
+	if (ii->inline_file)
+		ssdfs_destroy_inline_file_buffer(inode);
+
+	ssdfs_super_cache_leaks_decrement(ii);
+	kmem_cache_free(ssdfs_inode_cachep, ii);
+}
+
+/*
+ * This method is called by destroy_inode() to release
+ * resources allocated for struct inode
+ */
+static void ssdfs_destroy_inode(struct inode *inode)
+{
+	call_rcu(&inode->i_rcu, ssdfs_i_callback);
+}
+
+static void ssdfs_init_inode_once(void *obj)
+{
+	struct ssdfs_inode_info *ii = obj;
+	inode_init_once(&ii->vfs_inode);
+}
+
+static int ssdfs_remount_fs(struct super_block *sb, int *flags, char *data)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	struct ssdfs_peb_extent last_sb_log = {0};
+	struct ssdfs_sb_log_payload payload;
+	unsigned long old_sb_flags;
+	unsigned long old_mount_opts;
+	int err;
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("sb %p, flags %#x, data %p\n", sb, *flags, data);
+#else
+	SSDFS_DBG("sb %p, flags %#x, data %p\n", sb, *flags, data);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	old_sb_flags = sb->s_flags;
+	old_mount_opts = fsi->mount_opts;
+
+	pagevec_init(&payload.maptbl_cache.pvec);
+
+	err = ssdfs_parse_options(fsi, data);
+	if (err)
+		goto restore_opts;
+
+	set_posix_acl_flag(sb);
+
+	if ((*flags & SB_RDONLY) == (sb->s_flags & SB_RDONLY))
+		goto out;
+
+	if (*flags & SB_RDONLY) {
+		down_write(&fsi->volume_sem);
+
+		err = ssdfs_prepare_sb_log(sb, &last_sb_log);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to prepare sb log: err %d\n",
+				  err);
+		}
+
+		err = ssdfs_snapshot_sb_log_payload(sb, &payload);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to snapshot sb log's payload: err %d\n",
+				  err);
+		}
+
+		if (!err) {
+			err = ssdfs_commit_super(sb, SSDFS_VALID_FS,
+						 &last_sb_log,
+						 &payload);
+		} else {
+			SSDFS_ERR("fail to prepare sb log payload: "
+				  "err %d\n", err);
+		}
+
+		up_write(&fsi->volume_sem);
+
+		if (err)
+			SSDFS_ERR("fail to commit superblock info\n");
+
+		sb->s_flags |= SB_RDONLY;
+		SSDFS_DBG("remount in RO mode\n");
+	} else {
+		down_write(&fsi->volume_sem);
+
+		err = ssdfs_prepare_sb_log(sb, &last_sb_log);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to prepare sb log: err %d\n",
+				  err);
+		}
+
+		err = ssdfs_snapshot_sb_log_payload(sb, &payload);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to snapshot sb log's payload: err %d\n",
+				  err);
+		}
+
+		if (!err) {
+			err = ssdfs_commit_super(sb, SSDFS_MOUNTED_FS,
+						 &last_sb_log,
+						 &payload);
+		} else {
+			SSDFS_ERR("fail to prepare sb log payload: "
+				  "err %d\n", err);
+		}
+
+		up_write(&fsi->volume_sem);
+
+		if (err) {
+			SSDFS_NOTICE("fail to commit superblock info\n");
+			goto restore_opts;
+		}
+
+		sb->s_flags &= ~SB_RDONLY;
+		SSDFS_DBG("remount in RW mode\n");
+	}
+out:
+	ssdfs_super_pagevec_release(&payload.maptbl_cache.pvec);
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("finished\n");
+#else
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	return 0;
+
+restore_opts:
+	sb->s_flags = old_sb_flags;
+	fsi->mount_opts = old_mount_opts;
+	ssdfs_super_pagevec_release(&payload.maptbl_cache.pvec);
+	return err;
+}
+
+static inline
+bool unfinished_user_data_requests_exist(struct ssdfs_fs_info *fsi)
+{
+	u64 flush_requests = 0;
+
+	spin_lock(&fsi->volume_state_lock);
+	flush_requests = fsi->flushing_user_data_requests;
+	spin_unlock(&fsi->volume_state_lock);
+
+	return flush_requests > 0;
+}
+
+static int ssdfs_sync_fs(struct super_block *sb, int wait)
+{
+	struct ssdfs_fs_info *fsi;
+	int err = 0;
+
+	fsi = SSDFS_FS_I(sb);
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("sb %p\n", sb);
+#else
+	SSDFS_DBG("sb %p\n", sb);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+#ifdef CONFIG_SSDFS_SHOW_CONSUMED_MEMORY
+	SSDFS_ERR("SYNCFS is starting...\n");
+	ssdfs_check_memory_leaks();
+#endif /* CONFIG_SSDFS_SHOW_CONSUMED_MEMORY */
+
+	atomic_set(&fsi->global_fs_state, SSDFS_METADATA_GOING_FLUSHING);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("SSDFS_METADATA_GOING_FLUSHING\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	wake_up_all(&fsi->pending_wq);
+
+	if (unfinished_user_data_requests_exist(fsi)) {
+		wait_queue_head_t *wq = &fsi->finish_user_data_flush_wq;
+
+		err = wait_event_killable_timeout(*wq,
+				!unfinished_user_data_requests_exist(fsi),
+				SSDFS_DEFAULT_TIMEOUT);
+		WARN_ON(err < 0);
+		if (err >= 0)
+			err = 0;
+
+		BUG_ON(unfinished_user_data_requests_exist(fsi));
+	}
+
+	atomic_set(&fsi->global_fs_state, SSDFS_METADATA_UNDER_FLUSH);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("SSDFS_METADATA_UNDER_FLUSH\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	down_write(&fsi->volume_sem);
+
+	if (fsi->fs_feature_compat &
+			SSDFS_HAS_INVALID_EXTENTS_TREE_COMPAT_FLAG) {
+		err = ssdfs_invextree_flush(fsi);
+		if (err) {
+			SSDFS_ERR("fail to flush invalidated extents btree: "
+				  "err %d\n", err);
+		}
+	}
+
+	if (fsi->fs_feature_compat & SSDFS_HAS_SHARED_EXTENTS_COMPAT_FLAG) {
+		err = ssdfs_shextree_flush(fsi);
+		if (err) {
+			SSDFS_ERR("fail to flush shared extents btree: "
+				  "err %d\n", err);
+		}
+	}
+
+	if (fsi->fs_feature_compat & SSDFS_HAS_INODES_TREE_COMPAT_FLAG) {
+		err = ssdfs_inodes_btree_flush(fsi->inodes_tree);
+		if (err) {
+			SSDFS_ERR("fail to flush inodes btree: "
+				  "err %d\n", err);
+		}
+	}
+
+	if (fsi->fs_feature_compat & SSDFS_HAS_SHARED_DICT_COMPAT_FLAG) {
+		err = ssdfs_shared_dict_btree_flush(fsi->shdictree);
+		if (err) {
+			SSDFS_ERR("fail to flush shared dictionary: "
+				  "err %d\n", err);
+		}
+	}
+
+	err = ssdfs_execute_create_snapshots(fsi);
+	if (err) {
+		SSDFS_ERR("fail to process snapshot creation\n");
+	}
+
+	if (fsi->fs_feature_compat & SSDFS_HAS_SNAPSHOTS_TREE_COMPAT_FLAG) {
+		err = ssdfs_snapshots_btree_flush(fsi);
+		if (err) {
+			SSDFS_ERR("fail to flush snapshots btree: "
+				  "err %d\n", err);
+		}
+	}
+
+	if (fsi->fs_feature_compat & SSDFS_HAS_SEGBMAP_COMPAT_FLAG) {
+		err = ssdfs_segbmap_flush(fsi->segbmap);
+		if (err) {
+			SSDFS_ERR("fail to flush segment bitmap: "
+				  "err %d\n", err);
+		}
+	}
+
+	if (fsi->fs_feature_compat & SSDFS_HAS_MAPTBL_COMPAT_FLAG) {
+		err = ssdfs_maptbl_flush(fsi->maptbl);
+		if (err) {
+			SSDFS_ERR("fail to flush mapping table: "
+				  "err %d\n", err);
+		}
+	}
+
+	up_write(&fsi->volume_sem);
+
+	atomic_set(&fsi->global_fs_state, SSDFS_REGULAR_FS_OPERATIONS);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("SSDFS_REGULAR_FS_OPERATIONS\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_SHOW_CONSUMED_MEMORY
+	SSDFS_ERR("SYNCFS has finished\n");
+	ssdfs_check_memory_leaks();
+#endif /* CONFIG_SSDFS_SHOW_CONSUMED_MEMORY */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("finished\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	if (unlikely(err))
+		goto fail_sync_fs;
+
+	trace_ssdfs_sync_fs(sb, wait);
+
+	return 0;
+
+fail_sync_fs:
+	trace_ssdfs_sync_fs_exit(sb, wait, err);
+	return err;
+}
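ssdfs_sync_fs() flushes each metadata structure only when the volume's compat-feature bitmask advertises it. A minimal sketch of that flag-gated dispatch (illustrative flag values, not the on-disk SSDFS constants):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative compat flags; the real SSDFS_HAS_* values live in the
 * on-disk layout headers. */
#define DEMO_HAS_SEGBMAP	(1ULL << 0)
#define DEMO_HAS_MAPTBL		(1ULL << 1)
#define DEMO_HAS_INODES_TREE	(1ULL << 2)

/* Flush only the structures the volume declares; returns how many
 * flush paths would have run. */
static unsigned demo_flush_metadata(uint64_t feature_compat)
{
	unsigned flushed = 0;

	if (feature_compat & DEMO_HAS_INODES_TREE)
		flushed++;	/* would call ssdfs_inodes_btree_flush() */
	if (feature_compat & DEMO_HAS_SEGBMAP)
		flushed++;	/* would call ssdfs_segbmap_flush() */
	if (feature_compat & DEMO_HAS_MAPTBL)
		flushed++;	/* would call ssdfs_maptbl_flush() */
	return flushed;
}
```

This keeps older or minimal volumes mountable: a structure absent from the bitmask is simply never touched during sync.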
+
+static struct inode *ssdfs_nfs_get_inode(struct super_block *sb,
+					 u64 ino, u32 generation)
+{
+	struct inode *inode;
+
+	if (ino < SSDFS_ROOT_INO)
+		return ERR_PTR(-ESTALE);
+
+	inode = ssdfs_iget(sb, ino);
+	if (IS_ERR(inode))
+		return ERR_CAST(inode);
+	if (generation && inode->i_generation != generation) {
+		iput(inode);
+		return ERR_PTR(-ESTALE);
+	}
+	return inode;
+}
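The generation comparison above is what detects stale NFS handles: a handle records the inode's generation when it was issued, so if the inode number has since been reused the generations disagree. A sketch of that check in isolation (hypothetical struct, not the kernel inode):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the fields ssdfs_nfs_get_inode() consults. */
struct demo_inode {
	uint64_t ino;
	uint32_t generation;
};

/* A handle generation of 0 means "don't check", matching the
 * `generation && ...` condition above; otherwise any mismatch
 * marks the handle stale (-ESTALE in the kernel path). */
static int demo_handle_is_stale(const struct demo_inode *inode,
				uint32_t handle_generation)
{
	return handle_generation != 0 &&
	       inode->generation != handle_generation;
}
```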
+
+static struct dentry *ssdfs_fh_to_dentry(struct super_block *sb,
+					 struct fid *fid,
+					 int fh_len, int fh_type)
+{
+	return generic_fh_to_dentry(sb, fid, fh_len, fh_type,
+				    ssdfs_nfs_get_inode);
+}
+
+static struct dentry *ssdfs_fh_to_parent(struct super_block *sb,
+					 struct fid *fid,
+					 int fh_len, int fh_type)
+{
+	return generic_fh_to_parent(sb, fid, fh_len, fh_type,
+				    ssdfs_nfs_get_inode);
+}
+
+static struct dentry *ssdfs_get_parent(struct dentry *child)
+{
+	struct qstr dotdot = QSTR_INIT("..", 2);
+	ino_t ino;
+	int err;
+
+	err = ssdfs_inode_by_name(d_inode(child), &dotdot, &ino);
+	if (unlikely(err))
+		return ERR_PTR(err);
+
+	return d_obtain_alias(ssdfs_iget(child->d_sb, ino));
+}
+
+static const struct export_operations ssdfs_export_ops = {
+	.get_parent	= ssdfs_get_parent,
+	.fh_to_dentry	= ssdfs_fh_to_dentry,
+	.fh_to_parent	= ssdfs_fh_to_parent,
+};
+
+static const struct super_operations ssdfs_super_operations = {
+	.alloc_inode	= ssdfs_alloc_inode,
+	.destroy_inode	= ssdfs_destroy_inode,
+	.evict_inode	= ssdfs_evict_inode,
+	.write_inode	= ssdfs_write_inode,
+	.statfs		= ssdfs_statfs,
+	.show_options	= ssdfs_show_options,
+	.put_super	= ssdfs_put_super,
+	.remount_fs	= ssdfs_remount_fs,
+	.sync_fs	= ssdfs_sync_fs,
+};
+
+static void ssdfs_memory_page_locks_checker_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_locked_pages, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+static void ssdfs_check_memory_page_locks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&ssdfs_locked_pages) != 0) {
+		SSDFS_WARN("Lock keeps %lld memory pages\n",
+			   atomic64_read(&ssdfs_locked_pages));
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+static void ssdfs_memory_leaks_checker_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_allocated_pages, 0);
+	atomic64_set(&ssdfs_memory_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+#ifdef CONFIG_SSDFS_POSIX_ACL
+	ssdfs_acl_memory_leaks_init();
+#endif /* CONFIG_SSDFS_POSIX_ACL */
+
+	ssdfs_block_bmap_memory_leaks_init();
+	ssdfs_btree_memory_leaks_init();
+	ssdfs_btree_hierarchy_memory_leaks_init();
+	ssdfs_btree_node_memory_leaks_init();
+	ssdfs_btree_search_memory_leaks_init();
+
+#ifdef CONFIG_SSDFS_ZLIB
+	ssdfs_zlib_memory_leaks_init();
+#endif /* CONFIG_SSDFS_ZLIB */
+
+#ifdef CONFIG_SSDFS_LZO
+	ssdfs_lzo_memory_leaks_init();
+#endif /* CONFIG_SSDFS_LZO */
+
+	ssdfs_compr_memory_leaks_init();
+	ssdfs_cur_seg_memory_leaks_init();
+	ssdfs_dentries_memory_leaks_init();
+
+#ifdef CONFIG_SSDFS_MTD_DEVICE
+	ssdfs_dev_mtd_memory_leaks_init();
+#elif defined(CONFIG_SSDFS_BLOCK_DEVICE)
+	ssdfs_dev_bdev_memory_leaks_init();
+	ssdfs_dev_zns_memory_leaks_init();
+#else
+	BUILD_BUG();
+#endif
+
+	ssdfs_dir_memory_leaks_init();
+
+#ifdef CONFIG_SSDFS_DIFF_ON_WRITE_USER_DATA
+	ssdfs_diff_memory_leaks_init();
+#endif /* CONFIG_SSDFS_DIFF_ON_WRITE_USER_DATA */
+
+	ssdfs_ext_queue_memory_leaks_init();
+	ssdfs_ext_tree_memory_leaks_init();
+	ssdfs_file_memory_leaks_init();
+	ssdfs_fs_error_memory_leaks_init();
+	ssdfs_inode_memory_leaks_init();
+	ssdfs_ino_tree_memory_leaks_init();
+	ssdfs_invext_tree_memory_leaks_init();
+	ssdfs_blk2off_memory_leaks_init();
+	ssdfs_parray_memory_leaks_init();
+	ssdfs_page_vector_memory_leaks_init();
+	ssdfs_flush_memory_leaks_init();
+	ssdfs_gc_memory_leaks_init();
+	ssdfs_map_queue_memory_leaks_init();
+	ssdfs_map_tbl_memory_leaks_init();
+	ssdfs_map_cache_memory_leaks_init();
+	ssdfs_map_thread_memory_leaks_init();
+	ssdfs_migration_memory_leaks_init();
+	ssdfs_peb_memory_leaks_init();
+	ssdfs_read_memory_leaks_init();
+	ssdfs_recovery_memory_leaks_init();
+	ssdfs_req_queue_memory_leaks_init();
+	ssdfs_seg_obj_memory_leaks_init();
+	ssdfs_seg_bmap_memory_leaks_init();
+	ssdfs_seg_blk_memory_leaks_init();
+	ssdfs_seg_tree_memory_leaks_init();
+	ssdfs_seq_arr_memory_leaks_init();
+	ssdfs_dict_memory_leaks_init();
+	ssdfs_shextree_memory_leaks_init();
+	ssdfs_super_memory_leaks_init();
+	ssdfs_xattr_memory_leaks_init();
+	ssdfs_snap_reqs_queue_memory_leaks_init();
+	ssdfs_snap_rules_list_memory_leaks_init();
+	ssdfs_snap_tree_memory_leaks_init();
+}
+
+static void ssdfs_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_POSIX_ACL
+	ssdfs_acl_check_memory_leaks();
+#endif /* CONFIG_SSDFS_POSIX_ACL */
+
+	ssdfs_block_bmap_check_memory_leaks();
+	ssdfs_btree_check_memory_leaks();
+	ssdfs_btree_hierarchy_check_memory_leaks();
+	ssdfs_btree_node_check_memory_leaks();
+	ssdfs_btree_search_check_memory_leaks();
+
+#ifdef CONFIG_SSDFS_ZLIB
+	ssdfs_zlib_check_memory_leaks();
+#endif /* CONFIG_SSDFS_ZLIB */
+
+#ifdef CONFIG_SSDFS_LZO
+	ssdfs_lzo_check_memory_leaks();
+#endif /* CONFIG_SSDFS_LZO */
+
+	ssdfs_compr_check_memory_leaks();
+	ssdfs_cur_seg_check_memory_leaks();
+	ssdfs_dentries_check_memory_leaks();
+
+#ifdef CONFIG_SSDFS_MTD_DEVICE
+	ssdfs_dev_mtd_check_memory_leaks();
+#elif defined(CONFIG_SSDFS_BLOCK_DEVICE)
+	ssdfs_dev_bdev_check_memory_leaks();
+	ssdfs_dev_zns_check_memory_leaks();
+#else
+	BUILD_BUG();
+#endif
+
+	ssdfs_dir_check_memory_leaks();
+
+#ifdef CONFIG_SSDFS_DIFF_ON_WRITE_USER_DATA
+	ssdfs_diff_check_memory_leaks();
+#endif /* CONFIG_SSDFS_DIFF_ON_WRITE_USER_DATA */
+
+	ssdfs_ext_queue_check_memory_leaks();
+	ssdfs_ext_tree_check_memory_leaks();
+	ssdfs_file_check_memory_leaks();
+	ssdfs_fs_error_check_memory_leaks();
+	ssdfs_inode_check_memory_leaks();
+	ssdfs_ino_tree_check_memory_leaks();
+	ssdfs_invext_tree_check_memory_leaks();
+	ssdfs_blk2off_check_memory_leaks();
+	ssdfs_parray_check_memory_leaks();
+	ssdfs_page_vector_check_memory_leaks();
+	ssdfs_flush_check_memory_leaks();
+	ssdfs_gc_check_memory_leaks();
+	ssdfs_map_queue_check_memory_leaks();
+	ssdfs_map_tbl_check_memory_leaks();
+	ssdfs_map_cache_check_memory_leaks();
+	ssdfs_map_thread_check_memory_leaks();
+	ssdfs_migration_check_memory_leaks();
+	ssdfs_peb_check_memory_leaks();
+	ssdfs_read_check_memory_leaks();
+	ssdfs_recovery_check_memory_leaks();
+	ssdfs_req_queue_check_memory_leaks();
+	ssdfs_seg_obj_check_memory_leaks();
+	ssdfs_seg_bmap_check_memory_leaks();
+	ssdfs_seg_blk_check_memory_leaks();
+	ssdfs_seg_tree_check_memory_leaks();
+	ssdfs_seq_arr_check_memory_leaks();
+	ssdfs_dict_check_memory_leaks();
+	ssdfs_shextree_check_memory_leaks();
+	ssdfs_super_check_memory_leaks();
+	ssdfs_xattr_check_memory_leaks();
+	ssdfs_snap_reqs_queue_check_memory_leaks();
+	ssdfs_snap_rules_list_check_memory_leaks();
+	ssdfs_snap_tree_check_memory_leaks();
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+#ifdef CONFIG_SSDFS_SHOW_CONSUMED_MEMORY
+	if (atomic64_read(&ssdfs_allocated_pages) != 0) {
+		SSDFS_ERR("Memory leaks include %lld pages\n",
+			  atomic64_read(&ssdfs_allocated_pages));
+	}
+
+	if (atomic64_read(&ssdfs_memory_leaks) != 0) {
+		SSDFS_ERR("Memory allocator suffers from %lld leaks\n",
+			  atomic64_read(&ssdfs_memory_leaks));
+	}
+#else
+	if (atomic64_read(&ssdfs_allocated_pages) != 0) {
+		SSDFS_WARN("Memory leaks include %lld pages\n",
+			   atomic64_read(&ssdfs_allocated_pages));
+	}
+
+	if (atomic64_read(&ssdfs_memory_leaks) != 0) {
+		SSDFS_WARN("Memory allocator suffers from %lld leaks\n",
+			   atomic64_read(&ssdfs_memory_leaks));
+	}
+#endif /* CONFIG_SSDFS_SHOW_CONSUMED_MEMORY */
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+static int ssdfs_fill_super(struct super_block *sb, void *data, int silent)
+{
+	struct ssdfs_fs_info *fs_info;
+	struct ssdfs_peb_extent last_sb_log = {0};
+	struct ssdfs_sb_log_payload payload;
+	struct inode *root_i;
+	u64 fs_feature_compat;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("sb %p, data %p, silent %#x\n", sb, data, silent);
+#else
+	SSDFS_DBG("sb %p, data %p, silent %#x\n", sb, data, silent);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("segment header size %zu, "
+		  "partial log header size %zu, "
+		  "footer size %zu\n",
+		  sizeof(struct ssdfs_segment_header),
+		  sizeof(struct ssdfs_partial_log_header),
+		  sizeof(struct ssdfs_log_footer));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_memory_page_locks_checker_init();
+	ssdfs_memory_leaks_checker_init();
+
+	fs_info = ssdfs_super_kzalloc(sizeof(*fs_info), GFP_KERNEL);
+	if (!fs_info)
+		return -ENOMEM;
+
+#ifdef CONFIG_SSDFS_TESTING
+	fs_info->do_fork_invalidation = true;
+#endif /* CONFIG_SSDFS_TESTING */
+
+	fs_info->max_open_zones = 0;
+	fs_info->is_zns_device = false;
+	fs_info->zone_size = U64_MAX;
+	fs_info->zone_capacity = U64_MAX;
+	atomic_set(&fs_info->open_zones, 0);
+
+#ifdef CONFIG_SSDFS_MTD_DEVICE
+	fs_info->mtd = sb->s_mtd;
+	fs_info->devops = &ssdfs_mtd_devops;
+	sb->s_bdi = sb->s_mtd->backing_dev_info;
+#elif defined(CONFIG_SSDFS_BLOCK_DEVICE)
+	if (bdev_is_zoned(sb->s_bdev)) {
+		fs_info->devops = &ssdfs_zns_devops;
+		fs_info->is_zns_device = true;
+		fs_info->max_open_zones = bdev_max_open_zones(sb->s_bdev);
+
+		fs_info->zone_size = ssdfs_zns_zone_size(sb,
+						SSDFS_RESERVED_VBR_SIZE);
+		if (fs_info->zone_size >= U64_MAX) {
+			SSDFS_ERR("fail to get zone size\n");
+			return -ERANGE;
+		}
+
+		fs_info->zone_capacity = ssdfs_zns_zone_capacity(sb,
+						SSDFS_RESERVED_VBR_SIZE);
+		if (fs_info->zone_capacity >= U64_MAX) {
+			SSDFS_ERR("fail to get zone capacity\n");
+			return -ERANGE;
+		} else if (fs_info->zone_capacity > fs_info->zone_size) {
+			SSDFS_ERR("invalid zone capacity: "
+				  "capacity %llu, size %llu\n",
+				  fs_info->zone_capacity,
+				  fs_info->zone_size);
+			return -ERANGE;
+		}
+	} else {
+		fs_info->devops = &ssdfs_bdev_devops;
+	}
+
+	sb->s_bdi = bdi_get(sb->s_bdev->bd_disk->bdi);
+	atomic_set(&fs_info->pending_bios, 0);
+	fs_info->erase_page = ssdfs_super_alloc_page(GFP_KERNEL);
+	if (IS_ERR_OR_NULL(fs_info->erase_page)) {
+		err = (fs_info->erase_page == NULL ?
+				-ENOMEM : PTR_ERR(fs_info->erase_page));
+		SSDFS_ERR("unable to allocate memory page\n");
+		goto free_erase_page;
+	}
+	memset(page_address(fs_info->erase_page), 0xFF, PAGE_SIZE);
+#else
+	BUILD_BUG();
+#endif
+
+	fs_info->sb = sb;
+	sb->s_fs_info = fs_info;
+	atomic64_set(&fs_info->flush_reqs, 0);
+	init_waitqueue_head(&fs_info->pending_wq);
+	init_waitqueue_head(&fs_info->finish_user_data_flush_wq);
+	atomic_set(&fs_info->global_fs_state, SSDFS_UNKNOWN_GLOBAL_FS_STATE);
+
+	for (i = 0; i < SSDFS_GC_THREAD_TYPE_MAX; i++) {
+		init_waitqueue_head(&fs_info->gc_wait_queue[i]);
+		atomic_set(&fs_info->gc_should_act[i], 1);
+	}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("parse options started...\n");
+#else
+	SSDFS_DBG("parse options started...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	err = ssdfs_parse_options(fs_info, data);
+	if (err)
+		goto free_erase_page;
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("gather superblock info started...\n");
+#else
+	SSDFS_DBG("gather superblock info started...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	err = ssdfs_gather_superblock_info(fs_info, silent);
+	if (err)
+		goto free_erase_page;
+
+	spin_lock(&fs_info->volume_state_lock);
+	fs_feature_compat = fs_info->fs_feature_compat;
+	spin_unlock(&fs_info->volume_state_lock);
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("create device group started...\n");
+#else
+	SSDFS_DBG("create device group started...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	err = ssdfs_sysfs_create_device_group(sb);
+	if (err)
+		goto release_maptbl_cache;
+
+	sb->s_maxbytes = MAX_LFS_FILESIZE;
+	sb->s_magic = SSDFS_SUPER_MAGIC;
+	sb->s_op = &ssdfs_super_operations;
+	sb->s_export_op = &ssdfs_export_ops;
+
+	sb->s_xattr = ssdfs_xattr_handlers;
+	set_posix_acl_flag(sb);
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("create snapshots subsystem started...\n");
+#else
+	SSDFS_DBG("create snapshots subsystem started...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	err = ssdfs_snapshot_subsystem_init(fs_info);
+	if (err == -EINTR) {
+		/*
+		 * Ignore this error.
+		 */
+		err = 0;
+		goto destroy_sysfs_device_group;
+	} else if (err)
+		goto destroy_sysfs_device_group;
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("create segment tree started...\n");
+#else
+	SSDFS_DBG("create segment tree started...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	down_write(&fs_info->volume_sem);
+	err = ssdfs_segment_tree_create(fs_info);
+	up_write(&fs_info->volume_sem);
+	if (err)
+		goto destroy_snapshot_subsystem;
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("create mapping table started...\n");
+#else
+	SSDFS_DBG("create mapping table started...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	if (fs_feature_compat & SSDFS_HAS_MAPTBL_COMPAT_FLAG) {
+		down_write(&fs_info->volume_sem);
+		err = ssdfs_maptbl_create(fs_info);
+		up_write(&fs_info->volume_sem);
+
+		if (err == -EINTR) {
+			/*
+			 * Ignore this error.
+			 */
+			err = 0;
+			goto destroy_segments_tree;
+		} else if (err)
+			goto destroy_segments_tree;
+	} else {
+		err = -EIO;
+		SSDFS_WARN("volume has no mapping table\n");
+		goto destroy_segments_tree;
+	}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("create segment bitmap started...\n");
+#else
+	SSDFS_DBG("create segment bitmap started...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	if (fs_feature_compat & SSDFS_HAS_SEGBMAP_COMPAT_FLAG) {
+		down_write(&fs_info->volume_sem);
+		err = ssdfs_segbmap_create(fs_info);
+		up_write(&fs_info->volume_sem);
+
+		if (err == -EINTR) {
+			/*
+			 * Ignore this error.
+			 */
+			err = 0;
+			goto destroy_maptbl;
+		} else if (err)
+			goto destroy_maptbl;
+	} else {
+		err = -EIO;
+		SSDFS_WARN("volume has no segment bitmap\n");
+		goto destroy_maptbl;
+	}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("create shared extents tree started...\n");
+#else
+	SSDFS_DBG("create shared extents tree started...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	if (fs_info->fs_feature_compat & SSDFS_HAS_SHARED_EXTENTS_COMPAT_FLAG) {
+		down_write(&fs_info->volume_sem);
+		err = ssdfs_shextree_create(fs_info);
+		up_write(&fs_info->volume_sem);
+		if (err)
+			goto destroy_segbmap;
+	} else {
+		err = -EIO;
+		SSDFS_WARN("volume has no shared extents tree\n");
+		goto destroy_segbmap;
+	}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("create invalidated extents btree started...\n");
+#else
+	SSDFS_DBG("create invalidated extents btree started...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	if (fs_feature_compat & SSDFS_HAS_INVALID_EXTENTS_TREE_COMPAT_FLAG) {
+		down_write(&fs_info->volume_sem);
+		err = ssdfs_invextree_create(fs_info);
+		up_write(&fs_info->volume_sem);
+		if (err)
+			goto destroy_shextree;
+	}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("create current segment array started...\n");
+#else
+	SSDFS_DBG("create current segment array started...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	down_write(&fs_info->volume_sem);
+	err = ssdfs_current_segment_array_create(fs_info);
+	up_write(&fs_info->volume_sem);
+	if (err)
+		goto destroy_invext_btree;
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("create shared dictionary started...\n");
+#else
+	SSDFS_DBG("create shared dictionary started...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	if (fs_feature_compat & SSDFS_HAS_SHARED_DICT_COMPAT_FLAG) {
+		down_write(&fs_info->volume_sem);
+
+		err = ssdfs_shared_dict_btree_create(fs_info);
+		if (err) {
+			up_write(&fs_info->volume_sem);
+			goto destroy_current_segment_array;
+		}
+
+		err = ssdfs_shared_dict_btree_init(fs_info);
+		if (err) {
+			up_write(&fs_info->volume_sem);
+			goto destroy_shdictree;
+		}
+
+		up_write(&fs_info->volume_sem);
+	} else {
+		err = -EIO;
+		SSDFS_WARN("volume has no shared dictionary\n");
+		goto destroy_current_segment_array;
+	}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("create inodes btree started...\n");
+#else
+	SSDFS_DBG("create inodes btree started...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	if (fs_feature_compat & SSDFS_HAS_INODES_TREE_COMPAT_FLAG) {
+		down_write(&fs_info->volume_sem);
+		err = ssdfs_inodes_btree_create(fs_info);
+		up_write(&fs_info->volume_sem);
+		if (err)
+			goto destroy_shdictree;
+	} else {
+		err = -EIO;
+		SSDFS_WARN("volume has no inodes btree\n");
+		goto destroy_shdictree;
+	}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("getting root inode...\n");
+#else
+	SSDFS_DBG("getting root inode...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	root_i = ssdfs_iget(sb, SSDFS_ROOT_INO);
+	if (IS_ERR(root_i)) {
+		SSDFS_ERR("getting root inode failed\n");
+		err = PTR_ERR(root_i);
+		goto destroy_inodes_btree;
+	}
+
+	if (!S_ISDIR(root_i->i_mode) || !root_i->i_blocks || !root_i->i_size) {
+		err = -ERANGE;
+		iput(root_i);
+		SSDFS_ERR("corrupted root inode\n");
+		goto destroy_inodes_btree;
+	}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("d_make_root()\n");
+#else
+	SSDFS_DBG("d_make_root()\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	sb->s_root = d_make_root(root_i);
+	if (!sb->s_root) {
+		err = -ENOMEM;
+		goto put_root_inode;
+	}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("starting GC threads...\n");
+#else
+	SSDFS_DBG("starting GC threads...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	err = ssdfs_start_gc_thread(fs_info, SSDFS_SEG_USING_GC_THREAD);
+	if (err == -EINTR) {
+		/*
+		 * Ignore this error.
+		 */
+		err = 0;
+		goto put_root_inode;
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to start GC-using-seg thread: "
+			  "err %d\n", err);
+		goto put_root_inode;
+	}
+
+	err = ssdfs_start_gc_thread(fs_info, SSDFS_SEG_USED_GC_THREAD);
+	if (err == -EINTR) {
+		/*
+		 * Ignore this error.
+		 */
+		err = 0;
+		goto stop_gc_using_seg_thread;
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to start GC-used-seg thread: "
+			  "err %d\n", err);
+		goto stop_gc_using_seg_thread;
+	}
+
+	err = ssdfs_start_gc_thread(fs_info, SSDFS_SEG_PRE_DIRTY_GC_THREAD);
+	if (err == -EINTR) {
+		/*
+		 * Ignore this error.
+		 */
+		err = 0;
+		goto stop_gc_used_seg_thread;
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to start GC-pre-dirty-seg thread: "
+			  "err %d\n", err);
+		goto stop_gc_used_seg_thread;
+	}
+
+	err = ssdfs_start_gc_thread(fs_info, SSDFS_SEG_DIRTY_GC_THREAD);
+	if (err == -EINTR) {
+		/*
+		 * Ignore this error.
+		 */
+		err = 0;
+		goto stop_gc_pre_dirty_seg_thread;
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to start GC-dirty-seg thread: "
+			  "err %d\n", err);
+		goto stop_gc_pre_dirty_seg_thread;
+	}
+
+	if (!(sb->s_flags & SB_RDONLY)) {
+		pagevec_init(&payload.maptbl_cache.pvec);
+
+		down_write(&fs_info->volume_sem);
+
+		err = ssdfs_prepare_sb_log(sb, &last_sb_log);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to prepare sb log: err %d\n",
+				  err);
+		}
+
+		err = ssdfs_snapshot_sb_log_payload(sb, &payload);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to snapshot sb log's payload: err %d\n",
+				  err);
+		}
+
+		if (!err) {
+			err = ssdfs_commit_super(sb, SSDFS_MOUNTED_FS,
+						 &last_sb_log,
+						 &payload);
+		} else {
+			SSDFS_ERR("fail to prepare sb log payload: "
+				  "err %d\n", err);
+		}
+
+		up_write(&fs_info->volume_sem);
+
+		ssdfs_super_pagevec_release(&payload.maptbl_cache.pvec);
+
+		if (err) {
+			SSDFS_NOTICE("fail to commit superblock info: "
+				     "continuing in RO mode\n");
+			sb->s_flags |= SB_RDONLY;
+		}
+	}
+
+	atomic_set(&fs_info->global_fs_state, SSDFS_REGULAR_FS_OPERATIONS);
+
+	SSDFS_INFO("%s has been mounted on device %s\n",
+		   SSDFS_VERSION, fs_info->devops->device_name(sb));
+
+	return 0;
+
+stop_gc_pre_dirty_seg_thread:
+	ssdfs_stop_gc_thread(fs_info, SSDFS_SEG_PRE_DIRTY_GC_THREAD);
+
+stop_gc_used_seg_thread:
+	ssdfs_stop_gc_thread(fs_info, SSDFS_SEG_USED_GC_THREAD);
+
+stop_gc_using_seg_thread:
+	ssdfs_stop_gc_thread(fs_info, SSDFS_SEG_USING_GC_THREAD);
+
+put_root_inode:
+	iput(root_i);
+
+destroy_inodes_btree:
+	ssdfs_inodes_btree_destroy(fs_info);
+
+destroy_shdictree:
+	ssdfs_shared_dict_btree_destroy(fs_info);
+
+destroy_current_segment_array:
+	ssdfs_destroy_all_curent_segments(fs_info);
+
+destroy_invext_btree:
+	ssdfs_invextree_destroy(fs_info);
+
+destroy_shextree:
+	ssdfs_shextree_destroy(fs_info);
+
+destroy_segbmap:
+	ssdfs_segbmap_destroy(fs_info);
+
+destroy_maptbl:
+	ssdfs_maptbl_stop_thread(fs_info->maptbl);
+	ssdfs_maptbl_destroy(fs_info);
+
+destroy_segments_tree:
+	ssdfs_segment_tree_destroy(fs_info);
+	ssdfs_current_segment_array_destroy(fs_info);
+
+destroy_snapshot_subsystem:
+	ssdfs_snapshot_subsystem_destroy(fs_info);
+
+destroy_sysfs_device_group:
+	ssdfs_sysfs_delete_device_group(fs_info);
+
+release_maptbl_cache:
+	ssdfs_maptbl_cache_destroy(&fs_info->maptbl_cache);
+
+free_erase_page:
+	if (fs_info->erase_page)
+		ssdfs_super_free_page(fs_info->erase_page);
+
+	ssdfs_destruct_sb_info(&fs_info->sbi);
+	ssdfs_destruct_sb_info(&fs_info->sbi_backup);
+
+	ssdfs_free_workspaces();
+
+	ssdfs_super_kfree(fs_info);
+
+	rcu_barrier();
+
+	ssdfs_check_memory_page_locks();
+	ssdfs_check_memory_leaks();
+	return err;
+}
+
+static void ssdfs_put_super(struct super_block *sb)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	struct ssdfs_peb_extent last_sb_log = {0};
+	struct ssdfs_sb_log_payload payload;
+	u64 fs_feature_compat;
+	u16 fs_state;
+	bool can_commit_super = true;
+	int i;
+	int err;
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("sb %p\n", sb);
+#else
+	SSDFS_DBG("sb %p\n", sb);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	atomic_set(&fsi->global_fs_state, SSDFS_METADATA_GOING_FLUSHING);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("SSDFS_METADATA_GOING_FLUSHING\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	wake_up_all(&fsi->pending_wq);
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("STOP THREADS...\n");
+#else
+	SSDFS_DBG("STOP THREADS...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	err = ssdfs_stop_gc_thread(fsi, SSDFS_SEG_USING_GC_THREAD);
+	if (err) {
+		SSDFS_ERR("fail to stop GC using seg thread: "
+			  "err %d\n", err);
+	}
+
+	err = ssdfs_stop_gc_thread(fsi, SSDFS_SEG_USED_GC_THREAD);
+	if (err) {
+		SSDFS_ERR("fail to stop GC used seg thread: "
+			  "err %d\n", err);
+	}
+
+	err = ssdfs_stop_gc_thread(fsi, SSDFS_SEG_PRE_DIRTY_GC_THREAD);
+	if (err) {
+		SSDFS_ERR("fail to stop GC pre-dirty seg thread: "
+			  "err %d\n", err);
+	}
+
+	err = ssdfs_stop_gc_thread(fsi, SSDFS_SEG_DIRTY_GC_THREAD);
+	if (err) {
+		SSDFS_ERR("fail to stop GC dirty seg thread: "
+			  "err %d\n", err);
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("GC threads have been stopped\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_shared_dict_stop_thread(fsi->shdictree);
+	if (err == -EIO) {
+		ssdfs_fs_error(fsi->sb,
+				__FILE__, __func__, __LINE__,
+				"thread I/O issue\n");
+	} else if (unlikely(err)) {
+		SSDFS_WARN("thread stopping issue: err %d\n",
+			   err);
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("shared dictionary thread has been stopped\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	for (i = 0; i < SSDFS_INVALIDATION_QUEUE_NUMBER; i++) {
+		err = ssdfs_shextree_stop_thread(fsi->shextree, i);
+		if (err == -EIO) {
+			ssdfs_fs_error(fsi->sb,
+					__FILE__, __func__, __LINE__,
+					"thread I/O issue\n");
+		} else if (unlikely(err)) {
+			SSDFS_WARN("thread stopping issue: ID %d, err %d\n",
+				   i, err);
+		}
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("shared extents threads have been stopped\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_stop_snapshots_btree_thread(fsi);
+	if (err == -EIO) {
+		ssdfs_fs_error(fsi->sb,
+				__FILE__, __func__, __LINE__,
+				"thread I/O issue\n");
+	} else if (unlikely(err)) {
+		SSDFS_WARN("thread stopping issue: err %d\n",
+			   err);
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("snapshots btree thread has been stopped\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_maptbl_stop_thread(fsi->maptbl);
+	if (unlikely(err)) {
+		SSDFS_WARN("maptbl thread stopping issue: err %d\n",
+			   err);
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("mapping table thread has been stopped\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	spin_lock(&fsi->volume_state_lock);
+	fs_feature_compat = fsi->fs_feature_compat;
+	fs_state = fsi->fs_state;
+	spin_unlock(&fsi->volume_state_lock);
+
+	pagevec_init(&payload.maptbl_cache.pvec);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("Wait unfinished user data requests...\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (unfinished_user_data_requests_exist(fsi)) {
+		wait_queue_head_t *wq = &fsi->finish_user_data_flush_wq;
+
+		err = wait_event_killable_timeout(*wq,
+				!unfinished_user_data_requests_exist(fsi),
+				SSDFS_DEFAULT_TIMEOUT);
+		if (err < 0)
+			WARN_ON(err < 0);
+		else
+			err = 0;
+
+		if (unfinished_user_data_requests_exist(fsi))
+			BUG();
+	}
+
+	atomic_set(&fsi->global_fs_state, SSDFS_METADATA_UNDER_FLUSH);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("SSDFS_METADATA_UNDER_FLUSH\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!(sb->s_flags & SB_RDONLY)) {
+		down_write(&fsi->volume_sem);
+
+		err = ssdfs_prepare_sb_log(sb, &last_sb_log);
+		if (unlikely(err)) {
+			can_commit_super = false;
+			SSDFS_ERR("fail to prepare sb log: err %d\n",
+				  err);
+		}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+		SSDFS_ERR("Flush invalidated extents b-tree...\n");
+#else
+		SSDFS_DBG("Flush invalidated extents b-tree...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+		if (fsi->fs_feature_compat &
+				SSDFS_HAS_INVALID_EXTENTS_TREE_COMPAT_FLAG) {
+			err = ssdfs_invextree_flush(fsi);
+			if (err) {
+				SSDFS_ERR("fail to flush invalidated extents btree: "
+					  "err %d\n", err);
+			}
+		}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+		SSDFS_ERR("Flush shared extents b-tree...\n");
+#else
+		SSDFS_DBG("Flush shared extents b-tree...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+		if (fsi->fs_feature_compat &
+				SSDFS_HAS_SHARED_EXTENTS_COMPAT_FLAG) {
+			err = ssdfs_shextree_flush(fsi);
+			if (err) {
+				SSDFS_ERR("fail to flush shared extents btree: "
+					  "err %d\n", err);
+			}
+		}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+		SSDFS_ERR("Flush inodes b-tree...\n");
+#else
+		SSDFS_DBG("Flush inodes b-tree...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+		if (fs_feature_compat & SSDFS_HAS_INODES_TREE_COMPAT_FLAG) {
+			err = ssdfs_inodes_btree_flush(fsi->inodes_tree);
+			if (err) {
+				SSDFS_ERR("fail to flush inodes btree: "
+					  "err %d\n", err);
+			}
+		}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+		SSDFS_ERR("Flush shared dictionary b-tree...\n");
+#else
+		SSDFS_DBG("Flush shared dictionary b-tree...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+		if (fs_feature_compat & SSDFS_HAS_SHARED_DICT_COMPAT_FLAG) {
+			err = ssdfs_shared_dict_btree_flush(fsi->shdictree);
+			if (err) {
+				SSDFS_ERR("fail to flush shared dictionary: "
+					  "err %d\n", err);
+			}
+		}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+		SSDFS_ERR("Execute create snapshots...\n");
+#else
+		SSDFS_DBG("Execute create snapshots...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+		err = ssdfs_execute_create_snapshots(fsi);
+		if (err) {
+			SSDFS_ERR("fail to create snapshots\n");
+		}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+		SSDFS_ERR("Flush snapshots b-tree...\n");
+#else
+		SSDFS_DBG("Flush snapshots b-tree...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+		if (fsi->fs_feature_compat &
+				SSDFS_HAS_SNAPSHOTS_TREE_COMPAT_FLAG) {
+			err = ssdfs_snapshots_btree_flush(fsi);
+			if (err) {
+				SSDFS_ERR("fail to flush snapshots btree: "
+					  "err %d\n", err);
+			}
+		}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+		SSDFS_ERR("Flush segment bitmap...\n");
+#else
+		SSDFS_DBG("Flush segment bitmap...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+		if (fs_feature_compat & SSDFS_HAS_SEGBMAP_COMPAT_FLAG) {
+			err = ssdfs_segbmap_flush(fsi->segbmap);
+			if (err) {
+				SSDFS_ERR("fail to flush segbmap: "
+					  "err %d\n", err);
+			}
+		}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+		SSDFS_ERR("Flush PEB mapping table...\n");
+#else
+		SSDFS_DBG("Flush PEB mapping table...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+		if (fs_feature_compat & SSDFS_HAS_MAPTBL_COMPAT_FLAG) {
+			err = ssdfs_maptbl_flush(fsi->maptbl);
+			if (err) {
+				SSDFS_ERR("fail to flush maptbl: "
+					  "err %d\n", err);
+			}
+
+			set_maptbl_going_to_be_destroyed(fsi);
+		}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+		SSDFS_ERR("Commit superblock...\n");
+#else
+		SSDFS_DBG("Commit superblock...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+		if (can_commit_super) {
+			err = ssdfs_snapshot_sb_log_payload(sb, &payload);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to snapshot log's payload: "
+					  "err %d\n", err);
+			} else {
+				err = ssdfs_commit_super(sb, SSDFS_VALID_FS,
+							 &last_sb_log,
+							 &payload);
+			}
+		} else {
+			/* prepare error code */
+			err = -ERANGE;
+		}
+
+		if (err) {
+			SSDFS_ERR("fail to commit superblock info: "
+				  "err %d\n", err);
+		}
+
+		up_write(&fsi->volume_sem);
+	} else {
+		if (fs_state == SSDFS_ERROR_FS) {
+			down_write(&fsi->volume_sem);
+
+			err = ssdfs_prepare_sb_log(sb, &last_sb_log);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to prepare sb log: err %d\n",
+					  err);
+			}
+
+			err = ssdfs_snapshot_sb_log_payload(sb, &payload);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to snapshot log's payload: "
+					  "err %d\n", err);
+			}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+			SSDFS_ERR("Commit superblock...\n");
+#else
+			SSDFS_DBG("Commit superblock...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+			if (!err) {
+				err = ssdfs_commit_super(sb, SSDFS_ERROR_FS,
+							 &last_sb_log,
+							 &payload);
+			}
+
+			up_write(&fsi->volume_sem);
+
+			if (err) {
+				SSDFS_ERR("fail to commit superblock info: "
+					  "err %d\n", err);
+			}
+		}
+	}
+
+	atomic_set(&fsi->global_fs_state, SSDFS_UNKNOWN_GLOBAL_FS_STATE);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("SSDFS_UNKNOWN_GLOBAL_FS_STATE\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("Starting destroy the metadata structures...\n");
+#else
+	SSDFS_DBG("Starting destroy the metadata structures...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	ssdfs_super_pagevec_release(&payload.maptbl_cache.pvec);
+	fsi->devops->sync(sb);
+	ssdfs_snapshot_subsystem_destroy(fsi);
+	ssdfs_invextree_destroy(fsi);
+	ssdfs_shextree_destroy(fsi);
+	ssdfs_inodes_btree_destroy(fsi);
+	ssdfs_shared_dict_btree_destroy(fsi);
+	ssdfs_segbmap_destroy(fsi);
+	ssdfs_destroy_all_curent_segments(fsi);
+	ssdfs_segment_tree_destroy(fsi);
+	ssdfs_current_segment_array_destroy(fsi);
+	ssdfs_maptbl_destroy(fsi);
+	ssdfs_sysfs_delete_device_group(fsi);
+
+	SSDFS_INFO("%s has been unmounted from device %s\n",
+		   SSDFS_VERSION, fsi->devops->device_name(sb));
+
+	if (fsi->erase_page)
+		ssdfs_super_free_page(fsi->erase_page);
+
+	ssdfs_maptbl_cache_destroy(&fsi->maptbl_cache);
+	ssdfs_destruct_sb_info(&fsi->sbi);
+	ssdfs_destruct_sb_info(&fsi->sbi_backup);
+
+	ssdfs_free_workspaces();
+
+	ssdfs_super_kfree(fsi);
+	sb->s_fs_info = NULL;
+
+	rcu_barrier();
+
+	ssdfs_check_memory_page_locks();
+	ssdfs_check_memory_leaks();
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("All metadata structures have been destroyed...\n");
+#else
+	SSDFS_DBG("All metadata structures have been destroyed...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+}
+
+static struct dentry *ssdfs_mount(struct file_system_type *fs_type,
+				  int flags, const char *dev_name,
+				  void *data)
+{
+#ifdef CONFIG_SSDFS_MTD_DEVICE
+	return mount_mtd(fs_type, flags, dev_name, data, ssdfs_fill_super);
+#elif defined(CONFIG_SSDFS_BLOCK_DEVICE)
+	return mount_bdev(fs_type, flags, dev_name, data, ssdfs_fill_super);
+#else
+	BUILD_BUG();
+	return NULL;
+#endif
+}
+
+static void kill_ssdfs_sb(struct super_block *sb)
+{
+#ifdef CONFIG_SSDFS_MTD_DEVICE
+	kill_mtd_super(sb);
+#elif defined(CONFIG_SSDFS_BLOCK_DEVICE)
+	kill_block_super(sb);
+#else
+	BUILD_BUG();
+#endif
+}
+
+static struct file_system_type ssdfs_fs_type = {
+	.name		= "ssdfs",
+	.owner		= THIS_MODULE,
+	.mount		= ssdfs_mount,
+	.kill_sb	= kill_ssdfs_sb,
+#ifdef CONFIG_SSDFS_BLOCK_DEVICE
+	.fs_flags	= FS_REQUIRES_DEV,
+#endif
+};
+MODULE_ALIAS_FS("ssdfs");
+
+static void ssdfs_destroy_caches(void)
+{
+	/*
+	 * Make sure all delayed rcu free inodes are flushed before we
+	 * destroy cache.
+	 */
+	rcu_barrier();
+
+	if (ssdfs_inode_cachep)
+		kmem_cache_destroy(ssdfs_inode_cachep);
+
+	ssdfs_destroy_seg_req_obj_cache();
+	ssdfs_destroy_btree_search_obj_cache();
+	ssdfs_destroy_free_ino_desc_cache();
+	ssdfs_destroy_btree_node_obj_cache();
+	ssdfs_destroy_seg_obj_cache();
+	ssdfs_destroy_extent_info_cache();
+	ssdfs_destroy_peb_mapping_info_cache();
+	ssdfs_destroy_blk2off_frag_obj_cache();
+	ssdfs_destroy_name_info_cache();
+}
+
+static int ssdfs_init_caches(void)
+{
+	int err;
+
+	ssdfs_zero_seg_obj_cache_ptr();
+	ssdfs_zero_seg_req_obj_cache_ptr();
+	ssdfs_zero_extent_info_cache_ptr();
+	ssdfs_zero_btree_node_obj_cache_ptr();
+	ssdfs_zero_btree_search_obj_cache_ptr();
+	ssdfs_zero_free_ino_desc_cache_ptr();
+	ssdfs_zero_peb_mapping_info_cache_ptr();
+	ssdfs_zero_blk2off_frag_obj_cache_ptr();
+	ssdfs_zero_name_info_cache_ptr();
+
+	ssdfs_inode_cachep = kmem_cache_create("ssdfs_inode_cache",
+					sizeof(struct ssdfs_inode_info), 0,
+					SLAB_RECLAIM_ACCOUNT |
+					SLAB_MEM_SPREAD |
+					SLAB_ACCOUNT,
+					ssdfs_init_inode_once);
+	if (!ssdfs_inode_cachep) {
+		SSDFS_ERR("unable to create inode cache\n");
+		return -ENOMEM;
+	}
+
+	err = ssdfs_init_seg_obj_cache();
+	if (unlikely(err)) {
+		SSDFS_ERR("unable to create segment object cache: err %d\n",
+			  err);
+		goto destroy_caches;
+	}
+
+	err = ssdfs_init_seg_req_obj_cache();
+	if (unlikely(err)) {
+		SSDFS_ERR("unable to create segment request object cache: "
+			  "err %d\n",
+			  err);
+		goto destroy_caches;
+	}
+
+	err = ssdfs_init_extent_info_cache();
+	if (unlikely(err)) {
+		SSDFS_ERR("unable to create extent info object cache: "
+			  "err %d\n",
+			  err);
+		goto destroy_caches;
+	}
+
+	err = ssdfs_init_btree_node_obj_cache();
+	if (unlikely(err)) {
+		SSDFS_ERR("unable to create btree node object cache: err %d\n",
+			  err);
+		goto destroy_caches;
+	}
+
+	err = ssdfs_init_btree_search_obj_cache();
+	if (unlikely(err)) {
+		SSDFS_ERR("unable to create btree search object cache: "
+			  "err %d\n",
+			  err);
+		goto destroy_caches;
+	}
+
+	err = ssdfs_init_free_ino_desc_cache();
+	if (unlikely(err)) {
+		SSDFS_ERR("unable to create free inode descriptors cache: "
+			  "err %d\n",
+			  err);
+		goto destroy_caches;
+	}
+
+	err = ssdfs_init_peb_mapping_info_cache();
+	if (unlikely(err)) {
+		SSDFS_ERR("unable to create PEB mapping descriptors cache: "
+			  "err %d\n",
+			  err);
+		goto destroy_caches;
+	}
+
+	err = ssdfs_init_blk2off_frag_obj_cache();
+	if (unlikely(err)) {
+		SSDFS_ERR("unable to create blk2off fragments cache: "
+			  "err %d\n",
+			  err);
+		goto destroy_caches;
+	}
+
+	err = ssdfs_init_name_info_cache();
+	if (unlikely(err)) {
+		SSDFS_ERR("unable to create name info cache: "
+			  "err %d\n",
+			  err);
+		goto destroy_caches;
+	}
+
+	return 0;
+
+destroy_caches:
+	ssdfs_destroy_caches();
+	return err;
+}
+
+static inline void ssdfs_print_info(void)
+{
+	SSDFS_INFO("%s loaded\n", SSDFS_VERSION);
+}
+
+static int __init ssdfs_init(void)
+{
+	int err;
+
+	err = ssdfs_init_caches();
+	if (err) {
+		SSDFS_ERR("failed to initialize caches\n");
+		goto failed_init;
+	}
+
+	err = ssdfs_compressors_init();
+	if (err) {
+		SSDFS_ERR("failed to initialize compressors\n");
+		goto free_caches;
+	}
+
+	err = ssdfs_sysfs_init();
+	if (err) {
+		SSDFS_ERR("failed to initialize sysfs subsystem\n");
+		goto stop_compressors;
+	}
+
+	err = register_filesystem(&ssdfs_fs_type);
+	if (err) {
+		SSDFS_ERR("failed to register filesystem\n");
+		goto sysfs_exit;
+	}
+
+	ssdfs_print_info();
+
+	return 0;
+
+sysfs_exit:
+	ssdfs_sysfs_exit();
+
+stop_compressors:
+	ssdfs_compressors_exit();
+
+free_caches:
+	ssdfs_destroy_caches();
+
+failed_init:
+	return err;
+}
+
+static void __exit ssdfs_exit(void)
+{
+	unregister_filesystem(&ssdfs_fs_type);
+	ssdfs_destroy_caches();
+	ssdfs_sysfs_exit();
+	ssdfs_compressors_exit();
+}
+
+module_init(ssdfs_init);
+module_exit(ssdfs_exit);
+
+MODULE_DESCRIPTION("SSDFS -- SSD-oriented File System");
+MODULE_AUTHOR("HGST, San Jose Research Center, Storage Architecture Group");
+MODULE_AUTHOR("Viacheslav Dubeyko <slava@dubeyko.com>");
+MODULE_LICENSE("Dual BSD/GPL");
-- 
2.34.1



* [RFC PATCH 05/76] ssdfs: implement commit superblock operation
  2023-02-25  1:08 [RFC PATCH 00/76] SSDFS: flash-friendly LFS file system for ZNS SSD Viacheslav Dubeyko
                   ` (3 preceding siblings ...)
  2023-02-25  1:08 ` [RFC PATCH 04/76] ssdfs: implement super operations Viacheslav Dubeyko
@ 2023-02-25  1:08 ` Viacheslav Dubeyko
  2023-02-25  1:08 ` [RFC PATCH 06/76] ssdfs: segment header + log footer operations Viacheslav Dubeyko
                   ` (71 subsequent siblings)
  76 siblings, 0 replies; 82+ messages in thread
From: Viacheslav Dubeyko @ 2023-02-25  1:08 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: viacheslav.dubeyko, luka.perkov, bruno.banelli, Viacheslav Dubeyko

SSDFS has a specialized superblock segment (erase block) whose
goal is to keep the sequence of committed superblocks.
A superblock instance is stored on every successful mount
operation and during the unmount operation. At first, the logic
tries to detect the state of the current superblock segment.
If that segment (erase block) is completely full, then a new
superblock segment is reserved, and the new superblock instance
is stored into the sequence. SSDFS maintains a main and a backup
copy of the current superblock segment. Additionally, SSDFS
keeps information about the previous, current, next, and
reserved superblock segments. SSDFS supports two policies of
superblock segment allocation: (1) reserve a new segment for
every new allocation, or (2) use only the set of superblock
segments that have been reserved by the mkfs tool.

Every commit operation stores a log into the superblock segment.
This log contains:
(1) segment header,
(2) payload (mapping table cache, for example),
(3) log footer.

The segment header can be considered static superblock info.
It contains metadata that does not change at all after volume
creation (logical block size, for example) or changes rarely
(number of segments in the volume, for example). The log footer
can be considered the dynamic part of the superblock because
it contains frequently updated metadata (for example, the root
node of the inodes b-tree).

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
CC: Viacheslav Dubeyko <viacheslav.dubeyko@bytedance.com>
CC: Luka Perkov <luka.perkov@sartura.hr>
CC: Bruno Banelli <bruno.banelli@sartura.hr>
---
 fs/ssdfs/super.c | 2200 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 2200 insertions(+)

diff --git a/fs/ssdfs/super.c b/fs/ssdfs/super.c
index a3b144e6eafb..39df1e4d9152 100644
--- a/fs/ssdfs/super.c
+++ b/fs/ssdfs/super.c
@@ -121,6 +121,27 @@ void ssdfs_super_check_memory_leaks(void)
 #endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
 }
 
+struct ssdfs_payload_content {
+	struct pagevec pvec;
+	u32 bytes_count;
+};
+
+struct ssdfs_sb_log_payload {
+	struct ssdfs_payload_content maptbl_cache;
+};
+
+static struct kmem_cache *ssdfs_inode_cachep;
+
+static int ssdfs_prepare_sb_log(struct super_block *sb,
+				struct ssdfs_peb_extent *last_sb_log);
+static int ssdfs_snapshot_sb_log_payload(struct super_block *sb,
+					 struct ssdfs_sb_log_payload *payload);
+static int ssdfs_commit_super(struct super_block *sb, u16 fs_state,
+				struct ssdfs_peb_extent *last_sb_log,
+				struct ssdfs_sb_log_payload *payload);
+static void ssdfs_put_super(struct super_block *sb);
+static void ssdfs_check_memory_leaks(void);
+
 static void init_once(void *foo)
 {
 	struct ssdfs_inode_info *ii = (struct ssdfs_inode_info *)foo;
@@ -528,6 +549,2185 @@ static const struct super_operations ssdfs_super_operations = {
 	.sync_fs	= ssdfs_sync_fs,
 };
 
+static inline
+u32 ssdfs_sb_payload_size(struct pagevec *pvec)
+{
+	struct ssdfs_maptbl_cache_header *hdr;
+	struct page *page;
+	void *kaddr;
+	u16 fragment_bytes_count;
+	u32 bytes_count = 0;
+	int i;
+
+	for (i = 0; i < pagevec_count(pvec); i++) {
+		page = pvec->pages[i];
+
+		ssdfs_lock_page(page);
+		kaddr = kmap_local_page(page);
+		hdr = (struct ssdfs_maptbl_cache_header *)kaddr;
+		fragment_bytes_count = le16_to_cpu(hdr->bytes_count);
+		kunmap_local(kaddr);
+		ssdfs_unlock_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		WARN_ON(fragment_bytes_count > PAGE_SIZE);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		bytes_count += fragment_bytes_count;
+	}
+
+	return bytes_count;
+}
+
+static u32 ssdfs_define_sb_log_size(struct super_block *sb)
+{
+	struct ssdfs_fs_info *fsi;
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+	u32 inline_capacity;
+	u32 log_size = 0;
+	u32 payload_size;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!sb);
+
+	SSDFS_DBG("sb %p\n", sb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = SSDFS_FS_I(sb);
+	payload_size = ssdfs_sb_payload_size(&fsi->maptbl_cache.pvec);
+	inline_capacity = PAGE_SIZE - hdr_size;
+
+	if (payload_size > inline_capacity) {
+		log_size += PAGE_SIZE;
+		log_size += atomic_read(&fsi->maptbl_cache.bytes_count);
+		log_size += PAGE_SIZE;
+	} else {
+		log_size += PAGE_SIZE;
+		log_size += PAGE_SIZE;
+	}
+
+	log_size = (log_size + fsi->pagesize - 1) >> fsi->log_pagesize;
+
+	return log_size;
+}
+
+static int ssdfs_snapshot_sb_log_payload(struct super_block *sb,
+					 struct ssdfs_sb_log_payload *payload)
+{
+	struct ssdfs_fs_info *fsi;
+	unsigned pages_count;
+	unsigned i;
+	struct page *spage, *dpage;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!sb || !payload);
+	BUG_ON(pagevec_count(&payload->maptbl_cache.pvec) != 0);
+
+	SSDFS_DBG("sb %p, payload %p\n",
+		  sb, payload);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = SSDFS_FS_I(sb);
+
+	down_read(&fsi->maptbl_cache.lock);
+
+	pages_count = pagevec_count(&fsi->maptbl_cache.pvec);
+
+	for (i = 0; i < pages_count; i++) {
+		dpage =
+		    ssdfs_super_add_pagevec_page(&payload->maptbl_cache.pvec);
+		if (unlikely(IS_ERR_OR_NULL(dpage))) {
+			err = !dpage ? -ENOMEM : PTR_ERR(dpage);
+			SSDFS_ERR("fail to add pagevec page: "
+				  "index %u, err %d\n",
+				  i, err);
+			goto finish_maptbl_snapshot;
+		}
+
+		spage = fsi->maptbl_cache.pvec.pages[i];
+		if (unlikely(!spage)) {
+			err = -ERANGE;
+			SSDFS_ERR("source page is absent: index %u\n",
+				  i);
+			goto finish_maptbl_snapshot;
+		}
+
+		ssdfs_lock_page(spage);
+		ssdfs_lock_page(dpage);
+		ssdfs_memcpy_page(dpage, 0, PAGE_SIZE,
+				  spage, 0, PAGE_SIZE,
+				  PAGE_SIZE);
+		ssdfs_unlock_page(dpage);
+		ssdfs_unlock_page(spage);
+	}
+
+	payload->maptbl_cache.bytes_count =
+		atomic_read(&fsi->maptbl_cache.bytes_count);
+
+finish_maptbl_snapshot:
+	up_read(&fsi->maptbl_cache.lock);
+
+	if (unlikely(err))
+		ssdfs_super_pagevec_release(&payload->maptbl_cache.pvec);
+
+	return err;
+}
+
+static int ssdfs_define_next_sb_log_place(struct super_block *sb,
+					  struct ssdfs_peb_extent *last_sb_log)
+{
+	struct ssdfs_fs_info *fsi;
+	u32 offset;
+	u32 log_size;
+	u64 cur_peb, prev_peb;
+	u64 cur_leb;
+	int i;
+	int err = 0;
+
+	fsi = SSDFS_FS_I(sb);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!sb || !last_sb_log);
+
+	SSDFS_DBG("sb %p, last_sb_log %p\n",
+		  sb, last_sb_log);
+	SSDFS_DBG("fsi->sbi.last_log.leb_id %llu, "
+		  "fsi->sbi.last_log.peb_id %llu, "
+		  "fsi->sbi.last_log.page_offset %u, "
+		  "fsi->sbi.last_log.pages_count %u\n",
+		  fsi->sbi.last_log.leb_id,
+		  fsi->sbi.last_log.peb_id,
+		  fsi->sbi.last_log.page_offset,
+		  fsi->sbi.last_log.pages_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	offset = fsi->sbi.last_log.page_offset;
+
+	log_size = ssdfs_define_sb_log_size(sb);
+	if (log_size > fsi->pages_per_peb) {
+		SSDFS_ERR("log_size %u > fsi->pages_per_peb %u\n",
+			  log_size, fsi->pages_per_peb);
+		return -ERANGE;
+	}
+
+	log_size = max_t(u32, log_size, fsi->sbi.last_log.pages_count);
+
+	if (offset > fsi->pages_per_peb || offset > (UINT_MAX - log_size)) {
+		SSDFS_ERR("inconsistent metadata state: "
+			  "last_sb_log.page_offset %u, "
+			  "pages_per_peb %u, log_size %u\n",
+			  offset, fsi->pages_per_peb, log_size);
+		return -EINVAL;
+	}
+
+	for (err = -EINVAL, i = 0; i < SSDFS_SB_SEG_COPY_MAX; i++) {
+		cur_peb = fsi->sb_pebs[SSDFS_CUR_SB_SEG][i];
+		prev_peb = fsi->sb_pebs[SSDFS_PREV_SB_SEG][i];
+		cur_leb = fsi->sb_lebs[SSDFS_CUR_SB_SEG][i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("cur_peb %llu, prev_peb %llu, "
+			  "last_sb_log.peb_id %llu, err %d\n",
+			  cur_peb, prev_peb, fsi->sbi.last_log.peb_id, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (fsi->sbi.last_log.peb_id == cur_peb) {
+			if ((offset + (2 * log_size)) > fsi->pages_per_peb) {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("sb PEB %llu is full: "
+					  "(offset %u + (2 * log_size %u)) > "
+					  "pages_per_peb %u\n",
+					  cur_peb, offset, log_size,
+					  fsi->pages_per_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+				return -EFBIG;
+			}
+
+			last_sb_log->leb_id = cur_leb;
+			last_sb_log->peb_id = cur_peb;
+			last_sb_log->page_offset = offset + log_size;
+			last_sb_log->pages_count = log_size;
+
+			err = 0;
+			break;
+		} else if (fsi->sbi.last_log.peb_id != cur_peb &&
+			   fsi->sbi.last_log.peb_id == prev_peb) {
+
+			last_sb_log->leb_id = cur_leb;
+			last_sb_log->peb_id = cur_peb;
+			last_sb_log->page_offset = 0;
+			last_sb_log->pages_count = log_size;
+
+			err = 0;
+			break;
+		} else {
+			/* continue to check */
+			err = -ERANGE;
+		}
+	}
+
+	if (err) {
+		SSDFS_ERR("inconsistent metadata state: "
+			  "cur_peb %llu, prev_peb %llu, "
+			  "last_sb_log.peb_id %llu\n",
+			  cur_peb, prev_peb, fsi->sbi.last_log.peb_id);
+		return err;
+	}
+
+	for (i = 0; i < SSDFS_SB_SEG_COPY_MAX; i++) {
+		last_sb_log->leb_id = fsi->sb_lebs[SSDFS_CUR_SB_SEG][i];
+		last_sb_log->peb_id = fsi->sb_pebs[SSDFS_CUR_SB_SEG][i];
+		err = ssdfs_can_write_sb_log(sb, last_sb_log);
+		if (err) {
+			SSDFS_ERR("fail to write sb log into PEB %llu\n",
+				  last_sb_log->peb_id);
+			return err;
+		}
+	}
+
+	last_sb_log->leb_id = cur_leb;
+	last_sb_log->peb_id = cur_peb;
+
+	return 0;
+}
+
+static bool ssdfs_sb_seg_exhausted(struct ssdfs_fs_info *fsi,
+				   u64 cur_leb, u64 next_leb)
+{
+	u64 cur_seg, next_seg;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(cur_leb == U64_MAX || next_leb == U64_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	cur_seg = SSDFS_LEB2SEG(fsi, cur_leb);
+	next_seg = SSDFS_LEB2SEG(fsi, next_leb);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("cur_seg %llu, cur_leb %llu, "
+		  "next_seg %llu, next_leb %llu\n",
+		  cur_seg, cur_leb, next_seg, next_leb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (cur_seg >= U64_MAX || next_seg >= U64_MAX)
+		return true;
+
+	return cur_seg != next_seg;
+}
+
+#ifndef CONFIG_SSDFS_FIXED_SUPERBLOCK_SEGMENTS_SET
+static u64 ssdfs_correct_start_leb_id(struct ssdfs_fs_info *fsi,
+				      int seg_type, u64 leb_id)
+{
+	struct completion *init_end;
+	struct ssdfs_maptbl_peb_relation pebr;
+	struct ssdfs_maptbl_peb_descriptor *ptr;
+	u8 peb_type = SSDFS_MAPTBL_UNKNOWN_PEB_TYPE;
+	u32 pebs_per_seg;
+	u64 seg_id;
+	u64 cur_leb;
+	u64 peb_id1, peb_id2;
+	u64 found_peb_id;
+	u64 peb_id_off;
+	u16 pebs_per_fragment;
+	u16 pebs_per_stripe;
+	u16 stripes_per_fragment;
+	u64 calculated_leb_id = leb_id;
+	int i;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi);
+
+	SSDFS_DBG("fsi %p, seg_type %#x, leb_id %llu\n",
+		  fsi, seg_type, leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	found_peb_id = leb_id;
+	peb_type = SEG2PEB_TYPE(seg_type);
+	pebs_per_seg = fsi->pebs_per_seg;
+
+	seg_id = ssdfs_get_seg_id_for_leb_id(fsi, leb_id);
+	if (unlikely(seg_id >= U64_MAX)) {
+		SSDFS_ERR("invalid seg_id: "
+			  "leb_id %llu\n", leb_id);
+		return -ERANGE;
+	}
+
+	err = ssdfs_maptbl_define_fragment_info(fsi, leb_id,
+						&pebs_per_fragment,
+						&pebs_per_stripe,
+						&stripes_per_fragment);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to define fragment info: "
+			  "err %d\n", err);
+		return err;
+	}
+
+	for (i = 0; i < pebs_per_seg; i++) {
+		cur_leb = ssdfs_get_leb_id_for_peb_index(fsi, seg_id, i);
+		if (cur_leb >= U64_MAX) {
+			SSDFS_ERR("fail to convert PEB index into LEB ID: "
+				  "seg %llu, peb_index %u\n",
+				  seg_id, i);
+			return -ERANGE;
+		}
+
+		err = ssdfs_maptbl_convert_leb2peb(fsi, cur_leb,
+						   peb_type, &pebr,
+						   &init_end);
+		if (err == -EAGAIN) {
+			err = SSDFS_WAIT_COMPLETION(init_end);
+			if (unlikely(err)) {
+				SSDFS_ERR("maptbl init failed: "
+					  "err %d\n", err);
+				goto finish_leb_id_correction;
+			}
+
+			err = ssdfs_maptbl_convert_leb2peb(fsi, cur_leb,
+							   peb_type, &pebr,
+							   &init_end);
+		}
+
+		if (err == -ENODATA) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("LEB is not mapped: leb_id %llu\n",
+				  cur_leb);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto finish_leb_id_correction;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to convert LEB to PEB: "
+				  "leb_id %llu, peb_type %#x, err %d\n",
+				  cur_leb, peb_type, err);
+			goto finish_leb_id_correction;
+		}
+
+		ptr = &pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX];
+		peb_id1 = ptr->peb_id;
+		ptr = &pebr.pebs[SSDFS_MAPTBL_RELATION_INDEX];
+		peb_id2 = ptr->peb_id;
+
+		if (peb_id1 < U64_MAX)
+			found_peb_id = max_t(u64, peb_id1, found_peb_id);
+
+		if (peb_id2 < U64_MAX)
+			found_peb_id = max_t(u64, peb_id2, found_peb_id);
+
+		peb_id_off = found_peb_id % pebs_per_stripe;
+		if (peb_id_off >= (pebs_per_stripe / 2)) {
+			calculated_leb_id = found_peb_id / pebs_per_stripe;
+			calculated_leb_id++;
+			calculated_leb_id *= pebs_per_stripe;
+		} else {
+			calculated_leb_id = found_peb_id;
+			calculated_leb_id++;
+		}
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("found_peb_id %llu, pebs_per_stripe %u, "
+			  "calculated_leb_id %llu\n",
+			  found_peb_id, pebs_per_stripe,
+			  calculated_leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+finish_leb_id_correction:
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("leb_id %llu, calculated_leb_id %llu\n",
+		  leb_id, calculated_leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return calculated_leb_id;
+}
+#endif /* CONFIG_SSDFS_FIXED_SUPERBLOCK_SEGMENTS_SET */
+
+#ifndef CONFIG_SSDFS_FIXED_SUPERBLOCK_SEGMENTS_SET
+static int __ssdfs_reserve_clean_segment(struct ssdfs_fs_info *fsi,
+					 int sb_seg_type,
+					 u64 start_search_id,
+					 u64 *reserved_seg)
+{
+	struct ssdfs_segment_bmap *segbmap = fsi->segbmap;
+	u64 start_seg = start_search_id;
+	u64 end_seg = U64_MAX;
+	struct ssdfs_maptbl_peb_relation pebr;
+	struct completion *end;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!reserved_seg);
+	BUG_ON(sb_seg_type >= SSDFS_SB_SEG_COPY_MAX);
+
+	SSDFS_DBG("fsi %p, sb_seg_type %#x, start_search_id %llu\n",
+		  fsi, sb_seg_type, start_search_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (sb_seg_type) {
+	case SSDFS_MAIN_SB_SEG:
+	case SSDFS_COPY_SB_SEG:
+		err = ssdfs_segment_detect_search_range(fsi,
+							&start_seg,
+							&end_seg);
+		if (err == -ENOENT) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to find fragment for search: "
+				  "start_seg %llu, end_seg %llu\n",
+				  start_seg, end_seg);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return err;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to define a search range: "
+				  "start_seg %llu, err %d\n",
+				  start_seg, err);
+			return err;
+		}
+		break;
+
+	default:
+		BUG();
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("start_seg %llu, end_seg %llu\n",
+		  start_seg, end_seg);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_segbmap_reserve_clean_segment(segbmap,
+						  start_seg, end_seg,
+						  reserved_seg, &end);
+	if (err == -EAGAIN) {
+		err = SSDFS_WAIT_COMPLETION(end);
+		if (unlikely(err)) {
+			SSDFS_ERR("segbmap init failed: "
+				  "err %d\n", err);
+			goto finish_search;
+		}
+
+		err = ssdfs_segbmap_reserve_clean_segment(segbmap,
+							  start_seg, end_seg,
+							  reserved_seg,
+							  &end);
+	}
+
+	if (err == -ENODATA) {
+		err = -ENOENT;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unable to reserve segment: "
+			  "type %#x, start_seg %llu, end_seg %llu\n",
+			  sb_seg_type, start_seg, end_seg);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto finish_search;
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to reserve segment: "
+			  "type %#x, start_seg %llu, "
+			   "end_seg %llu, err %d\n",
+			  sb_seg_type, start_seg, end_seg, err);
+		goto finish_search;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("reserved_seg %llu\n", *reserved_seg);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	for (i = 0; i < fsi->pebs_per_seg; i++) {
+		u8 peb_type = SSDFS_MAPTBL_SBSEG_PEB_TYPE;
+		u64 leb_id;
+
+		leb_id = ssdfs_get_leb_id_for_peb_index(fsi, *reserved_seg, i);
+		if (leb_id >= U64_MAX) {
+			err = -ERANGE;
+			SSDFS_ERR("fail to convert PEB index into LEB ID: "
+				  "seg %llu, peb_index %u\n",
+				  *reserved_seg, i);
+			goto finish_search;
+		}
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("leb_id %llu\n", leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = ssdfs_maptbl_map_leb2peb(fsi, leb_id, peb_type,
+						&pebr, &end);
+		if (err == -EAGAIN) {
+			err = SSDFS_WAIT_COMPLETION(end);
+			if (unlikely(err)) {
+				SSDFS_ERR("maptbl init failed: "
+					  "err %d\n", err);
+				goto finish_search;
+			}
+
+			err = ssdfs_maptbl_map_leb2peb(fsi, leb_id,
+							peb_type,
+							&pebr, &end);
+		}
+
+		if (err == -EACCES || err == -ENOENT) {
+			if (i == 0) {
+				SSDFS_ERR("fail to map LEB to PEB: "
+					  "reserved_seg %llu, leb_id %llu, "
+					  "err %d\n",
+					  *reserved_seg, leb_id, err);
+			}
+			goto finish_search;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to map LEB to PEB: "
+				  "reserved_seg %llu, leb_id %llu, "
+				  "err %d\n",
+				  *reserved_seg, leb_id, err);
+			goto finish_search;
+		}
+	}
+
+finish_search:
+	if (err == -ENOENT)
+		*reserved_seg = end_seg;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("reserved_seg %llu\n", *reserved_seg);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+#endif /* CONFIG_SSDFS_FIXED_SUPERBLOCK_SEGMENTS_SET */
+
+#ifndef CONFIG_SSDFS_FIXED_SUPERBLOCK_SEGMENTS_SET
+static int ssdfs_reserve_clean_segment(struct super_block *sb,
+					int sb_seg_type, u64 start_leb,
+					u64 *reserved_seg)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	u64 start_search_id;
+	u64 cur_id;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!reserved_seg);
+	BUG_ON(sb_seg_type >= SSDFS_SB_SEG_COPY_MAX);
+
+	SSDFS_DBG("sb %p, sb_seg_type %#x, start_leb %llu\n",
+		  sb, sb_seg_type, start_leb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*reserved_seg = U64_MAX;
+
+	start_leb = ssdfs_correct_start_leb_id(fsi,
+						SSDFS_SB_SEG_TYPE,
+						start_leb);
+
+	start_search_id = SSDFS_LEB2SEG(fsi, start_leb);
+	if (start_search_id >= fsi->nsegs)
+		start_search_id = 0;
+
+	cur_id = start_search_id;
+
+	while (cur_id < fsi->nsegs) {
+		err = __ssdfs_reserve_clean_segment(fsi, sb_seg_type,
+						    cur_id, reserved_seg);
+		if (err == -ENOENT) {
+			err = 0;
+			cur_id = *reserved_seg;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("cur_id %llu\n", cur_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+			continue;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to find a new segment: "
+				  "cur_id %llu, err %d\n",
+				  cur_id, err);
+			return err;
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("found seg_id %llu\n", *reserved_seg);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return 0;
+		}
+	}
+
+	cur_id = 0;
+
+	while (cur_id < start_search_id) {
+		err = __ssdfs_reserve_clean_segment(fsi, sb_seg_type,
+						    cur_id, reserved_seg);
+		if (err == -ENOENT) {
+			err = 0;
+			cur_id = *reserved_seg;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("cur_id %llu\n", cur_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+			continue;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to find a new segment: "
+				  "cur_id %llu, err %d\n",
+				  cur_id, err);
+			return err;
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("found seg_id %llu\n", *reserved_seg);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return 0;
+		}
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("no free space for a new segment\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return -ENOSPC;
+}
+#endif /* CONFIG_SSDFS_FIXED_SUPERBLOCK_SEGMENTS_SET */
+
+typedef u64 sb_pebs_array[SSDFS_SB_CHAIN_MAX][SSDFS_SB_SEG_COPY_MAX];
+
+static int ssdfs_erase_dirty_prev_sb_segs(struct ssdfs_fs_info *fsi,
+					  u64 prev_leb)
+{
+	struct completion *init_end;
+	u8 peb_type = SSDFS_MAPTBL_UNKNOWN_PEB_TYPE;
+	u32 pebs_per_seg;
+	u64 seg_id;
+	u64 cur_leb;
+	int i;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi);
+
+	SSDFS_DBG("fsi %p, prev_leb %llu\n",
+		  fsi, prev_leb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	peb_type = SEG2PEB_TYPE(SSDFS_SB_SEG_TYPE);
+	pebs_per_seg = fsi->pebs_per_seg;
+
+	seg_id = SSDFS_LEB2SEG(fsi, prev_leb);
+	if (seg_id >= U64_MAX) {
+		SSDFS_ERR("invalid seg_id for leb_id %llu\n",
+			  prev_leb);
+		return -ERANGE;
+	}
+
+	for (i = 0; i < pebs_per_seg; i++) {
+		cur_leb = ssdfs_get_leb_id_for_peb_index(fsi, seg_id, i);
+		if (cur_leb >= U64_MAX) {
+			SSDFS_ERR("invalid leb_id for seg_id %llu\n",
+				  seg_id);
+			return -ERANGE;
+		}
+
+		err = ssdfs_maptbl_erase_reserved_peb_now(fsi,
+							  cur_leb,
+							  peb_type,
+							  &init_end);
+		if (err == -EAGAIN) {
+			err = SSDFS_WAIT_COMPLETION(init_end);
+			if (unlikely(err)) {
+				SSDFS_ERR("maptbl init failed: "
+					  "err %d\n", err);
+				return err;
+			}
+
+			err = ssdfs_maptbl_erase_reserved_peb_now(fsi,
+								  cur_leb,
+								  peb_type,
+								  &init_end);
+		}
+
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to erase reserved dirty PEB: "
+				  "leb_id %llu, err %d\n",
+				  cur_leb, err);
+			return err;
+		}
+	}
+
+	return 0;
+}
+
+static int ssdfs_move_on_next_peb_in_sb_seg(struct super_block *sb,
+					    int sb_seg_type,
+					    sb_pebs_array *sb_lebs,
+					    sb_pebs_array *sb_pebs)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	u64 prev_leb, cur_leb, next_leb, reserved_leb;
+	u64 prev_peb, cur_peb, next_peb, reserved_peb;
+#ifdef CONFIG_SSDFS_DEBUG
+	u64 new_leb = U64_MAX, new_peb = U64_MAX;
+#endif /* CONFIG_SSDFS_DEBUG */
+	struct ssdfs_maptbl_peb_relation pebr;
+	u8 peb_type = SSDFS_MAPTBL_SBSEG_PEB_TYPE;
+	struct completion *end = NULL;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!sb || !sb_lebs || !sb_pebs);
+
+	if (sb_seg_type >= SSDFS_SB_SEG_COPY_MAX) {
+		SSDFS_ERR("invalid sb_seg_type %#x\n",
+			  sb_seg_type);
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("sb %p, sb_seg_type %#x\n", sb, sb_seg_type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	prev_leb = (*sb_lebs)[SSDFS_PREV_SB_SEG][sb_seg_type];
+	cur_leb = (*sb_lebs)[SSDFS_CUR_SB_SEG][sb_seg_type];
+	next_leb = cur_leb + 1;
+	reserved_leb = (*sb_lebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type];
+
+	prev_peb = (*sb_pebs)[SSDFS_PREV_SB_SEG][sb_seg_type];
+	cur_peb = (*sb_pebs)[SSDFS_CUR_SB_SEG][sb_seg_type];
+	next_peb = U64_MAX;
+	reserved_peb = (*sb_pebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type];
+
+	err = ssdfs_maptbl_convert_leb2peb(fsi, next_leb,
+					   peb_type,
+					   &pebr, &end);
+	if (err == -EAGAIN) {
+		err = SSDFS_WAIT_COMPLETION(end);
+		if (unlikely(err)) {
+			SSDFS_ERR("maptbl init failed: "
+				  "err %d\n", err);
+			goto finish_move_sb_seg;
+		}
+
+		err = ssdfs_maptbl_convert_leb2peb(fsi, next_leb,
+						   peb_type,
+						   &pebr, &end);
+	}
+
+	if (err == -ENODATA) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("LEB %llu isn't mapped\n", next_leb);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto finish_move_sb_seg;
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to convert LEB %llu to PEB: err %d\n",
+			  next_leb, err);
+		goto finish_move_sb_seg;
+	}
+
+	next_peb = pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX].peb_id;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(next_peb == U64_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	(*sb_lebs)[SSDFS_PREV_SB_SEG][sb_seg_type] = cur_leb;
+	(*sb_pebs)[SSDFS_PREV_SB_SEG][sb_seg_type] = cur_peb;
+
+	(*sb_lebs)[SSDFS_CUR_SB_SEG][sb_seg_type] = next_leb;
+	(*sb_pebs)[SSDFS_CUR_SB_SEG][sb_seg_type] = next_peb;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("cur_leb %llu, cur_peb %llu, "
+		  "next_leb %llu, next_peb %llu, "
+		  "prev_leb %llu, prev_peb %llu, "
+		  "reserved_leb %llu, reserved_peb %llu, "
+		  "new_leb %llu, new_peb %llu\n",
+		  cur_leb, cur_peb,
+		  next_leb, next_peb,
+		  prev_leb, prev_peb,
+		  reserved_leb, reserved_peb,
+		  new_leb, new_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (prev_leb == U64_MAX)
+		goto finish_move_sb_seg;
+
+	err = ssdfs_erase_dirty_prev_sb_segs(fsi, prev_leb);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to erase dirty PEBs: "
+			  "prev_leb %llu, err %d\n",
+			  prev_leb, err);
+		goto finish_move_sb_seg;
+	}
+
+finish_move_sb_seg:
+	return err;
+}
+
+#ifdef CONFIG_SSDFS_FIXED_SUPERBLOCK_SEGMENTS_SET
+static int ssdfs_move_on_first_peb_next_sb_seg(struct super_block *sb,
+						int sb_seg_type,
+						sb_pebs_array *sb_lebs,
+						sb_pebs_array *sb_pebs)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	u64 prev_leb, cur_leb, next_leb, reserved_leb;
+	u64 prev_peb, cur_peb, next_peb, reserved_peb;
+	u64 seg_id;
+	struct ssdfs_maptbl_peb_relation pebr;
+	u8 peb_type = SSDFS_MAPTBL_SBSEG_PEB_TYPE;
+	struct completion *end = NULL;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!sb || !sb_lebs || !sb_pebs);
+
+	if (sb_seg_type >= SSDFS_SB_SEG_COPY_MAX) {
+		SSDFS_ERR("invalid sb_seg_type %#x\n",
+			  sb_seg_type);
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("sb %p, sb_seg_type %#x\n", sb, sb_seg_type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	prev_leb = (*sb_lebs)[SSDFS_PREV_SB_SEG][sb_seg_type];
+	cur_leb = (*sb_lebs)[SSDFS_CUR_SB_SEG][sb_seg_type];
+	next_leb = (*sb_lebs)[SSDFS_NEXT_SB_SEG][sb_seg_type];
+	reserved_leb = (*sb_lebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type];
+
+	prev_peb = (*sb_pebs)[SSDFS_PREV_SB_SEG][sb_seg_type];
+	cur_peb = (*sb_pebs)[SSDFS_CUR_SB_SEG][sb_seg_type];
+	next_peb = (*sb_pebs)[SSDFS_NEXT_SB_SEG][sb_seg_type];
+	reserved_peb = (*sb_pebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type];
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("cur_peb %llu, next_peb %llu, "
+		  "cur_leb %llu, next_leb %llu\n",
+		  cur_peb, next_peb, cur_leb, next_leb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	(*sb_lebs)[SSDFS_CUR_SB_SEG][sb_seg_type] = next_leb;
+	(*sb_pebs)[SSDFS_CUR_SB_SEG][sb_seg_type] = next_peb;
+
+	if (prev_leb >= U64_MAX) {
+		(*sb_lebs)[SSDFS_PREV_SB_SEG][sb_seg_type] = cur_leb;
+		(*sb_pebs)[SSDFS_PREV_SB_SEG][sb_seg_type] = cur_peb;
+
+		if (fsi->pebs_per_seg == 1) {
+			(*sb_lebs)[SSDFS_NEXT_SB_SEG][sb_seg_type] =
+								reserved_leb;
+			(*sb_pebs)[SSDFS_NEXT_SB_SEG][sb_seg_type] =
+								reserved_peb;
+
+			(*sb_lebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type] =
+									U64_MAX;
+			(*sb_pebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type] =
+									U64_MAX;
+		} else {
+			/*
+			 * do nothing
+			 */
+		}
+	} else {
+		err = ssdfs_erase_dirty_prev_sb_segs(fsi, prev_leb);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to erase dirty PEBs: "
+				  "prev_leb %llu, err %d\n",
+				  prev_leb, err);
+			goto finish_move_sb_seg;
+		}
+
+		if (fsi->pebs_per_seg == 1) {
+			(*sb_lebs)[SSDFS_NEXT_SB_SEG][sb_seg_type] =
+								prev_leb;
+			(*sb_pebs)[SSDFS_NEXT_SB_SEG][sb_seg_type] =
+								prev_peb;
+
+			(*sb_lebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type] =
+									U64_MAX;
+			(*sb_pebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type] =
+									U64_MAX;
+
+			(*sb_lebs)[SSDFS_PREV_SB_SEG][sb_seg_type] = cur_leb;
+			(*sb_pebs)[SSDFS_PREV_SB_SEG][sb_seg_type] = cur_peb;
+		} else {
+			(*sb_lebs)[SSDFS_NEXT_SB_SEG][sb_seg_type] =
+								reserved_leb;
+			(*sb_pebs)[SSDFS_NEXT_SB_SEG][sb_seg_type] =
+								reserved_peb;
+
+			seg_id = SSDFS_LEB2SEG(fsi, prev_leb);
+			if (seg_id >= U64_MAX) {
+				err = -ERANGE;
+				SSDFS_ERR("invalid seg_id for leb_id %llu\n",
+					  prev_leb);
+				goto finish_move_sb_seg;
+			}
+
+			prev_leb = ssdfs_get_leb_id_for_peb_index(fsi, seg_id, 0);
+			if (prev_leb >= U64_MAX) {
+				err = -ERANGE;
+				SSDFS_ERR("invalid leb_id for seg_id %llu\n",
+					  seg_id);
+				goto finish_move_sb_seg;
+			}
+
+			err = ssdfs_maptbl_convert_leb2peb(fsi, prev_leb,
+							   peb_type,
+							   &pebr, &end);
+			if (err == -EAGAIN) {
+				err = SSDFS_WAIT_COMPLETION(end);
+				if (unlikely(err)) {
+					SSDFS_ERR("maptbl init failed: "
+						  "err %d\n", err);
+					goto finish_move_sb_seg;
+				}
+
+				err = ssdfs_maptbl_convert_leb2peb(fsi,
+								   prev_leb,
+								   peb_type,
+								   &pebr, &end);
+			}
+
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to convert LEB %llu to PEB: "
+					  "err %d\n", prev_leb, err);
+				goto finish_move_sb_seg;
+			}
+
+			prev_peb = pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX].peb_id;
+
+#ifdef CONFIG_SSDFS_DEBUG
+			BUG_ON(prev_peb == U64_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			(*sb_lebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type] =
+									prev_leb;
+			(*sb_pebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type] =
+									prev_peb;
+
+			(*sb_lebs)[SSDFS_PREV_SB_SEG][sb_seg_type] = cur_leb;
+			(*sb_pebs)[SSDFS_PREV_SB_SEG][sb_seg_type] = cur_peb;
+		}
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("cur_leb %llu, cur_peb %llu, "
+		  "next_leb %llu, next_peb %llu, "
+		  "reserved_leb %llu, reserved_peb %llu, "
+		  "prev_leb %llu, prev_peb %llu\n",
+		  (*sb_lebs)[SSDFS_CUR_SB_SEG][sb_seg_type],
+		  (*sb_pebs)[SSDFS_CUR_SB_SEG][sb_seg_type],
+		  (*sb_lebs)[SSDFS_NEXT_SB_SEG][sb_seg_type],
+		  (*sb_pebs)[SSDFS_NEXT_SB_SEG][sb_seg_type],
+		  (*sb_lebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type],
+		  (*sb_pebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type],
+		  (*sb_lebs)[SSDFS_PREV_SB_SEG][sb_seg_type],
+		  (*sb_pebs)[SSDFS_PREV_SB_SEG][sb_seg_type]);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+finish_move_sb_seg:
+	return err;
+}
+#else
+static int ssdfs_move_on_first_peb_next_sb_seg(struct super_block *sb,
+						int sb_seg_type,
+						sb_pebs_array *sb_lebs,
+						sb_pebs_array *sb_pebs)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	struct ssdfs_segment_bmap *segbmap = fsi->segbmap;
+	struct ssdfs_maptbl_cache *maptbl_cache = &fsi->maptbl_cache;
+	u64 prev_leb, cur_leb, next_leb, reserved_leb;
+	u64 prev_peb, cur_peb, next_peb, reserved_peb;
+	u64 new_leb = U64_MAX, new_peb = U64_MAX;
+	u64 reserved_seg;
+	u64 prev_seg, cur_seg;
+	struct ssdfs_maptbl_peb_relation pebr;
+	u8 peb_type = SSDFS_MAPTBL_SBSEG_PEB_TYPE;
+	struct completion *end = NULL;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!sb || !sb_lebs || !sb_pebs);
+
+	if (sb_seg_type >= SSDFS_SB_SEG_COPY_MAX) {
+		SSDFS_ERR("invalid sb_seg_type %#x\n",
+			  sb_seg_type);
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("sb %p, sb_seg_type %#x\n", sb, sb_seg_type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	prev_leb = (*sb_lebs)[SSDFS_PREV_SB_SEG][sb_seg_type];
+	cur_leb = (*sb_lebs)[SSDFS_CUR_SB_SEG][sb_seg_type];
+	next_leb = (*sb_lebs)[SSDFS_NEXT_SB_SEG][sb_seg_type];
+	reserved_leb = (*sb_lebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type];
+
+	prev_peb = (*sb_pebs)[SSDFS_PREV_SB_SEG][sb_seg_type];
+	cur_peb = (*sb_pebs)[SSDFS_CUR_SB_SEG][sb_seg_type];
+	next_peb = (*sb_pebs)[SSDFS_NEXT_SB_SEG][sb_seg_type];
+	reserved_peb = (*sb_pebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type];
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("cur_peb %llu, next_peb %llu, "
+		  "cur_leb %llu, next_leb %llu\n",
+		  cur_peb, next_peb, cur_leb, next_leb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_reserve_clean_segment(sb, sb_seg_type, cur_leb,
+					  &reserved_seg);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to reserve clean segment: err %d\n", err);
+		goto finish_move_sb_seg;
+	}
+
+	new_leb = ssdfs_get_leb_id_for_peb_index(fsi, reserved_seg, 0);
+	if (new_leb >= U64_MAX) {
+		err = -ERANGE;
+		SSDFS_ERR("fail to convert PEB index into LEB ID: "
+			  "seg %llu\n", reserved_seg);
+		goto finish_move_sb_seg;
+	}
+
+	err = ssdfs_maptbl_convert_leb2peb(fsi, new_leb,
+					   peb_type,
+					   &pebr, &end);
+	if (err == -EAGAIN) {
+		err = SSDFS_WAIT_COMPLETION(end);
+		if (unlikely(err)) {
+			SSDFS_ERR("maptbl init failed: "
+				  "err %d\n", err);
+			goto finish_move_sb_seg;
+		}
+
+		err = ssdfs_maptbl_convert_leb2peb(fsi, new_leb,
+						   peb_type,
+						   &pebr, &end);
+	}
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to convert LEB %llu to PEB: err %d\n",
+			  new_leb, err);
+		goto finish_move_sb_seg;
+	}
+
+	new_peb = pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX].peb_id;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(new_peb == U64_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	(*sb_lebs)[SSDFS_PREV_SB_SEG][sb_seg_type] = cur_leb;
+	(*sb_pebs)[SSDFS_PREV_SB_SEG][sb_seg_type] = cur_peb;
+
+	(*sb_lebs)[SSDFS_CUR_SB_SEG][sb_seg_type] = next_leb;
+	(*sb_pebs)[SSDFS_CUR_SB_SEG][sb_seg_type] = next_peb;
+
+	(*sb_lebs)[SSDFS_NEXT_SB_SEG][sb_seg_type] = reserved_leb;
+	(*sb_pebs)[SSDFS_NEXT_SB_SEG][sb_seg_type] = reserved_peb;
+
+	(*sb_lebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type] = new_leb;
+	(*sb_pebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type] = new_peb;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("cur_leb %llu, cur_peb %llu, "
+		  "next_leb %llu, next_peb %llu, "
+		  "reserved_leb %llu, reserved_peb %llu, "
+		  "new_leb %llu, new_peb %llu\n",
+		  cur_leb, cur_peb,
+		  next_leb, next_peb,
+		  reserved_leb, reserved_peb,
+		  new_leb, new_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (prev_leb == U64_MAX)
+		goto finish_move_sb_seg;
+
+	prev_seg = SSDFS_LEB2SEG(fsi, prev_leb);
+	cur_seg = SSDFS_LEB2SEG(fsi, cur_leb);
+
+	if (prev_seg != cur_seg) {
+		err = ssdfs_segbmap_change_state(segbmap, prev_seg,
+						 SSDFS_SEG_DIRTY, &end);
+		if (err == -EAGAIN) {
+			err = SSDFS_WAIT_COMPLETION(end);
+			if (unlikely(err)) {
+				SSDFS_ERR("segbmap init failed: "
+					  "err %d\n", err);
+				goto finish_move_sb_seg;
+			}
+
+			err = ssdfs_segbmap_change_state(segbmap, prev_seg,
+							 SSDFS_SEG_DIRTY, &end);
+		}
+
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to change segment state: "
+				  "seg %llu, state %#x, err %d\n",
+				  prev_seg, SSDFS_SEG_DIRTY, err);
+			goto finish_move_sb_seg;
+		}
+	}
+
+	err = ssdfs_maptbl_change_peb_state(fsi, prev_leb, peb_type,
+					    SSDFS_MAPTBL_DIRTY_PEB_STATE,
+					    &end);
+	if (err == -EAGAIN) {
+		err = SSDFS_WAIT_COMPLETION(end);
+		if (unlikely(err)) {
+			SSDFS_ERR("maptbl init failed: "
+				  "err %d\n", err);
+			goto finish_move_sb_seg;
+		}
+
+		err = ssdfs_maptbl_change_peb_state(fsi,
+						prev_leb, peb_type,
+						SSDFS_MAPTBL_DIRTY_PEB_STATE,
+						&end);
+	}
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to change the PEB state: "
+			  "leb_id %llu, new_state %#x, err %d\n",
+			  prev_leb, SSDFS_MAPTBL_DIRTY_PEB_STATE, err);
+		goto finish_move_sb_seg;
+	}
+
+	err = ssdfs_maptbl_cache_forget_leb2peb(maptbl_cache, prev_leb,
+						SSDFS_PEB_STATE_CONSISTENT);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to forget prev_leb %llu, err %d\n",
+			  prev_leb, err);
+		goto finish_move_sb_seg;
+	}
+
+finish_move_sb_seg:
+	return err;
+}
+#endif /* CONFIG_SSDFS_FIXED_SUPERBLOCK_SEGMENTS_SET */
+
+static int ssdfs_move_on_next_sb_seg(struct super_block *sb,
+				     int sb_seg_type,
+				     sb_pebs_array *sb_lebs,
+				     sb_pebs_array *sb_pebs)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	u64 cur_leb, next_leb;
+	u64 cur_peb;
+	u8 peb_type = SSDFS_MAPTBL_SBSEG_PEB_TYPE;
+	struct completion *end = NULL;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!sb || !sb_lebs || !sb_pebs);
+
+	if (sb_seg_type >= SSDFS_SB_SEG_COPY_MAX) {
+		SSDFS_ERR("invalid sb_seg_type %#x\n",
+			  sb_seg_type);
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("sb %p, sb_seg_type %#x\n", sb, sb_seg_type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	cur_leb = (*sb_lebs)[SSDFS_CUR_SB_SEG][sb_seg_type];
+	cur_peb = (*sb_pebs)[SSDFS_CUR_SB_SEG][sb_seg_type];
+
+	next_leb = cur_leb + 1;
+
+	err = ssdfs_maptbl_change_peb_state(fsi, cur_leb, peb_type,
+					    SSDFS_MAPTBL_USED_PEB_STATE,
+					    &end);
+	if (err == -EAGAIN) {
+		err = SSDFS_WAIT_COMPLETION(end);
+		if (unlikely(err)) {
+			SSDFS_ERR("maptbl init failed: "
+				  "err %d\n", err);
+			return err;
+		}
+
+		err = ssdfs_maptbl_change_peb_state(fsi,
+					cur_leb, peb_type,
+					SSDFS_MAPTBL_USED_PEB_STATE,
+					&end);
+	}
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to change the PEB state: "
+			  "leb_id %llu, new_state %#x, err %d\n",
+			  cur_leb, SSDFS_MAPTBL_USED_PEB_STATE, err);
+		return err;
+	}
+
+	if (!ssdfs_sb_seg_exhausted(fsi, cur_leb, next_leb)) {
+		err = ssdfs_move_on_next_peb_in_sb_seg(sb, sb_seg_type,
+							sb_lebs, sb_pebs);
+		if (err == -ENODATA) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to move on next PEB of segment: "
+				  "cur_leb %llu, next_leb %llu\n",
+				  cur_leb, next_leb);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto try_move_on_first_peb_next_sb_seg;
+		}
+	} else {
+try_move_on_first_peb_next_sb_seg:
+		err = ssdfs_move_on_first_peb_next_sb_seg(sb, sb_seg_type,
+							sb_lebs, sb_pebs);
+	}
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to move on next sb segment: "
+			  "sb_seg_type %#x, cur_leb %llu, "
+			  "cur_peb %llu, err %d\n",
+			  sb_seg_type, cur_leb,
+			  cur_peb, err);
+		return err;
+	}
+
+	return 0;
+}
+
+static int ssdfs_move_on_next_sb_segs_pair(struct super_block *sb)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	sb_pebs_array sb_lebs;
+	sb_pebs_array sb_pebs;
+	size_t array_size;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p\n", sb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!(fsi->fs_feature_compat & SSDFS_HAS_SEGBMAP_COMPAT_FLAG) ||
+	    !(fsi->fs_feature_compat & SSDFS_HAS_MAPTBL_COMPAT_FLAG)) {
+		SSDFS_ERR("volume has no segbmap or maptbl\n");
+		return -EIO;
+	}
+
+	array_size = sizeof(u64);
+	array_size *= SSDFS_SB_CHAIN_MAX;
+	array_size *= SSDFS_SB_SEG_COPY_MAX;
+
+	down_read(&fsi->sb_segs_sem);
+	ssdfs_memcpy(sb_lebs, 0, array_size,
+		     fsi->sb_lebs, 0, array_size,
+		     array_size);
+	ssdfs_memcpy(sb_pebs, 0, array_size,
+		     fsi->sb_pebs, 0, array_size,
+		     array_size);
+	up_read(&fsi->sb_segs_sem);
+
+	for (i = 0; i < SSDFS_SB_SEG_COPY_MAX; i++) {
+		err = ssdfs_move_on_next_sb_seg(sb, i, &sb_lebs, &sb_pebs);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to move on next sb PEB: err %d\n",
+				  err);
+			return err;
+		}
+	}
+
+	down_write(&fsi->sb_segs_sem);
+	ssdfs_memcpy(fsi->sb_lebs, 0, array_size,
+		     sb_lebs, 0, array_size,
+		     array_size);
+	ssdfs_memcpy(fsi->sb_pebs, 0, array_size,
+		     sb_pebs, 0, array_size,
+		     array_size);
+	up_write(&fsi->sb_segs_sem);
+
+	return 0;
+}
+
+static
+int ssdfs_prepare_sb_log(struct super_block *sb,
+			 struct ssdfs_peb_extent *last_sb_log)
+{
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!sb || !last_sb_log);
+
+	SSDFS_DBG("sb %p, last_sb_log %p\n",
+		  sb, last_sb_log);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_define_next_sb_log_place(sb, last_sb_log);
+	switch (err) {
+	case -EFBIG: /* current sb segment is exhausted */
+	case -EIO: /* current sb segment is corrupted */
+		err = ssdfs_move_on_next_sb_segs_pair(sb);
+		if (err) {
+			SSDFS_ERR("fail to move on next sb segs pair: err %d\n",
+				  err);
+			return err;
+		}
+		err = ssdfs_define_next_sb_log_place(sb, last_sb_log);
+		if (unlikely(err)) {
+			SSDFS_ERR("unable to define next sb log place: err %d\n",
+				  err);
+			return err;
+		}
+		break;
+
+	default:
+		if (err) {
+			SSDFS_ERR("unable to define next sb log place: err %d\n",
+				  err);
+			return err;
+		}
+		break;
+	}
+
+	return 0;
+}
+
+static void
+ssdfs_prepare_maptbl_cache_descriptor(struct ssdfs_metadata_descriptor *desc,
+				      u32 offset,
+				      struct ssdfs_payload_content *payload,
+				      u32 payload_size)
+{
+	unsigned i;
+	u32 csum = ~0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!desc || !payload);
+
+	SSDFS_DBG("desc %p, offset %u, payload %p\n",
+		  desc, offset, payload);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	desc->offset = cpu_to_le32(offset);
+	desc->size = cpu_to_le32(payload_size);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(payload_size >= U16_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	desc->check.bytes = cpu_to_le16((u16)payload_size);
+	desc->check.flags = cpu_to_le16(SSDFS_CRC32);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(pagevec_count(&payload->pvec) == 0);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	for (i = 0; i < pagevec_count(&payload->pvec); i++) {
+		struct page *page = payload->pvec.pages[i];
+		struct ssdfs_maptbl_cache_header *hdr;
+		u16 bytes_count;
+		void *kaddr;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!page);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		ssdfs_lock_page(page);
+		kaddr = kmap_local_page(page);
+
+		hdr = (struct ssdfs_maptbl_cache_header *)kaddr;
+		bytes_count = le16_to_cpu(hdr->bytes_count);
+
+		csum = crc32(csum, kaddr, bytes_count);
+
+		kunmap_local(kaddr);
+		ssdfs_unlock_page(page);
+	}
+
+	desc->check.csum = cpu_to_le32(csum);
+}
+
+static
+int ssdfs_prepare_snapshot_rules_for_commit(struct ssdfs_fs_info *fsi,
+					struct ssdfs_metadata_descriptor *desc,
+					u32 offset)
+{
+	struct ssdfs_snapshot_rules_header *hdr;
+	size_t hdr_size = sizeof(struct ssdfs_snapshot_rules_header);
+	size_t info_size = sizeof(struct ssdfs_snapshot_rule_info);
+	struct ssdfs_snapshot_rule_item *item = NULL;
+	u32 payload_off;
+	u32 item_off;
+	u32 pagesize = fsi->pagesize;
+	u16 items_count = 0;
+	u16 items_capacity = 0;
+	u32 area_size = 0;
+	struct list_head *this, *next;
+	u32 csum = ~0;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !desc);
+
+	SSDFS_DBG("fsi %p, offset %u\n",
+		  fsi, offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (is_ssdfs_snapshot_rules_list_empty(&fsi->snapshots.rules_list)) {
+		SSDFS_DBG("snapshot rules list is empty\n");
+		return -ENODATA;
+	}
+
+	payload_off = offsetof(struct ssdfs_log_footer, payload);
+	hdr = SSDFS_SNRU_HDR((u8 *)fsi->sbi.vs_buf + payload_off);
+	memset(hdr, 0, hdr_size);
+	area_size = pagesize - payload_off;
+	item_off = payload_off + hdr_size;
+
+	items_capacity = (u16)((area_size - hdr_size) / info_size);
+	area_size = min_t(u32, area_size, (u32)items_capacity * info_size);
+
+	spin_lock(&fsi->snapshots.rules_list.lock);
+	list_for_each_safe(this, next, &fsi->snapshots.rules_list.list) {
+		item = list_entry(this, struct ssdfs_snapshot_rule_item, list);
+
+		err = ssdfs_memcpy(fsi->sbi.vs_buf, item_off, pagesize,
+				   &item->rule, 0, info_size,
+				   info_size);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to copy: err %d\n", err);
+			goto finish_copy_items;
+		}
+
+		item_off += info_size;
+		items_count++;
+	}
+finish_copy_items:
+	spin_unlock(&fsi->snapshots.rules_list.lock);
+
+	if (unlikely(err))
+		return err;
+
+	hdr->magic = cpu_to_le32(SSDFS_SNAPSHOT_RULES_MAGIC);
+	hdr->item_size = cpu_to_le16(info_size);
+	hdr->flags = cpu_to_le16(0);
+
+	if (items_count == 0 || items_count > items_capacity) {
+		SSDFS_ERR("invalid items number: "
+			  "items_count %u, items_capacity %u, "
+			  "area_size %u, item_size %zu\n",
+			  items_count, items_capacity,
+			  area_size, info_size);
+		return -ERANGE;
+	}
+
+	hdr->items_count = cpu_to_le16(items_count);
+	hdr->items_capacity = cpu_to_le16(items_capacity);
+	hdr->area_size = cpu_to_le16(area_size);
+
+	desc->offset = cpu_to_le32(offset);
+	desc->size = cpu_to_le32(area_size);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(area_size >= U16_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	desc->check.bytes = cpu_to_le16(area_size);
+	desc->check.flags = cpu_to_le16(SSDFS_CRC32);
+
+	csum = crc32(csum, hdr, area_size);
+	desc->check.csum = cpu_to_le32(csum);
+
+	return 0;
+}
+
+static int __ssdfs_commit_sb_log(struct super_block *sb,
+				 u64 timestamp, u64 cno,
+				 struct ssdfs_peb_extent *last_sb_log,
+				 struct ssdfs_sb_log_payload *payload)
+{
+	struct ssdfs_fs_info *fsi;
+	struct ssdfs_metadata_descriptor hdr_desc[SSDFS_SEG_HDR_DESC_MAX];
+	struct ssdfs_metadata_descriptor footer_desc[SSDFS_LOG_FOOTER_DESC_MAX];
+	size_t desc_size = sizeof(struct ssdfs_metadata_descriptor);
+	size_t hdr_array_bytes = desc_size * SSDFS_SEG_HDR_DESC_MAX;
+	size_t footer_array_bytes = desc_size * SSDFS_LOG_FOOTER_DESC_MAX;
+	struct ssdfs_metadata_descriptor *cur_hdr_desc;
+	struct page *page;
+	struct ssdfs_segment_header *hdr;
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+	struct ssdfs_log_footer *footer;
+	size_t footer_size = sizeof(struct ssdfs_log_footer);
+	void *kaddr = NULL;
+	loff_t peb_offset, offset;
+	u32 flags = 0;
+	u32 written = 0;
+	unsigned i;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!sb || !last_sb_log);
+	BUG_ON(!SSDFS_FS_I(sb)->devops);
+	BUG_ON(!SSDFS_FS_I(sb)->devops->writepage);
+	BUG_ON((last_sb_log->page_offset + last_sb_log->pages_count) >
+		(ULLONG_MAX >> SSDFS_FS_I(sb)->log_pagesize));
+	BUG_ON((last_sb_log->leb_id * SSDFS_FS_I(sb)->pebs_per_seg) >=
+		SSDFS_FS_I(sb)->nsegs);
+	BUG_ON(last_sb_log->peb_id >
+		div_u64(ULLONG_MAX, SSDFS_FS_I(sb)->pages_per_peb));
+	BUG_ON((last_sb_log->peb_id * SSDFS_FS_I(sb)->pages_per_peb) >
+		(ULLONG_MAX >> SSDFS_FS_I(sb)->log_pagesize));
+
+	SSDFS_DBG("sb %p, last_sb_log->leb_id %llu, last_sb_log->peb_id %llu, "
+		  "last_sb_log->page_offset %u, last_sb_log->pages_count %u\n",
+		  sb, last_sb_log->leb_id, last_sb_log->peb_id,
+		  last_sb_log->page_offset, last_sb_log->pages_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = SSDFS_FS_I(sb);
+	hdr = SSDFS_SEG_HDR(fsi->sbi.vh_buf);
+	footer = SSDFS_LF(fsi->sbi.vs_buf);
+
+	memset(hdr_desc, 0, hdr_array_bytes);
+	memset(footer_desc, 0, footer_array_bytes);
+
+	offset = (loff_t)last_sb_log->page_offset << fsi->log_pagesize;
+	offset += PAGE_SIZE;
+
+	cur_hdr_desc = &hdr_desc[SSDFS_MAPTBL_CACHE_INDEX];
+	ssdfs_prepare_maptbl_cache_descriptor(cur_hdr_desc, (u32)offset,
+					     &payload->maptbl_cache,
+					     payload->maptbl_cache.bytes_count);
+
+	offset += payload->maptbl_cache.bytes_count;
+
+	cur_hdr_desc = &hdr_desc[SSDFS_LOG_FOOTER_INDEX];
+	cur_hdr_desc->offset = cpu_to_le32(offset);
+	cur_hdr_desc->size = cpu_to_le32(footer_size);
+
+	ssdfs_memcpy(hdr->desc_array, 0, hdr_array_bytes,
+		     hdr_desc, 0, hdr_array_bytes,
+		     hdr_array_bytes);
+
+	hdr->peb_migration_id[SSDFS_PREV_MIGRATING_PEB] =
+					SSDFS_PEB_UNKNOWN_MIGRATION_ID;
+	hdr->peb_migration_id[SSDFS_CUR_MIGRATING_PEB] =
+					SSDFS_PEB_UNKNOWN_MIGRATION_ID;
+
+	err = ssdfs_prepare_segment_header_for_commit(fsi,
+						     last_sb_log->pages_count,
+						     SSDFS_SB_SEG_TYPE,
+						     SSDFS_LOG_HAS_FOOTER |
+						     SSDFS_LOG_HAS_MAPTBL_CACHE,
+						     timestamp, cno,
+						     hdr);
+	if (err) {
+		SSDFS_ERR("fail to prepare segment header: err %d\n", err);
+		return err;
+	}
+
+	offset += offsetof(struct ssdfs_log_footer, payload);
+	cur_hdr_desc = &footer_desc[SSDFS_SNAPSHOT_RULES_AREA_INDEX];
+
+	err = ssdfs_prepare_snapshot_rules_for_commit(fsi, cur_hdr_desc,
+						      (u32)offset);
+	if (err == -ENODATA) {
+		err = 0;
+		SSDFS_DBG("snapshot rules list is empty\n");
+	} else if (err) {
+		SSDFS_ERR("fail to prepare snapshot rules: err %d\n", err);
+		return err;
+	} else
+		flags |= SSDFS_LOG_FOOTER_HAS_SNAPSHOT_RULES;
+
+	ssdfs_memcpy(footer->desc_array, 0, footer_array_bytes,
+		     footer_desc, 0, footer_array_bytes,
+		     footer_array_bytes);
+
+	err = ssdfs_prepare_log_footer_for_commit(fsi, last_sb_log->pages_count,
+						  flags, timestamp,
+						  cno, footer);
+	if (err) {
+		SSDFS_ERR("fail to prepare log footer: err %d\n", err);
+		return err;
+	}
+
+	page = ssdfs_super_alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (IS_ERR_OR_NULL(page)) {
+		err = (page == NULL ? -ENOMEM : PTR_ERR(page));
+		SSDFS_ERR("unable to allocate memory page\n");
+		return err;
+	}
+
+	/* ->writepage() calls put_page() */
+	ssdfs_get_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("page %p, count %d\n",
+		  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	/* write segment header */
+	ssdfs_lock_page(page);
+	ssdfs_memcpy_to_page(page, 0, PAGE_SIZE,
+			     fsi->sbi.vh_buf, 0, hdr_size,
+			     hdr_size);
+	ssdfs_set_page_private(page, 0);
+	SetPageUptodate(page);
+	SetPageDirty(page);
+	ssdfs_unlock_page(page);
+
+	peb_offset = last_sb_log->peb_id * fsi->pages_per_peb;
+	peb_offset <<= fsi->log_pagesize;
+	offset = (loff_t)last_sb_log->page_offset << fsi->log_pagesize;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(peb_offset > (ULLONG_MAX - (offset + fsi->pagesize)));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	offset += peb_offset;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("offset %llu\n", offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = fsi->devops->writepage(sb, offset, page, 0, hdr_size);
+	if (err) {
+		SSDFS_ERR("fail to write segment header: "
+			  "offset %llu, size %zu\n",
+			  (u64)offset, hdr_size);
+		goto cleanup_after_failure;
+	}
+
+	ssdfs_lock_page(page);
+	ClearPageUptodate(page);
+	ssdfs_clear_page_private(page, 0);
+	ssdfs_unlock_page(page);
+
+	offset += fsi->pagesize;
+
+	for (i = 0; i < pagevec_count(&payload->maptbl_cache.pvec); i++) {
+		struct page *payload_page = payload->maptbl_cache.pvec.pages[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!payload_page);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		/* ->writepage() calls put_page() */
+		ssdfs_get_page(payload_page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("page %p, count %d\n",
+			  payload_page,
+			  page_ref_count(payload_page));
+
+		kaddr = kmap_local_page(payload_page);
+		SSDFS_DBG("PAYLOAD PAGE %d\n", i);
+		print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+				     kaddr, PAGE_SIZE);
+		kunmap_local(kaddr);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		ssdfs_lock_page(payload_page);
+		ssdfs_set_page_private(payload_page, 0);
+		SetPageUptodate(payload_page);
+		SetPageDirty(payload_page);
+		ssdfs_unlock_page(payload_page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("offset %llu\n", offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = fsi->devops->writepage(sb, offset, payload_page,
+					     0, PAGE_SIZE);
+		if (err) {
+			SSDFS_ERR("fail to write maptbl cache page: "
+				  "offset %llu, page_index %u, size %zu\n",
+				  (u64)offset, i, PAGE_SIZE);
+			goto cleanup_after_failure;
+		}
+
+		ssdfs_lock_page(payload_page);
+		ClearPageUptodate(payload_page);
+		ssdfs_clear_page_private(payload_page, 0);
+		ssdfs_unlock_page(payload_page);
+
+		offset += PAGE_SIZE;
+	}
+
+	/* TODO: write metadata payload */
+
+	/* ->writepage() calls put_page() */
+	ssdfs_get_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("page %p, count %d\n",
+		  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	/* write log footer */
+	written = 0;
+
+	while (written < fsi->sbi.vs_buf_size) {
+		ssdfs_lock_page(page);
+		kaddr = kmap_local_page(page);
+		memset(kaddr, 0, PAGE_SIZE);
+		ssdfs_memcpy(kaddr, 0, PAGE_SIZE,
+			     fsi->sbi.vs_buf, written, fsi->sbi.vs_buf_size,
+			     PAGE_SIZE);
+		flush_dcache_page(page);
+		kunmap_local(kaddr);
+		ssdfs_set_page_private(page, 0);
+		SetPageUptodate(page);
+		SetPageDirty(page);
+		ssdfs_unlock_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("offset %llu\n", offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = fsi->devops->writepage(sb, offset, page, 0, PAGE_SIZE);
+		if (err) {
+			SSDFS_ERR("fail to write log footer: "
+				  "offset %llu, size %zu\n",
+				  (u64)offset, PAGE_SIZE);
+			goto cleanup_after_failure;
+		}
+
+		ssdfs_lock_page(page);
+		ClearPageUptodate(page);
+		ssdfs_clear_page_private(page, 0);
+		ssdfs_unlock_page(page);
+
+		written += PAGE_SIZE;
+		offset += PAGE_SIZE;
+	}
+
+	ssdfs_super_free_page(page);
+	return 0;
+
+cleanup_after_failure:
+	ssdfs_put_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("page %p, count %d\n",
+		  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_super_free_page(page);
+
+	return err;
+}
+
+static int
+__ssdfs_commit_sb_log_inline(struct super_block *sb,
+			     u64 timestamp, u64 cno,
+			     struct ssdfs_peb_extent *last_sb_log,
+			     struct ssdfs_sb_log_payload *payload,
+			     u32 payload_size)
+{
+	struct ssdfs_fs_info *fsi;
+	struct ssdfs_metadata_descriptor hdr_desc[SSDFS_SEG_HDR_DESC_MAX];
+	struct ssdfs_metadata_descriptor footer_desc[SSDFS_LOG_FOOTER_DESC_MAX];
+	size_t desc_size = sizeof(struct ssdfs_metadata_descriptor);
+	size_t hdr_array_bytes = desc_size * SSDFS_SEG_HDR_DESC_MAX;
+	size_t footer_array_bytes = desc_size * SSDFS_LOG_FOOTER_DESC_MAX;
+	struct ssdfs_metadata_descriptor *cur_hdr_desc;
+	struct page *page;
+	struct page *payload_page;
+	struct ssdfs_segment_header *hdr;
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+	struct ssdfs_log_footer *footer;
+	size_t footer_size = sizeof(struct ssdfs_log_footer);
+	void *kaddr = NULL;
+	loff_t peb_offset, offset;
+	u32 inline_capacity;
+	void *payload_buf;
+	u32 flags = 0;
+	u32 written = 0;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!sb || !last_sb_log);
+	BUG_ON(!SSDFS_FS_I(sb)->devops);
+	BUG_ON(!SSDFS_FS_I(sb)->devops->writepage);
+	BUG_ON((last_sb_log->page_offset + last_sb_log->pages_count) >
+		(ULLONG_MAX >> SSDFS_FS_I(sb)->log_pagesize));
+	BUG_ON((last_sb_log->leb_id * SSDFS_FS_I(sb)->pebs_per_seg) >=
+		SSDFS_FS_I(sb)->nsegs);
+	BUG_ON(last_sb_log->peb_id >
+		div_u64(ULLONG_MAX, SSDFS_FS_I(sb)->pages_per_peb));
+	BUG_ON((last_sb_log->peb_id * SSDFS_FS_I(sb)->pages_per_peb) >
+		(ULLONG_MAX >> SSDFS_FS_I(sb)->log_pagesize));
+
+	SSDFS_DBG("sb %p, last_sb_log->leb_id %llu, last_sb_log->peb_id %llu, "
+		  "last_sb_log->page_offset %u, last_sb_log->pages_count %u\n",
+		  sb, last_sb_log->leb_id, last_sb_log->peb_id,
+		  last_sb_log->page_offset, last_sb_log->pages_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = SSDFS_FS_I(sb);
+	hdr = SSDFS_SEG_HDR(fsi->sbi.vh_buf);
+	footer = SSDFS_LF(fsi->sbi.vs_buf);
+
+	memset(hdr_desc, 0, hdr_array_bytes);
+	memset(footer_desc, 0, footer_array_bytes);
+
+	offset = (loff_t)last_sb_log->page_offset << fsi->log_pagesize;
+	offset += hdr_size;
+
+	cur_hdr_desc = &hdr_desc[SSDFS_MAPTBL_CACHE_INDEX];
+	ssdfs_prepare_maptbl_cache_descriptor(cur_hdr_desc, (u32)offset,
+					      &payload->maptbl_cache,
+					      payload_size);
+
+	offset += payload_size;
+
+	offset += fsi->pagesize - 1;
+	offset = (offset >> fsi->log_pagesize) << fsi->log_pagesize;
+
+	cur_hdr_desc = &hdr_desc[SSDFS_LOG_FOOTER_INDEX];
+	cur_hdr_desc->offset = cpu_to_le32(offset);
+	cur_hdr_desc->size = cpu_to_le32(footer_size);
+
+	ssdfs_memcpy(hdr->desc_array, 0, hdr_array_bytes,
+		     hdr_desc, 0, hdr_array_bytes,
+		     hdr_array_bytes);
+
+	hdr->peb_migration_id[SSDFS_PREV_MIGRATING_PEB] =
+					SSDFS_PEB_UNKNOWN_MIGRATION_ID;
+	hdr->peb_migration_id[SSDFS_CUR_MIGRATING_PEB] =
+					SSDFS_PEB_UNKNOWN_MIGRATION_ID;
+
+	err = ssdfs_prepare_segment_header_for_commit(fsi,
+						     last_sb_log->pages_count,
+						     SSDFS_SB_SEG_TYPE,
+						     SSDFS_LOG_HAS_FOOTER |
+						     SSDFS_LOG_HAS_MAPTBL_CACHE,
+						     timestamp, cno,
+						     hdr);
+	if (err) {
+		SSDFS_ERR("fail to prepare segment header: err %d\n", err);
+		return err;
+	}
+
+	offset += offsetof(struct ssdfs_log_footer, payload);
+	cur_hdr_desc = &footer_desc[SSDFS_SNAPSHOT_RULES_AREA_INDEX];
+
+	err = ssdfs_prepare_snapshot_rules_for_commit(fsi, cur_hdr_desc,
+						      (u32)offset);
+	if (err == -ENODATA) {
+		err = 0;
+		SSDFS_DBG("snapshot rules list is empty\n");
+	} else if (err) {
+		SSDFS_ERR("fail to prepare snapshot rules: err %d\n", err);
+		return err;
+	} else
+		flags |= SSDFS_LOG_FOOTER_HAS_SNAPSHOT_RULES;
+
+	ssdfs_memcpy(footer->desc_array, 0, footer_array_bytes,
+		     footer_desc, 0, footer_array_bytes,
+		     footer_array_bytes);
+
+	err = ssdfs_prepare_log_footer_for_commit(fsi, last_sb_log->pages_count,
+						  flags, timestamp,
+						  cno, footer);
+	if (err) {
+		SSDFS_ERR("fail to prepare log footer: err %d\n", err);
+		return err;
+	}
+
+	if (pagevec_count(&payload->maptbl_cache.pvec) != 1) {
+		SSDFS_WARN("payload contains several memory pages\n");
+		return -ERANGE;
+	}
+
+	inline_capacity = PAGE_SIZE - hdr_size;
+
+	if (payload_size > inline_capacity) {
+		SSDFS_ERR("payload_size %u > inline_capacity %u\n",
+			  payload_size, inline_capacity);
+		return -ERANGE;
+	}
+
+	payload_buf = ssdfs_super_kmalloc(inline_capacity, GFP_KERNEL);
+	if (!payload_buf) {
+		SSDFS_ERR("fail to allocate payload buffer\n");
+		return -ENOMEM;
+	}
+
+	page = ssdfs_super_alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (IS_ERR_OR_NULL(page)) {
+		err = (page == NULL ? -ENOMEM : PTR_ERR(page));
+		SSDFS_ERR("unable to allocate memory page\n");
+		ssdfs_super_kfree(payload_buf);
+		return err;
+	}
+
+	/* ->writepage() calls put_page() */
+	ssdfs_get_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("page %p, count %d\n",
+		  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	payload_page = payload->maptbl_cache.pvec.pages[0];
+	if (!payload_page) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid payload page\n");
+		goto free_payload_buffer;
+	}
+
+	ssdfs_lock_page(payload_page);
+	err = ssdfs_memcpy_from_page(payload_buf, 0, inline_capacity,
+				     payload_page, 0, PAGE_SIZE,
+				     payload_size);
+	ssdfs_unlock_page(payload_page);
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to copy: err %d\n", err);
+		goto free_payload_buffer;
+	}
+
+	/* write segment header + payload */
+	ssdfs_lock_page(page);
+	kaddr = kmap_local_page(page);
+	ssdfs_memcpy(kaddr, 0, PAGE_SIZE,
+		     fsi->sbi.vh_buf, 0, hdr_size,
+		     hdr_size);
+	err = ssdfs_memcpy(kaddr, hdr_size, PAGE_SIZE,
+			   payload_buf, 0, inline_capacity,
+			   payload_size);
+	flush_dcache_page(page);
+	kunmap_local(kaddr);
+	if (!err) {
+		ssdfs_set_page_private(page, 0);
+		SetPageUptodate(page);
+		SetPageDirty(page);
+	}
+	ssdfs_unlock_page(page);
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to copy: err %d\n", err);
+		goto free_payload_buffer;
+	}
+
+free_payload_buffer:
+	ssdfs_super_kfree(payload_buf);
+
+	if (unlikely(err))
+		goto cleanup_after_failure;
+
+	peb_offset = last_sb_log->peb_id * fsi->pages_per_peb;
+	peb_offset <<= fsi->log_pagesize;
+	offset = (loff_t)last_sb_log->page_offset << fsi->log_pagesize;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(peb_offset > (ULLONG_MAX - (offset + fsi->pagesize)));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	offset += peb_offset;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("offset %llu\n", offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = fsi->devops->writepage(sb, offset, page, 0,
+				     hdr_size + payload_size);
+	if (err) {
+		SSDFS_ERR("fail to write segment header: "
+			  "offset %llu, size %zu\n",
+			  (u64)offset, hdr_size + payload_size);
+		goto cleanup_after_failure;
+	}
+
+	ssdfs_lock_page(page);
+	ClearPageUptodate(page);
+	ssdfs_clear_page_private(page, 0);
+	ssdfs_unlock_page(page);
+
+	offset += fsi->pagesize;
+
+	/* ->writepage() calls put_page() */
+	ssdfs_get_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("page %p, count %d\n",
+		  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	/* write log footer */
+	written = 0;
+
+	while (written < fsi->sbi.vs_buf_size) {
+		ssdfs_lock_page(page);
+		kaddr = kmap_local_page(page);
+		memset(kaddr, 0, PAGE_SIZE);
+		ssdfs_memcpy(kaddr, 0, PAGE_SIZE,
+			     fsi->sbi.vs_buf, written, fsi->sbi.vs_buf_size,
+			     PAGE_SIZE);
+		flush_dcache_page(page);
+		kunmap_local(kaddr);
+		ssdfs_set_page_private(page, 0);
+		SetPageUptodate(page);
+		SetPageDirty(page);
+		ssdfs_unlock_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("offset %llu\n", offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = fsi->devops->writepage(sb, offset, page, 0, PAGE_SIZE);
+		if (err) {
+			SSDFS_ERR("fail to write log footer: "
+				  "offset %llu, size %zu\n",
+				  (u64)offset, PAGE_SIZE);
+			goto cleanup_after_failure;
+		}
+
+		ssdfs_lock_page(page);
+		ClearPageUptodate(page);
+		ssdfs_clear_page_private(page, 0);
+		ssdfs_unlock_page(page);
+
+		written += PAGE_SIZE;
+		offset += PAGE_SIZE;
+	}
+
+	ssdfs_super_free_page(page);
+	return 0;
+
+cleanup_after_failure:
+	ssdfs_put_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("page %p, count %d\n",
+		  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_super_free_page(page);
+
+	return err;
+}
+
+static int ssdfs_commit_sb_log(struct super_block *sb,
+				u64 timestamp, u64 cno,
+				struct ssdfs_peb_extent *last_sb_log,
+				struct ssdfs_sb_log_payload *payload)
+{
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+	u32 inline_capacity;
+	u32 payload_size;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!sb || !last_sb_log || !payload);
+
+	SSDFS_DBG("sb %p, last_sb_log->leb_id %llu, last_sb_log->peb_id %llu, "
+		  "last_sb_log->page_offset %u, last_sb_log->pages_count %u\n",
+		  sb, last_sb_log->leb_id, last_sb_log->peb_id,
+		  last_sb_log->page_offset, last_sb_log->pages_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	inline_capacity = PAGE_SIZE - hdr_size;
+	payload_size = ssdfs_sb_payload_size(&payload->maptbl_cache.pvec);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("inline_capacity %u, payload_size %u\n",
+		  inline_capacity, payload_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (payload_size > inline_capacity) {
+		err = __ssdfs_commit_sb_log(sb, timestamp, cno,
+					    last_sb_log, payload);
+	} else {
+		err = __ssdfs_commit_sb_log_inline(sb, timestamp, cno,
+						   last_sb_log,
+						   payload, payload_size);
+	}
+
+	if (unlikely(err))
+		SSDFS_ERR("fail to commit sb log: err %d\n", err);
+
+	return err;
+}
+
+static
+int ssdfs_commit_super(struct super_block *sb, u16 fs_state,
+			struct ssdfs_peb_extent *last_sb_log,
+			struct ssdfs_sb_log_payload *payload)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	__le64 cur_segs[SSDFS_CUR_SEGS_COUNT];
+	size_t size = sizeof(__le64) * SSDFS_CUR_SEGS_COUNT;
+	u64 timestamp = ssdfs_current_timestamp();
+	u64 cno = ssdfs_current_cno(sb);
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!sb || !last_sb_log || !payload);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("sb %p, fs_state %u\n", sb, fs_state);
+#else
+	SSDFS_DBG("sb %p, fs_state %u\n", sb, fs_state);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	BUG_ON(fs_state > SSDFS_LAST_KNOWN_FS_STATE);
+
+	if (le16_to_cpu(fsi->vs->state) == SSDFS_ERROR_FS &&
+	    !ssdfs_test_opt(fsi->mount_opts, IGNORE_FS_STATE)) {
+		SSDFS_DBG("refuse commit superblock: fs erroneous state\n");
+		return 0;
+	}
+
+	err = ssdfs_prepare_volume_header_for_commit(fsi, fsi->vh);
+	if (unlikely(err)) {
+		SSDFS_CRIT("volume header is inconsistent: err %d\n", err);
+		goto finish_commit_super;
+	}
+
+	err = ssdfs_prepare_current_segment_ids(fsi, cur_segs, size);
+	if (unlikely(err)) {
+		SSDFS_CRIT("fail to prepare current segments IDs: err %d\n",
+			   err);
+		goto finish_commit_super;
+	}
+
+	err = ssdfs_prepare_volume_state_info_for_commit(fsi, fs_state,
+							 cur_segs, size,
+							 timestamp,
+							 cno,
+							 fsi->vs);
+	if (unlikely(err)) {
+		SSDFS_CRIT("volume state info is inconsistent: err %d\n", err);
+		goto finish_commit_super;
+	}
+
+	for (i = 0; i < SSDFS_SB_SEG_COPY_MAX; i++) {
+		last_sb_log->leb_id = fsi->sb_lebs[SSDFS_CUR_SB_SEG][i];
+		last_sb_log->peb_id = fsi->sb_pebs[SSDFS_CUR_SB_SEG][i];
+		err = ssdfs_commit_sb_log(sb, timestamp, cno,
+					  last_sb_log, payload);
+		if (err) {
+			SSDFS_ERR("fail to commit superblock log: "
+				  "leb_id %llu, peb_id %llu, "
+				  "page_offset %u, pages_count %u, "
+				  "err %d\n",
+				  last_sb_log->leb_id,
+				  last_sb_log->peb_id,
+				  last_sb_log->page_offset,
+				  last_sb_log->pages_count,
+				  err);
+			goto finish_commit_super;
+		}
+	}
+
+	last_sb_log->leb_id = fsi->sb_lebs[SSDFS_CUR_SB_SEG][SSDFS_MAIN_SB_SEG];
+	last_sb_log->peb_id = fsi->sb_pebs[SSDFS_CUR_SB_SEG][SSDFS_MAIN_SB_SEG];
+
+	ssdfs_memcpy(&fsi->sbi.last_log,
+		     0, sizeof(struct ssdfs_peb_extent),
+		     last_sb_log,
+		     0, sizeof(struct ssdfs_peb_extent),
+		     sizeof(struct ssdfs_peb_extent));
+
+finish_commit_super:
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("finished: err %d\n", err);
+#else
+	SSDFS_DBG("finished: err %d\n", err);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	return err;
+}
+
 static void ssdfs_memory_page_locks_checker_init(void)
 {
 #ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
-- 
2.34.1



* [RFC PATCH 06/76] ssdfs: segment header + log footer operations
  2023-02-25  1:08 [RFC PATCH 00/76] SSDFS: flash-friendly LFS file system for ZNS SSD Viacheslav Dubeyko
  2023-02-25  1:08 ` [RFC PATCH 05/76] ssdfs: implement commit superblock operation Viacheslav Dubeyko
@ 2023-02-25  1:08 ` Viacheslav Dubeyko
  2023-02-25  1:08 ` [RFC PATCH 07/76] ssdfs: basic mount logic implementation Viacheslav Dubeyko
  76 siblings, 0 replies; 82+ messages in thread
From: Viacheslav Dubeyko @ 2023-02-25  1:08 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: viacheslav.dubeyko, luka.perkov, bruno.banelli, Viacheslav Dubeyko

SSDFS is a Log-structured File System (LFS). This means that a volume
is a sequence of segments, each of which can contain one or several
erase blocks. Every write operation into an erase block creates a log,
so the content of every erase block is a sequence of logs. A log can
be full or partial. A full log starts with a header and ends with a
footer. The size of a full log is fixed and is defined during the mkfs
phase, although the tunefs tool can change this value later. If a
commit operation does not have enough data to prepare a full log, a
partial log is created instead. A partial log starts with a partial
log header and has no footer. The partial log header can be thought of
as a mixture of the segment header and the log footer.

The segment header can be considered static superblock info.
It contains metadata that either never changes after volume
creation (logical block size, for example) or changes rarely
(number of segments in the volume, for example). The log footer
can be considered the dynamic part of the superblock because
it contains frequently updated metadata (for example, the root
node of the inodes b-tree).

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
CC: Viacheslav Dubeyko <viacheslav.dubeyko@bytedance.com>
CC: Luka Perkov <luka.perkov@sartura.hr>
CC: Bruno Banelli <bruno.banelli@sartura.hr>
---
 fs/ssdfs/log_footer.c    |  901 +++++++++++++++++++++++++++
 fs/ssdfs/volume_header.c | 1256 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 2157 insertions(+)
 create mode 100644 fs/ssdfs/log_footer.c
 create mode 100644 fs/ssdfs/volume_header.c

diff --git a/fs/ssdfs/log_footer.c b/fs/ssdfs/log_footer.c
new file mode 100644
index 000000000000..f56a268f310e
--- /dev/null
+++ b/fs/ssdfs/log_footer.c
@@ -0,0 +1,901 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/log_footer.c - operations with log footer.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include <linux/kernel.h>
+#include <linux/rwsem.h>
+#include <linux/pagevec.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+#include "segment_bitmap.h"
+#include "offset_translation_table.h"
+#include "page_array.h"
+#include "page_vector.h"
+#include "peb_container.h"
+#include "segment.h"
+#include "current_segment.h"
+
+#include <trace/events/ssdfs.h>
+
+/*
+ * __is_ssdfs_log_footer_magic_valid() - check log footer's magic
+ * @magic: pointer on magic value
+ */
+bool __is_ssdfs_log_footer_magic_valid(struct ssdfs_signature *magic)
+{
+	return le16_to_cpu(magic->key) == SSDFS_LOG_FOOTER_MAGIC;
+}
+
+/*
+ * is_ssdfs_log_footer_magic_valid() - check log footer's magic
+ * @footer: log footer
+ */
+bool is_ssdfs_log_footer_magic_valid(struct ssdfs_log_footer *footer)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!footer);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return __is_ssdfs_log_footer_magic_valid(&footer->volume_state.magic);
+}
+
+/*
+ * is_ssdfs_log_footer_csum_valid() - check log footer's checksum
+ * @buf: buffer with log footer
+ * @size: size of buffer in bytes
+ */
+bool is_ssdfs_log_footer_csum_valid(void *buf, size_t buf_size)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return is_csum_valid(&SSDFS_LF(buf)->volume_state.check, buf, buf_size);
+}
+
+/*
+ * is_ssdfs_volume_state_info_consistent() - check volume state consistency
+ * @fsi: pointer on shared file system object
+ * @buf: log header
+ * @footer: log footer
+ * @dev_size: partition size in bytes
+ *
+ * RETURN:
+ * [true]  - volume state metadata is consistent.
+ * [false] - volume state metadata is corrupted.
+ */
+bool is_ssdfs_volume_state_info_consistent(struct ssdfs_fs_info *fsi,
+					   void *buf,
+					   struct ssdfs_log_footer *footer,
+					   u64 dev_size)
+{
+	struct ssdfs_signature *magic;
+	u64 nsegs;
+	u64 free_pages;
+	u8 log_segsize = U8_MAX;
+	u32 seg_size = U32_MAX;
+	u32 page_size = U32_MAX;
+	u64 cno = U64_MAX;
+	u16 log_pages = U16_MAX;
+	u32 log_bytes = U32_MAX;
+	u64 pages_count;
+	u32 pages_per_seg;
+	u32 remainder;
+	u16 fs_state;
+	u16 fs_errors;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!buf || !footer);
+
+	SSDFS_DBG("buf %p, footer %p, dev_size %llu\n",
+		  buf, footer, dev_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	magic = (struct ssdfs_signature *)buf;
+
+	if (!is_ssdfs_magic_valid(magic)) {
+		SSDFS_DBG("valid magic is not detected\n");
+		return false;
+	}
+
+	if (__is_ssdfs_segment_header_magic_valid(magic)) {
+		struct ssdfs_segment_header *hdr;
+		struct ssdfs_volume_header *vh;
+
+		hdr = SSDFS_SEG_HDR(buf);
+		vh = SSDFS_VH(buf);
+
+		log_segsize = vh->log_segsize;
+		seg_size = 1 << vh->log_segsize;
+		page_size = 1 << vh->log_pagesize;
+		cno = le64_to_cpu(hdr->cno);
+		log_pages = le16_to_cpu(hdr->log_pages);
+	} else if (is_ssdfs_partial_log_header_magic_valid(magic)) {
+		struct ssdfs_partial_log_header *pl_hdr;
+
+		pl_hdr = SSDFS_PLH(buf);
+
+		log_segsize = pl_hdr->log_segsize;
+		seg_size = 1 << pl_hdr->log_segsize;
+		page_size = 1 << pl_hdr->log_pagesize;
+		cno = le64_to_cpu(pl_hdr->cno);
+		log_pages = le16_to_cpu(pl_hdr->log_pages);
+	} else {
+		SSDFS_DBG("log header is corrupted\n");
+		return false;
+	}
+
+	nsegs = le64_to_cpu(footer->volume_state.nsegs);
+
+	if (nsegs == 0 || nsegs > (dev_size >> log_segsize)) {
+#ifdef CONFIG_SSDFS_DEBUG
		SSDFS_DBG("invalid nsegs %llu, dev_size %llu, seg_size %u\n",
+			  nsegs, dev_size, seg_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	free_pages = le64_to_cpu(footer->volume_state.free_pages);
+
+	pages_count = div_u64_rem(dev_size, page_size, &remainder);
+	if (remainder) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("dev_size %llu is unaligned on page_size %u\n",
+			  dev_size, page_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+	if (free_pages > pages_count) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("free_pages %llu is greater than pages_count %llu\n",
+			  free_pages, pages_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	pages_per_seg = seg_size / page_size;
+	if (nsegs <= div_u64(free_pages, pages_per_seg)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("invalid nsegs %llu, free_pages %llu, "
+			  "pages_per_seg %u\n",
+			  nsegs, free_pages, pages_per_seg);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	if (cno > le64_to_cpu(footer->cno)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("create_cno %llu is greater than write_cno %llu\n",
+			  cno, le64_to_cpu(footer->cno));
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	log_bytes = (u32)log_pages * fsi->pagesize;
+	if (le32_to_cpu(footer->log_bytes) > log_bytes) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("footer log_bytes %u > hdr log_bytes %u\n",
+			  le32_to_cpu(footer->log_bytes),
+			  log_bytes);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	fs_state = le16_to_cpu(footer->volume_state.state);
+	if (fs_state > SSDFS_LAST_KNOWN_FS_STATE) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unknown FS state %#x\n",
+			  fs_state);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	fs_errors = le16_to_cpu(footer->volume_state.errors);
+	if (fs_errors > SSDFS_LAST_KNOWN_FS_ERROR) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unknown FS error %#x\n",
+			  fs_errors);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * ssdfs_check_log_footer() - check log footer consistency
+ * @fsi: pointer on shared file system object
+ * @buf: log header
+ * @footer: log footer
+ * @silent: show error or not?
+ *
+ * This function checks consistency of log footer.
+ *
+ * RETURN:
+ * [success] - log footer is consistent.
+ * [failure] - error code:
+ *
+ * %-ENODATA     - valid magic isn't detected.
+ * %-EIO         - log footer is corrupted.
+ */
+int ssdfs_check_log_footer(struct ssdfs_fs_info *fsi,
+			   void *buf,
+			   struct ssdfs_log_footer *footer,
+			   bool silent)
+{
+	struct ssdfs_volume_state *vs;
+	size_t footer_size = sizeof(struct ssdfs_log_footer);
+	u64 dev_size;
+	bool major_magic_valid, minor_magic_valid;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !buf || !footer);
+
+	SSDFS_DBG("fsi %p, buf %p, footer %p, silent %#x\n",
+		  fsi, buf, footer, silent);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	vs = SSDFS_VS(footer);
+
+	major_magic_valid = is_ssdfs_magic_valid(&vs->magic);
+	minor_magic_valid = is_ssdfs_log_footer_magic_valid(footer);
+
+	if (!major_magic_valid && !minor_magic_valid) {
+		if (!silent)
			SSDFS_ERR("valid magic is not detected\n");
		else
			SSDFS_DBG("valid magic is not detected\n");
+		return -ENODATA;
+	} else if (!major_magic_valid) {
+		if (!silent)
+			SSDFS_ERR("invalid SSDFS magic signature\n");
+		else
+			SSDFS_DBG("invalid SSDFS magic signature\n");
+		return -EIO;
+	} else if (!minor_magic_valid) {
+		if (!silent)
+			SSDFS_ERR("invalid log footer magic signature\n");
+		else
+			SSDFS_DBG("invalid log footer magic signature\n");
+		return -EIO;
+	}
+
+	if (!is_ssdfs_log_footer_csum_valid(footer, footer_size)) {
+		if (!silent)
+			SSDFS_ERR("invalid checksum of log footer\n");
+		else
+			SSDFS_DBG("invalid checksum of log footer\n");
+		return -EIO;
+	}
+
+	dev_size = fsi->devops->device_size(fsi->sb);
+	if (!is_ssdfs_volume_state_info_consistent(fsi, buf,
+						   footer, dev_size)) {
+		if (!silent)
+			SSDFS_ERR("log footer is corrupted\n");
+		else
+			SSDFS_DBG("log footer is corrupted\n");
+		return -EIO;
+	}
+
+	if (le32_to_cpu(footer->log_flags) & ~SSDFS_LOG_FOOTER_FLAG_MASK) {
+		if (!silent) {
+			SSDFS_ERR("corrupted log_flags %#x\n",
+				  le32_to_cpu(footer->log_flags));
+		} else {
+			SSDFS_DBG("corrupted log_flags %#x\n",
+				  le32_to_cpu(footer->log_flags));
+		}
+		return -EIO;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_read_unchecked_log_footer() - read log footer without check
+ * @fsi: pointer on shared file system object
+ * @peb_id: PEB identification number
+ * @bytes_off: offset inside PEB in bytes
+ * @buf: buffer for log footer
+ * @silent: show error or not?
+ * @log_pages: number of pages in the log
+ *
+ * This function reads log footer without
+ * the consistency check.
+ *
+ * RETURN:
+ * [success] - log footer has been read successfully.
+ * [failure] - error code:
+ *
+ * %-ENODATA     - valid magic isn't detected.
+ * %-EIO         - log footer is corrupted.
+ */
+int ssdfs_read_unchecked_log_footer(struct ssdfs_fs_info *fsi,
+				    u64 peb_id, u32 bytes_off,
+				    void *buf, bool silent,
+				    u32 *log_pages)
+{
+	struct ssdfs_signature *magic;
+	struct ssdfs_log_footer *footer;
+	struct ssdfs_volume_state *vs;
+	size_t footer_size = sizeof(struct ssdfs_log_footer);
+	struct ssdfs_partial_log_header *pl_hdr;
+	size_t hdr_size = sizeof(struct ssdfs_partial_log_header);
+	bool major_magic_valid, minor_magic_valid;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !fsi->devops->read);
+	BUG_ON(!buf || !log_pages);
+	BUG_ON(bytes_off >= (fsi->pages_per_peb * fsi->pagesize));
+
+	SSDFS_DBG("peb_id %llu, bytes_off %u, buf %p\n",
+		  peb_id, bytes_off, buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*log_pages = U32_MAX;
+
+	err = ssdfs_unaligned_read_buffer(fsi, peb_id, bytes_off,
+					  buf, footer_size);
+	if (unlikely(err)) {
+		if (!silent) {
+			SSDFS_ERR("fail to read log footer: "
+				  "peb_id %llu, bytes_off %u, err %d\n",
+				  peb_id, bytes_off, err);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("fail to read log footer: "
+				  "peb_id %llu, bytes_off %u, err %d\n",
+				  peb_id, bytes_off, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return err;
+	}
+
+	magic = (struct ssdfs_signature *)buf;
+
+	if (!is_ssdfs_magic_valid(magic)) {
+		if (!silent)
+			SSDFS_ERR("valid magic is not detected\n");
+		else
+			SSDFS_DBG("valid magic is not detected\n");
+
+		return -ENODATA;
+	}
+
+	if (__is_ssdfs_log_footer_magic_valid(magic)) {
+		footer = SSDFS_LF(buf);
+		vs = SSDFS_VS(footer);
+
+		major_magic_valid = is_ssdfs_magic_valid(&vs->magic);
+		minor_magic_valid = is_ssdfs_log_footer_magic_valid(footer);
+
+		if (!major_magic_valid && !minor_magic_valid) {
+			if (!silent)
+				SSDFS_ERR("valid magic is not detected\n");
+			else
+				SSDFS_DBG("valid magic is not detected\n");
+			return -ENODATA;
+		} else if (!major_magic_valid) {
+			if (!silent)
+				SSDFS_ERR("invalid SSDFS magic signature\n");
+			else
+				SSDFS_DBG("invalid SSDFS magic signature\n");
+			return -EIO;
+		} else if (!minor_magic_valid) {
+			if (!silent)
+				SSDFS_ERR("invalid log footer magic\n");
+			else
+				SSDFS_DBG("invalid log footer magic\n");
+			return -EIO;
+		}
+
+		if (!is_ssdfs_log_footer_csum_valid(footer, footer_size)) {
+			if (!silent)
+				SSDFS_ERR("invalid checksum of log footer\n");
+			else
+				SSDFS_DBG("invalid checksum of log footer\n");
+			return -EIO;
+		}
+
+		*log_pages = le32_to_cpu(footer->log_bytes);
+		*log_pages /= fsi->pagesize;
+
+		if (*log_pages == 0 || *log_pages >= fsi->pages_per_peb) {
+			if (!silent)
+				SSDFS_ERR("invalid log pages %u\n", *log_pages);
+			else
+				SSDFS_DBG("invalid log pages %u\n", *log_pages);
+			return -EIO;
+		}
+	} else if (is_ssdfs_partial_log_header_magic_valid(magic)) {
+		pl_hdr = SSDFS_PLH(buf);
+
+		major_magic_valid = is_ssdfs_magic_valid(&pl_hdr->magic);
+		minor_magic_valid =
+			is_ssdfs_partial_log_header_magic_valid(&pl_hdr->magic);
+
+		if (!major_magic_valid && !minor_magic_valid) {
+			if (!silent)
+				SSDFS_ERR("valid magic is not detected\n");
+			else
+				SSDFS_DBG("valid magic is not detected\n");
+			return -ENODATA;
+		} else if (!major_magic_valid) {
+			if (!silent)
+				SSDFS_ERR("invalid SSDFS magic signature\n");
+			else
+				SSDFS_DBG("invalid SSDFS magic signature\n");
+			return -EIO;
+		} else if (!minor_magic_valid) {
+			if (!silent)
+				SSDFS_ERR("invalid partial log header magic\n");
+			else
+				SSDFS_DBG("invalid partial log header magic\n");
+			return -EIO;
+		}
+
+		if (!is_ssdfs_partial_log_header_csum_valid(pl_hdr, hdr_size)) {
+			if (!silent)
+				SSDFS_ERR("invalid checksum of footer\n");
+			else
+				SSDFS_DBG("invalid checksum of footer\n");
+			return -EIO;
+		}
+
+		*log_pages = le32_to_cpu(pl_hdr->log_bytes);
+		*log_pages /= fsi->pagesize;
+
+		if (*log_pages == 0 || *log_pages >= fsi->pages_per_peb) {
+			if (!silent)
+				SSDFS_ERR("invalid log pages %u\n", *log_pages);
+			else
+				SSDFS_DBG("invalid log pages %u\n", *log_pages);
+			return -EIO;
+		}
+	} else {
+		if (!silent) {
+			SSDFS_ERR("log footer is corrupted: "
+				  "peb_id %llu, bytes_off %u\n",
+				  peb_id, bytes_off);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("log footer is corrupted: "
+				  "peb_id %llu, bytes_off %u\n",
+				  peb_id, bytes_off);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+
+		return -EIO;
+	}
+
+	return 0;
+}
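The validation above derives the page count of a log from the on-disk `log_bytes` field and rejects impossible values (zero pages, or a log that does not fit inside one PEB). A minimal userspace sketch of that check, with illustrative names and values not taken from the on-disk layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch of the log-size sanity check performed by
 * ssdfs_read_unchecked_log_footer(): derive log_pages from log_bytes
 * and require it to be nonzero and smaller than pages_per_peb. */
static bool log_pages_from_bytes(uint32_t log_bytes, uint32_t pagesize,
				 uint32_t pages_per_peb,
				 uint32_t *log_pages)
{
	*log_pages = log_bytes / pagesize;
	/* a log must occupy at least one page and fit inside one PEB */
	return *log_pages != 0 && *log_pages < pages_per_peb;
}
```

A 32 KB log on 4 KB pages in a 64-page PEB yields 8 pages and passes; a zero-length or PEB-sized log is rejected.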
+
+/*
+ * ssdfs_read_checked_log_footer() - read and check log footer
+ * @fsi: pointer to shared file system object
+ * @log_hdr: log header
+ * @peb_id: PEB identification number
+ * @bytes_off: offset inside PEB in bytes
+ * @buf: buffer for log footer
+ * @silent: suppress error output?
+ *
+ * This function reads the log footer and checks its consistency.
+ *
+ * RETURN:
+ * [success] - log footer is consistent.
+ * [failure] - error code:
+ *
+ * %-ENODATA     - valid magic is not detected.
+ * %-EIO         - log footer is corrupted.
+ */
+int ssdfs_read_checked_log_footer(struct ssdfs_fs_info *fsi, void *log_hdr,
+				  u64 peb_id, u32 bytes_off, void *buf,
+				  bool silent)
+{
+	struct ssdfs_signature *magic;
+	struct ssdfs_log_footer *footer;
+	struct ssdfs_partial_log_header *pl_hdr;
+	size_t footer_size = sizeof(struct ssdfs_log_footer);
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !fsi->devops->read);
+	BUG_ON(!log_hdr || !buf);
+	BUG_ON(bytes_off >= (fsi->pages_per_peb * fsi->pagesize));
+
+	SSDFS_DBG("peb_id %llu, bytes_off %u, buf %p\n",
+		  peb_id, bytes_off, buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_unaligned_read_buffer(fsi, peb_id, bytes_off,
+					  buf, footer_size);
+	if (unlikely(err)) {
+		if (!silent) {
+			SSDFS_ERR("fail to read log footer: "
+				  "peb_id %llu, bytes_off %u, err %d\n",
+				  peb_id, bytes_off, err);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("fail to read log footer: "
+				  "peb_id %llu, bytes_off %u, err %d\n",
+				  peb_id, bytes_off, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return err;
+	}
+
+	magic = (struct ssdfs_signature *)buf;
+
+	if (!is_ssdfs_magic_valid(magic)) {
+		if (!silent)
+			SSDFS_ERR("valid magic is not detected\n");
+		else
+			SSDFS_DBG("valid magic is not detected\n");
+
+		return -ENODATA;
+	}
+
+	if (__is_ssdfs_log_footer_magic_valid(magic)) {
+		footer = SSDFS_LF(buf);
+
+		err = ssdfs_check_log_footer(fsi, log_hdr, footer, silent);
+		if (err) {
+			if (!silent) {
+				SSDFS_ERR("log footer is corrupted: "
+					  "peb_id %llu, bytes_off %u, err %d\n",
+					  peb_id, bytes_off, err);
+			} else {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("log footer is corrupted: "
+					  "peb_id %llu, bytes_off %u, err %d\n",
+					  peb_id, bytes_off, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+			}
+			return err;
+		}
+	} else if (is_ssdfs_partial_log_header_magic_valid(magic)) {
+		pl_hdr = SSDFS_PLH(buf);
+
+		err = ssdfs_check_partial_log_header(fsi, pl_hdr, silent);
+		if (unlikely(err)) {
+			if (!silent) {
+				SSDFS_ERR("partial log header is corrupted: "
+					  "peb_id %llu, bytes_off %u\n",
+					  peb_id, bytes_off);
+			} else {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("partial log header is corrupted: "
+					  "peb_id %llu, bytes_off %u\n",
+					  peb_id, bytes_off);
+#endif /* CONFIG_SSDFS_DEBUG */
+			}
+
+			return err;
+		}
+	} else {
+		if (!silent) {
+			SSDFS_ERR("log footer is corrupted: "
+				  "peb_id %llu, bytes_off %u\n",
+				  peb_id, bytes_off);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("log footer is corrupted: "
+				  "peb_id %llu, bytes_off %u\n",
+				  peb_id, bytes_off);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+
+		return -EIO;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_store_nsegs() - store volume's segment count in volume state
+ * @fsi: pointer to shared file system object
+ * @vs: volume state [out]
+ *
+ * This function stores the volume's segment count in the volume state
+ * under the resize mutex, so a concurrent resize cannot be observed
+ * half-way. It currently always returns 0.
+ */
+static inline
+int ssdfs_store_nsegs(struct ssdfs_fs_info *fsi,
+			struct ssdfs_volume_state *vs)
+{
+	mutex_lock(&fsi->resize_mutex);
+	vs->nsegs = cpu_to_le64(fsi->nsegs);
+	mutex_unlock(&fsi->resize_mutex);
+
+	return 0;
+}
+
+/*
+ * ssdfs_prepare_current_segment_ids() - prepare current segment IDs
+ * @fsi: pointer to shared file system object
+ * @array: pointer to array of IDs [out]
+ * @size: size of the array in bytes
+ *
+ * This function prepares the current segment IDs.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code.
+ */
+int ssdfs_prepare_current_segment_ids(struct ssdfs_fs_info *fsi,
+					__le64 *array,
+					size_t size)
+{
+	size_t count = size / sizeof(__le64);
+	int i;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !array);
+
+	SSDFS_DBG("fsi %p, array %p, size %zu\n",
+		  fsi, array, size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (size != (sizeof(__le64) * SSDFS_CUR_SEGS_COUNT)) {
+		SSDFS_ERR("invalid array size %zu\n",
+			  size);
+		return -EINVAL;
+	}
+
+	memset(array, 0xFF, size);
+
+	if (fsi->cur_segs) {
+		down_read(&fsi->cur_segs->lock);
+		for (i = 0; i < count; i++) {
+			struct ssdfs_segment_info *real_seg;
+			u64 seg;
+
+			if (!fsi->cur_segs->objects[i])
+				continue;
+
+			ssdfs_current_segment_lock(fsi->cur_segs->objects[i]);
+
+			real_seg = fsi->cur_segs->objects[i]->real_seg;
+			if (real_seg)
+				seg = real_seg->seg_id;
+			else
+				seg = fsi->cur_segs->objects[i]->seg_id;
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("index %d, seg_id %llu\n",
+				  i, seg);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			array[i] = cpu_to_le64(seg);
+
+			ssdfs_current_segment_unlock(fsi->cur_segs->objects[i]);
+		}
+		up_read(&fsi->cur_segs->lock);
+	}
+
+	return 0;
+}
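The array above is pre-filled with 0xFF bytes so that any slot left untouched reads back as U64_MAX, the "no segment" sentinel; all-ones bytes decode to the same value regardless of endianness. A small sketch of this pattern, with illustrative names only:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative sketch of the sentinel pattern used by
 * ssdfs_prepare_current_segment_ids(): memset() the ID array with
 * 0xFF so every unset slot decodes as UINT64_MAX.
 * EXAMPLE_CUR_SEGS is a made-up count, not the on-disk constant. */
#define EXAMPLE_CUR_SEGS 4

static void prepare_ids(uint64_t *array, size_t size)
{
	/* all-ones bytes read back as UINT64_MAX in any byte order */
	memset(array, 0xFF, size);
}
```

After the call, only slots explicitly overwritten with a real segment ID differ from the sentinel.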
+
+/*
+ * ssdfs_prepare_volume_state_info_for_commit() - prepare volume state
+ * @fsi: pointer to shared file system object
+ * @fs_state: file system state
+ * @cur_segs: pointer to array of current segment IDs
+ * @size: size of the array in bytes
+ * @last_log_time: log creation timestamp
+ * @last_log_cno: last log checkpoint
+ * @vs: volume state [out]
+ *
+ * This function prepares volume state info for commit.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code.
+ */
+int ssdfs_prepare_volume_state_info_for_commit(struct ssdfs_fs_info *fsi,
+						u16 fs_state,
+						__le64 *cur_segs,
+						size_t size,
+						u64 last_log_time,
+						u64 last_log_cno,
+						struct ssdfs_volume_state *vs)
+{
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !vs);
+
+	SSDFS_DBG("fsi %p, fs_state %#x\n", fsi, fs_state);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (size != (sizeof(__le64) * SSDFS_CUR_SEGS_COUNT)) {
+		SSDFS_ERR("invalid array size %zu\n",
+			  size);
+		return -EINVAL;
+	}
+
+	err = ssdfs_store_nsegs(fsi, vs);
+	if (err) {
+		SSDFS_DBG("unable to store segments number: err %d\n", err);
+		return err;
+	}
+
+	vs->magic.common = cpu_to_le32(SSDFS_SUPER_MAGIC);
+	vs->magic.version.major = SSDFS_MAJOR_REVISION;
+	vs->magic.version.minor = SSDFS_MINOR_REVISION;
+
+	spin_lock(&fsi->volume_state_lock);
+
+	fsi->fs_mod_time = last_log_time;
+	fsi->fs_state = fs_state;
+
+	vs->free_pages = cpu_to_le64(fsi->free_pages);
+	vs->timestamp = cpu_to_le64(last_log_time);
+	vs->cno = cpu_to_le64(last_log_cno);
+	vs->flags = cpu_to_le32(fsi->fs_flags);
+	vs->state = cpu_to_le16(fs_state);
+	vs->errors = cpu_to_le16(fsi->fs_errors);
+	vs->feature_compat = cpu_to_le64(fsi->fs_feature_compat);
+	vs->feature_compat_ro = cpu_to_le64(fsi->fs_feature_compat_ro);
+	vs->feature_incompat = cpu_to_le64(fsi->fs_feature_incompat);
+
+	ssdfs_memcpy(vs->uuid, 0, SSDFS_UUID_SIZE,
+		     fsi->vs->uuid, 0, SSDFS_UUID_SIZE,
+		     SSDFS_UUID_SIZE);
+	ssdfs_memcpy(vs->label, 0, SSDFS_VOLUME_LABEL_MAX,
+		     fsi->vs->label, 0, SSDFS_VOLUME_LABEL_MAX,
+		     SSDFS_VOLUME_LABEL_MAX);
+	ssdfs_memcpy(vs->cur_segs, 0, size,
+		     cur_segs, 0, size,
+		     size);
+
+	vs->migration_threshold = cpu_to_le16(fsi->migration_threshold);
+	vs->open_zones = cpu_to_le32(atomic_read(&fsi->open_zones));
+
+	spin_unlock(&fsi->volume_state_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("open_zones %d\n",
+		  atomic_read(&fsi->open_zones));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_memcpy(&vs->blkbmap,
+		     0, sizeof(struct ssdfs_blk_bmap_options),
+		     &fsi->vs->blkbmap,
+		     0, sizeof(struct ssdfs_blk_bmap_options),
+		     sizeof(struct ssdfs_blk_bmap_options));
+	ssdfs_memcpy(&vs->blk2off_tbl,
+		     0, sizeof(struct ssdfs_blk2off_tbl_options),
+		     &fsi->vs->blk2off_tbl,
+		     0, sizeof(struct ssdfs_blk2off_tbl_options),
+		     sizeof(struct ssdfs_blk2off_tbl_options));
+
+	ssdfs_memcpy(&vs->user_data,
+		     0, sizeof(struct ssdfs_user_data_options),
+		     &fsi->vs->user_data,
+		     0, sizeof(struct ssdfs_user_data_options),
+		     sizeof(struct ssdfs_user_data_options));
+	ssdfs_memcpy(&vs->root_folder,
+		     0, sizeof(struct ssdfs_inode),
+		     &fsi->vs->root_folder,
+		     0, sizeof(struct ssdfs_inode),
+		     sizeof(struct ssdfs_inode));
+
+	ssdfs_memcpy(&vs->inodes_btree,
+		     0, sizeof(struct ssdfs_inodes_btree),
+		     &fsi->vs->inodes_btree,
+		     0, sizeof(struct ssdfs_inodes_btree),
+		     sizeof(struct ssdfs_inodes_btree));
+	ssdfs_memcpy(&vs->shared_extents_btree,
+		     0, sizeof(struct ssdfs_shared_extents_btree),
+		     &fsi->vs->shared_extents_btree,
+		     0, sizeof(struct ssdfs_shared_extents_btree),
+		     sizeof(struct ssdfs_shared_extents_btree));
+	ssdfs_memcpy(&vs->shared_dict_btree,
+		     0, sizeof(struct ssdfs_shared_dictionary_btree),
+		     &fsi->vs->shared_dict_btree,
+		     0, sizeof(struct ssdfs_shared_dictionary_btree),
+		     sizeof(struct ssdfs_shared_dictionary_btree));
+	ssdfs_memcpy(&vs->snapshots_btree,
+		     0, sizeof(struct ssdfs_snapshots_btree),
+		     &fsi->vs->snapshots_btree,
+		     0, sizeof(struct ssdfs_snapshots_btree),
+		     sizeof(struct ssdfs_snapshots_btree));
+
+	return 0;
+}
+
+/*
+ * ssdfs_prepare_log_footer_for_commit() - prepare log footer for commit
+ * @fsi: pointer to shared file system object
+ * @log_pages: count of pages in the log
+ * @log_flags: log's flags
+ * @last_log_time: log creation timestamp
+ * @last_log_cno: last log checkpoint
+ * @footer: log footer [out]
+ *
+ * This function prepares log footer for commit.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input values.
+ */
+int ssdfs_prepare_log_footer_for_commit(struct ssdfs_fs_info *fsi,
+					u32 log_pages,
+					u32 log_flags,
+					u64 last_log_time,
+					u64 last_log_cno,
+					struct ssdfs_log_footer *footer)
+{
+	u16 data_size = sizeof(struct ssdfs_log_footer);
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("fsi %p, log_pages %u, log_flags %#x, footer %p\n",
+		  fsi, log_pages, log_flags, footer);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	footer->volume_state.magic.key = cpu_to_le16(SSDFS_LOG_FOOTER_MAGIC);
+
+	footer->timestamp = cpu_to_le64(last_log_time);
+	footer->cno = cpu_to_le64(last_log_cno);
+
+	if (log_pages >= (U32_MAX >> fsi->log_pagesize)) {
+		SSDFS_ERR("invalid value of log_pages %u\n", log_pages);
+		return -EINVAL;
+	}
+
+	footer->log_bytes = cpu_to_le32(log_pages << fsi->log_pagesize);
+
+	if (log_flags & ~SSDFS_LOG_FOOTER_FLAG_MASK) {
+		SSDFS_ERR("unknown log flags %#x\n", log_flags);
+		return -EINVAL;
+	}
+
+	footer->log_flags = cpu_to_le32(log_flags);
+
+	footer->volume_state.check.bytes = cpu_to_le16(data_size);
+	footer->volume_state.check.flags = cpu_to_le16(SSDFS_CRC32);
+
+	err = ssdfs_calculate_csum(&footer->volume_state.check,
+				   footer, data_size);
+	if (unlikely(err)) {
+		SSDFS_ERR("unable to calculate checksum: err %d\n", err);
+		return err;
+	}
+
+	return 0;
+}
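ssdfs_prepare_log_footer_for_commit() guards against u32 overflow before computing `log_bytes = log_pages << log_pagesize`: the shift only fits in 32 bits when log_pages is strictly below `U32_MAX >> log_pagesize`. A minimal sketch of that guard (names are illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the overflow guard above: reject log_pages values for
 * which (log_pages << log_pagesize) would not fit in a u32. */
static bool log_bytes_fits_u32(uint32_t log_pages, uint8_t log_pagesize)
{
	return log_pages < (UINT32_MAX >> log_pagesize);
}
```

With 4 KB pages (log_pagesize = 12), the limit is UINT32_MAX >> 12 = 1048575, so 1048574 pages pass and 1048575 are rejected.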
diff --git a/fs/ssdfs/volume_header.c b/fs/ssdfs/volume_header.c
new file mode 100644
index 000000000000..e992c3cdf335
--- /dev/null
+++ b/fs/ssdfs/volume_header.c
@@ -0,0 +1,1256 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/volume_header.c - operations with volume header.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include <linux/kernel.h>
+#include <linux/rwsem.h>
+#include <linux/pagevec.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+
+#include <trace/events/ssdfs.h>
+
+/*
+ * __is_ssdfs_segment_header_magic_valid() - check segment header's magic
+ * @magic: pointer on magic value
+ */
+bool __is_ssdfs_segment_header_magic_valid(struct ssdfs_signature *magic)
+{
+	return le16_to_cpu(magic->key) == SSDFS_SEGMENT_HDR_MAGIC;
+}
+
+/*
+ * is_ssdfs_segment_header_magic_valid() - check segment header's magic
+ * @hdr: segment header
+ */
+bool is_ssdfs_segment_header_magic_valid(struct ssdfs_segment_header *hdr)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!hdr);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return __is_ssdfs_segment_header_magic_valid(&hdr->volume_hdr.magic);
+}
+
+/*
+ * is_ssdfs_partial_log_header_magic_valid() - check partial log header's magic
+ * @magic: pointer on magic value
+ */
+bool is_ssdfs_partial_log_header_magic_valid(struct ssdfs_signature *magic)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!magic);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return le16_to_cpu(magic->key) == SSDFS_PARTIAL_LOG_HDR_MAGIC;
+}
+
+/*
+ * is_ssdfs_volume_header_csum_valid() - check volume header checksum
+ * @vh_buf: volume header buffer
+ * @buf_size: size of buffer in bytes
+ */
+bool is_ssdfs_volume_header_csum_valid(void *vh_buf, size_t buf_size)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!vh_buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return is_csum_valid(&SSDFS_VH(vh_buf)->check, vh_buf, buf_size);
+}
+
+/*
+ * is_ssdfs_partial_log_header_csum_valid() - check partial log header checksum
+ * @plh_buf: partial log header buffer
+ * @buf_size: size of buffer in bytes
+ */
+bool is_ssdfs_partial_log_header_csum_valid(void *plh_buf, size_t buf_size)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!plh_buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return is_csum_valid(&SSDFS_PLH(plh_buf)->check, plh_buf, buf_size);
+}
+
+static inline
+void ssdfs_show_volume_header(struct ssdfs_volume_header *hdr)
+{
+	SSDFS_ERR("MAGIC: common %#x, key %#x, "
+		  "version (major %u, minor %u)\n",
+		  le32_to_cpu(hdr->magic.common),
+		  le16_to_cpu(hdr->magic.key),
+		  hdr->magic.version.major,
+		  hdr->magic.version.minor);
+	SSDFS_ERR("CHECK: bytes %u, flags %#x, csum %#x\n",
+		  le16_to_cpu(hdr->check.bytes),
+		  le16_to_cpu(hdr->check.flags),
+		  le32_to_cpu(hdr->check.csum));
+	SSDFS_ERR("KEY VALUES: log_pagesize %u, log_erasesize %u, "
+		  "log_segsize %u, log_pebs_per_seg %u, "
+		  "megabytes_per_peb %u, pebs_per_seg %u, "
+		  "create_time %llu, create_cno %llu, flags %#x\n",
+		  hdr->log_pagesize,
+		  hdr->log_erasesize,
+		  hdr->log_segsize,
+		  hdr->log_pebs_per_seg,
+		  le16_to_cpu(hdr->megabytes_per_peb),
+		  le16_to_cpu(hdr->pebs_per_seg),
+		  le64_to_cpu(hdr->create_time),
+		  le64_to_cpu(hdr->create_cno),
+		  le32_to_cpu(hdr->flags));
+}
+
+/*
+ * is_ssdfs_volume_header_consistent() - check volume header consistency
+ * @fsi: pointer to shared file system object
+ * @vh: volume header
+ * @dev_size: partition size in bytes
+ *
+ * RETURN:
+ * [true]  - volume header is consistent.
+ * [false] - volume header is corrupted.
+ */
+bool is_ssdfs_volume_header_consistent(struct ssdfs_fs_info *fsi,
+					struct ssdfs_volume_header *vh,
+					u64 dev_size)
+{
+	u32 page_size;
+	u64 erase_size;
+	u32 seg_size;
+	u32 pebs_per_seg;
+	u64 leb_array[SSDFS_SB_CHAIN_MAX * SSDFS_SB_SEG_COPY_MAX] = {0};
+	u64 peb_array[SSDFS_SB_CHAIN_MAX * SSDFS_SB_SEG_COPY_MAX] = {0};
+	int array_index = 0;
+	int i, j, k;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!vh);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	page_size = 1 << vh->log_pagesize;
+	erase_size = 1 << vh->log_erasesize;
+	seg_size = 1 << vh->log_segsize;
+	pebs_per_seg = 1 << vh->log_pebs_per_seg;
+
+	if (page_size >= erase_size) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("page_size %u >= erase_size %llu\n",
+			  page_size, erase_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	switch (page_size) {
+	case SSDFS_4KB:
+	case SSDFS_8KB:
+	case SSDFS_16KB:
+	case SSDFS_32KB:
+		/* do nothing */
+		break;
+
+	default:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unexpected page_size %u\n", page_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	switch (erase_size) {
+	case SSDFS_128KB:
+	case SSDFS_256KB:
+	case SSDFS_512KB:
+	case SSDFS_2MB:
+	case SSDFS_8MB:
+	case SSDFS_16MB:
+	case SSDFS_32MB:
+	case SSDFS_64MB:
+	case SSDFS_128MB:
+	case SSDFS_256MB:
+	case SSDFS_512MB:
+	case SSDFS_1GB:
+	case SSDFS_2GB:
+	case SSDFS_8GB:
+	case SSDFS_16GB:
+	case SSDFS_32GB:
+	case SSDFS_64GB:
+		/* do nothing */
+		break;
+
+	default:
+		if (fsi->is_zns_device) {
+			u64 zone_size = le16_to_cpu(vh->megabytes_per_peb);
+
+			zone_size *= SSDFS_1MB;
+
+			if (fsi->zone_size != zone_size) {
+				SSDFS_ERR("invalid zone size: "
+					  "size1 %llu != size2 %llu\n",
+					  fsi->zone_size, zone_size);
+				return false;
+			}
+
+			erase_size = zone_size;
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unexpected erase_size %llu\n", erase_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return false;
+		}
+	}
+
+	if (seg_size < erase_size) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("seg_size %u < erase_size %llu\n",
+			  seg_size, erase_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	if (pebs_per_seg != (seg_size >> vh->log_erasesize)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("pebs_per_seg %u != (seg_size %u / erase_size %llu)\n",
+			  pebs_per_seg, seg_size, erase_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	if (seg_size >= dev_size) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("seg_size %u >= dev_size %llu\n",
+			  seg_size, dev_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	for (i = 0; i < SSDFS_SB_CHAIN_MAX; i++) {
+		for (j = 0; j < SSDFS_SB_SEG_COPY_MAX; j++) {
+			u64 leb_id = le64_to_cpu(vh->sb_pebs[i][j].leb_id);
+			u64 peb_id = le64_to_cpu(vh->sb_pebs[i][j].peb_id);
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("i %d, j %d, LEB %llu, PEB %llu\n",
+				  i, j, leb_id, peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			for (k = 0; k < array_index; k++) {
+				if (leb_id == leb_array[k]) {
+#ifdef CONFIG_SSDFS_DEBUG
+					SSDFS_DBG("corrupted LEB number: "
+						  "leb_id %llu, "
+						  "leb_array[%d] %llu\n",
+						  leb_id, k,
+						  leb_array[k]);
+#endif /* CONFIG_SSDFS_DEBUG */
+					return false;
+				}
+
+				if (peb_id == peb_array[k]) {
+#ifdef CONFIG_SSDFS_DEBUG
+					SSDFS_DBG("corrupted PEB number: "
+						  "peb_id %llu, "
+						  "peb_array[%d] %llu\n",
+						  peb_id, k,
+						  peb_array[k]);
+#endif /* CONFIG_SSDFS_DEBUG */
+					return false;
+				}
+			}
+
+			if (i == SSDFS_PREV_SB_SEG &&
+			    leb_id == U64_MAX && peb_id == U64_MAX) {
+				/* prev id is U64_MAX after volume creation */
+				continue;
+			}
+
+			if (i == SSDFS_RESERVED_SB_SEG &&
+			    leb_id == U64_MAX && peb_id == U64_MAX) {
+				/*
+				 * The reserved seg could be U64_MAX
+				 * if there is no clean segment.
+				 */
+				continue;
+			}
+
+			if (leb_id >= (dev_size >> vh->log_erasesize)) {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("corrupted LEB number %llu\n",
+					  leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+				return false;
+			}
+
+			leb_array[array_index] = leb_id;
+			peb_array[array_index] = peb_id;
+
+			array_index++;
+		}
+	}
+
+	return true;
+}
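The geometry checks above enforce the power-of-two invariants page size < erase size <= segment size, with pebs_per_seg equal to seg_size / erase_size exactly. A userspace sketch of those invariants, assuming conventional log2-encoded sizes (names are illustrative, and the whitelists of allowed sizes are omitted):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the core invariants checked by
 * is_ssdfs_volume_header_consistent(): all sizes are powers of two
 * and pebs_per_seg must match seg_size >> log_erasesize. */
static bool geometry_consistent(uint8_t log_pagesize, uint8_t log_erasesize,
				uint8_t log_segsize, uint8_t log_pebs_per_seg)
{
	uint64_t page_size = 1ULL << log_pagesize;
	uint64_t erase_size = 1ULL << log_erasesize;
	uint64_t seg_size = 1ULL << log_segsize;
	uint64_t pebs_per_seg = 1ULL << log_pebs_per_seg;

	if (page_size >= erase_size)	/* a page must fit in an erase block */
		return false;
	if (seg_size < erase_size)	/* a segment holds >= 1 erase block */
		return false;
	/* the PEB count must be derivable from the other two sizes */
	return pebs_per_seg == (seg_size >> log_erasesize);
}
```

For example, 4 KB pages (12), 2 MB erase blocks (21), 8 MB segments (23) and 4 PEBs per segment (2) are consistent; swapping page and erase sizes, or claiming 2 PEBs per segment, is not.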
+
+/*
+ * ssdfs_check_segment_header() - check segment header consistency
+ * @fsi: pointer to shared file system object
+ * @hdr: segment header
+ * @silent: suppress error output?
+ *
+ * This function checks the consistency of a segment header.
+ *
+ * RETURN:
+ * [success] - segment header is consistent.
+ * [failure] - error code:
+ *
+ * %-ENODATA     - valid magic is not detected.
+ * %-EIO         - segment header is corrupted.
+ */
+int ssdfs_check_segment_header(struct ssdfs_fs_info *fsi,
+				struct ssdfs_segment_header *hdr,
+				bool silent)
+{
+	struct ssdfs_volume_header *vh;
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+	bool major_magic_valid, minor_magic_valid;
+	u64 dev_size;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !hdr);
+
+	SSDFS_DBG("fsi %p, hdr %p, silent %#x\n", fsi, hdr, silent);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	vh = SSDFS_VH(hdr);
+
+	major_magic_valid = is_ssdfs_magic_valid(&vh->magic);
+	minor_magic_valid = is_ssdfs_segment_header_magic_valid(hdr);
+
+	if (!major_magic_valid && !minor_magic_valid) {
+		if (!silent) {
+			SSDFS_ERR("valid magic is not detected\n");
+			ssdfs_show_volume_header(vh);
+		} else {
+			SSDFS_DBG("valid magic is not detected\n");
+		}
+		return -ENODATA;
+	} else if (!major_magic_valid) {
+		if (!silent) {
+			SSDFS_ERR("invalid SSDFS magic signature\n");
+			ssdfs_show_volume_header(vh);
+		} else {
+			SSDFS_DBG("invalid SSDFS magic signature\n");
+		}
+		return -EIO;
+	} else if (!minor_magic_valid) {
+		if (!silent) {
+			SSDFS_ERR("invalid segment header magic signature\n");
+			ssdfs_show_volume_header(vh);
+		} else {
+			SSDFS_DBG("invalid segment header magic signature\n");
+		}
+		return -EIO;
+	}
+
+	if (!is_ssdfs_volume_header_csum_valid(hdr, hdr_size)) {
+		if (!silent) {
+			SSDFS_ERR("invalid checksum of volume header\n");
+			ssdfs_show_volume_header(vh);
+		} else {
+			SSDFS_DBG("invalid checksum of volume header\n");
+		}
+		return -EIO;
+	}
+
+	dev_size = fsi->devops->device_size(fsi->sb);
+	if (!is_ssdfs_volume_header_consistent(fsi, vh, dev_size)) {
+		if (!silent) {
+			SSDFS_ERR("volume header is corrupted\n");
+			ssdfs_show_volume_header(vh);
+		} else {
+			SSDFS_DBG("volume header is corrupted\n");
+		}
+		return -EIO;
+	}
+
+	if (SSDFS_VH_CNO(vh) > SSDFS_SEG_CNO(hdr)) {
+		if (!silent) {
+			SSDFS_ERR("invalid checkpoint/timestamp\n");
+			ssdfs_show_volume_header(vh);
+		} else {
+			SSDFS_DBG("invalid checkpoint/timestamp\n");
+		}
+		return -EIO;
+	}
+
+	if (le16_to_cpu(hdr->log_pages) > fsi->pages_per_peb) {
+		if (!silent) {
+			SSDFS_ERR("log_pages %u > pages_per_peb %u\n",
+				  le16_to_cpu(hdr->log_pages),
+				  fsi->pages_per_peb);
+			ssdfs_show_volume_header(vh);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("log_pages %u > pages_per_peb %u\n",
+				  le16_to_cpu(hdr->log_pages),
+				  fsi->pages_per_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -EIO;
+	}
+
+	if (le16_to_cpu(hdr->seg_type) > SSDFS_LAST_KNOWN_SEG_TYPE) {
+		if (!silent) {
+			SSDFS_ERR("unknown seg_type %#x\n",
+				  le16_to_cpu(hdr->seg_type));
+			ssdfs_show_volume_header(vh);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unknown seg_type %#x\n",
+				  le16_to_cpu(hdr->seg_type));
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -EIO;
+	}
+
+	if (le32_to_cpu(hdr->seg_flags) & ~SSDFS_SEG_HDR_FLAG_MASK) {
+		if (!silent) {
+			SSDFS_ERR("corrupted seg_flags %#x\n",
+				  le32_to_cpu(hdr->seg_flags));
+			ssdfs_show_volume_header(vh);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("corrupted seg_flags %#x\n",
+				  le32_to_cpu(hdr->seg_flags));
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -EIO;
+	}
+
+	return 0;
+}
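The final checks above follow a common flag-mask validation pattern, also used for log_flags in the footer path: any bit set outside the known mask marks the structure as corrupted. A tiny sketch of the pattern (the mask value here is illustrative, not the on-disk SSDFS_SEG_HDR_FLAG_MASK):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag-mask check: bits outside the known mask
 * indicate corruption or a format newer than this reader. */
#define EXAMPLE_FLAG_MASK 0x7U

static bool flags_known(uint32_t flags)
{
	return (flags & ~EXAMPLE_FLAG_MASK) == 0;
}
```

With this mask, 0x5 passes while 0x8 (an unknown bit) is rejected.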
+
+/*
+ * is_ssdfs_partial_log_header_consistent() - check partial header consistency
+ * @fsi: pointer to shared file system object
+ * @ph: partial log header
+ * @dev_size: partition size in bytes
+ *
+ * RETURN:
+ * [true]  - partial log header is consistent.
+ * [false] - partial log header is corrupted.
+ */
+bool is_ssdfs_partial_log_header_consistent(struct ssdfs_fs_info *fsi,
+					    struct ssdfs_partial_log_header *ph,
+					    u64 dev_size)
+{
+	u32 page_size;
+	u64 erase_size;
+	u32 seg_size;
+	u32 pebs_per_seg;
+	u64 nsegs;
+	u64 free_pages;
+	u64 pages_count;
+	u32 remainder;
+	u32 pages_per_seg;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!ph);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	page_size = 1 << ph->log_pagesize;
+	erase_size = 1 << ph->log_erasesize;
+	seg_size = 1 << ph->log_segsize;
+	pebs_per_seg = 1 << ph->log_pebs_per_seg;
+
+	if (page_size >= erase_size) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("page_size %u >= erase_size %llu\n",
+			  page_size, erase_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	switch (page_size) {
+	case SSDFS_4KB:
+	case SSDFS_8KB:
+	case SSDFS_16KB:
+	case SSDFS_32KB:
+		/* do nothing */
+		break;
+
+	default:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unexpected page_size %u\n", page_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	switch (erase_size) {
+	case SSDFS_128KB:
+	case SSDFS_256KB:
+	case SSDFS_512KB:
+	case SSDFS_2MB:
+	case SSDFS_8MB:
+	case SSDFS_16MB:
+	case SSDFS_32MB:
+	case SSDFS_64MB:
+	case SSDFS_128MB:
+	case SSDFS_256MB:
+	case SSDFS_512MB:
+	case SSDFS_1GB:
+	case SSDFS_2GB:
+	case SSDFS_8GB:
+	case SSDFS_16GB:
+	case SSDFS_32GB:
+	case SSDFS_64GB:
+		/* do nothing */
+		break;
+
+	default:
+		if (fsi->is_zns_device) {
+			u64 zone_size = le16_to_cpu(fsi->vh->megabytes_per_peb);
+
+			zone_size *= SSDFS_1MB;
+
+			if (fsi->zone_size != zone_size) {
+				SSDFS_ERR("invalid zone size: "
+					  "size1 %llu != size2 %llu\n",
+					  fsi->zone_size, zone_size);
+				return false;
+			}
+
+			erase_size = (u32)zone_size;
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unexpected erase_size %llu\n", erase_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return false;
+		}
+	}
+
+	if (seg_size < erase_size) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("seg_size %u < erase_size %llu\n",
+			  seg_size, erase_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	if (pebs_per_seg != (seg_size >> ph->log_erasesize)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("pebs_per_seg %u != (seg_size %u / erase_size %llu)\n",
+			  pebs_per_seg, seg_size, erase_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	if (seg_size >= dev_size) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("seg_size %u >= dev_size %llu\n",
+			  seg_size, dev_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	nsegs = le64_to_cpu(ph->nsegs);
+
+	if (nsegs == 0 || nsegs > (dev_size >> ph->log_segsize)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("invalid nsegs %llu, dev_size %llu, seg_size %u\n",
+			  nsegs, dev_size, seg_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	free_pages = le64_to_cpu(ph->free_pages);
+
+	pages_count = div_u64_rem(dev_size, page_size, &remainder);
+	if (remainder) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("dev_size %llu is unaligned on page_size %u\n",
+			  dev_size, page_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+	if (free_pages > pages_count) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("free_pages %llu is greater than pages_count %llu\n",
+			  free_pages, pages_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	pages_per_seg = seg_size / page_size;
+	if (nsegs <= div_u64(free_pages, pages_per_seg)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("invalid nsegs %llu, free_pages %llu, "
+			  "pages_per_seg %u\n",
+			  nsegs, free_pages, pages_per_seg);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	return true;
+}
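The free-space plausibility checks above require that free_pages never exceed the device's total page count, and that the number of fully free segments stay strictly below nsegs. A userspace sketch mirroring those two conditions (illustrative names; the kernel uses div_u64 helpers for the 64-bit divisions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the free-space checks in
 * is_ssdfs_partial_log_header_consistent(): free_pages is bounded by
 * the device's page count, and free segments must be fewer than nsegs. */
static bool free_space_plausible(uint64_t dev_size, uint32_t page_size,
				 uint32_t pages_per_seg,
				 uint64_t nsegs, uint64_t free_pages)
{
	uint64_t pages_count = dev_size / page_size;

	if (free_pages > pages_count)
		return false;
	/* at least one segment must hold metadata, so it can't be free */
	return nsegs > free_pages / pages_per_seg;
}
```

For a 1 GiB device with 4 KB pages and 512-page segments, 512 segments with 100000 free pages pass, while free_pages above the page count or nsegs not exceeding the free-segment count are rejected.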
+
+/*
+ * ssdfs_check_partial_log_header() - check partial log header consistency
+ * @fsi: pointer to shared file system object
+ * @hdr: partial log header
+ * @silent: suppress error output?
+ *
+ * This function checks the consistency of a partial log header.
+ *
+ * RETURN:
+ * [success] - partial log header is consistent.
+ * [failure] - error code:
+ *
+ * %-ENODATA     - valid magic is not detected.
+ * %-EIO         - partial log header is corrupted.
+ */
+int ssdfs_check_partial_log_header(struct ssdfs_fs_info *fsi,
+				   struct ssdfs_partial_log_header *hdr,
+				   bool silent)
+{
+	size_t hdr_size = sizeof(struct ssdfs_partial_log_header);
+	bool major_magic_valid, minor_magic_valid;
+	u64 dev_size;
+	u32 log_bytes;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !hdr);
+
+	SSDFS_DBG("fsi %p, hdr %p, silent %#x\n", fsi, hdr, silent);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	major_magic_valid = is_ssdfs_magic_valid(&hdr->magic);
+	minor_magic_valid =
+		is_ssdfs_partial_log_header_magic_valid(&hdr->magic);
+
+	if (!major_magic_valid && !minor_magic_valid) {
+		if (!silent)
+			SSDFS_ERR("valid magic is not detected\n");
+		else
+			SSDFS_DBG("valid magic is not detected\n");
+		return -ENODATA;
+	} else if (!major_magic_valid) {
+		if (!silent)
+			SSDFS_ERR("invalid SSDFS magic signature\n");
+		else
+			SSDFS_DBG("invalid SSDFS magic signature\n");
+		return -EIO;
+	} else if (!minor_magic_valid) {
+		if (!silent)
+			SSDFS_ERR("invalid partial log header magic\n");
+		else
+			SSDFS_DBG("invalid partial log header magic\n");
+		return -EIO;
+	}
+
+	if (!is_ssdfs_partial_log_header_csum_valid(hdr, hdr_size)) {
+		if (!silent)
+			SSDFS_ERR("invalid checksum of partial log header\n");
+		else
+			SSDFS_DBG("invalid checksum of partial log header\n");
+		return -EIO;
+	}
+
+	dev_size = fsi->devops->device_size(fsi->sb);
+	if (!is_ssdfs_partial_log_header_consistent(fsi, hdr, dev_size)) {
+		if (!silent)
+			SSDFS_ERR("partial log header is corrupted\n");
+		else
+			SSDFS_DBG("partial log header is corrupted\n");
+		return -EIO;
+	}
+
+	if (le16_to_cpu(hdr->log_pages) > fsi->pages_per_peb) {
+		if (!silent) {
+			SSDFS_ERR("log_pages %u > pages_per_peb %u\n",
+				  le16_to_cpu(hdr->log_pages),
+				  fsi->pages_per_peb);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("log_pages %u > pages_per_peb %u\n",
+				  le16_to_cpu(hdr->log_pages),
+				  fsi->pages_per_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -EIO;
+	}
+
+	log_bytes = (u32)le16_to_cpu(hdr->log_pages) * fsi->pagesize;
+	if (le32_to_cpu(hdr->log_bytes) > log_bytes) {
+		if (!silent) {
+			SSDFS_ERR("calculated log_bytes %u < log_bytes %u\n",
+				  log_bytes,
+				  le32_to_cpu(hdr->log_bytes));
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("calculated log_bytes %u < log_bytes %u\n",
+				  log_bytes,
+				  le32_to_cpu(hdr->log_bytes));
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -EIO;
+	}
+
+	if (le16_to_cpu(hdr->seg_type) > SSDFS_LAST_KNOWN_SEG_TYPE) {
+		if (!silent) {
+			SSDFS_ERR("unknown seg_type %#x\n",
+				  le16_to_cpu(hdr->seg_type));
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unknown seg_type %#x\n",
+				  le16_to_cpu(hdr->seg_type));
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -EIO;
+	}
+
+	if (le32_to_cpu(hdr->pl_flags) & ~SSDFS_SEG_HDR_FLAG_MASK) {
+		if (!silent) {
+			SSDFS_ERR("corrupted pl_flags %#x\n",
+				  le32_to_cpu(hdr->pl_flags));
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("corrupted pl_flags %#x\n",
+				  le32_to_cpu(hdr->pl_flags));
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -EIO;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_read_checked_segment_header() - read and check segment header
+ * @fsi: pointer to shared file system object
+ * @peb_id: PEB identification number
+ * @pages_off: offset from the PEB's beginning in pages
+ * @buf: buffer
+ * @silent: suppress error messages?
+ *
+ * This function reads the segment header and checks its consistency.
+ *
+ * RETURN:
+ * [success] - segment header is consistent.
+ * [failure] - error code:
+ *
+ * %-ENODATA     - valid magic is not detected.
+ * %-EIO         - segment header is corrupted.
+ */
+int ssdfs_read_checked_segment_header(struct ssdfs_fs_info *fsi,
+					u64 peb_id, u32 pages_off,
+					void *buf, bool silent)
+{
+	struct ssdfs_signature *magic;
+	struct ssdfs_segment_header *hdr;
+	struct ssdfs_partial_log_header *pl_hdr;
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+	u64 offset = 0;
+	size_t read_bytes;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("peb_id %llu, pages_off %u, buf %p, silent %#x\n",
+		  peb_id, pages_off, buf, silent);
+
+	BUG_ON(!fsi);
+	BUG_ON(!fsi->devops->read);
+	BUG_ON(!buf);
+	BUG_ON(pages_off >= fsi->pages_per_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (peb_id == 0 && pages_off == 0)
+		offset = SSDFS_RESERVED_VBR_SIZE;
+	else
+		offset = (u64)pages_off * fsi->pagesize;
+
+	err = ssdfs_aligned_read_buffer(fsi, peb_id, offset,
+					buf, hdr_size,
+					&read_bytes);
+	if (unlikely(err)) {
+		if (!silent) {
+			SSDFS_ERR("fail to read segment header: "
+				  "peb_id %llu, pages_off %u, err %d\n",
+				  peb_id, pages_off, err);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("fail to read segment header: "
+				  "peb_id %llu, pages_off %u, err %d\n",
+				  peb_id, pages_off, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return err;
+	}
+
+	if (unlikely(read_bytes != hdr_size)) {
+		if (!silent) {
+			SSDFS_ERR("fail to read segment header: "
+				  "peb_id %llu, pages_off %u: "
+				  "read_bytes %zu != hdr_size %zu\n",
+				  peb_id, pages_off, read_bytes, hdr_size);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("fail to read segment header: "
+				  "peb_id %llu, pages_off %u: "
+				  "read_bytes %zu != hdr_size %zu\n",
+				  peb_id, pages_off, read_bytes, hdr_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -ERANGE;
+	}
+
+	magic = (struct ssdfs_signature *)buf;
+
+	if (!is_ssdfs_magic_valid(magic)) {
+		if (!silent)
+			SSDFS_ERR("valid magic is not detected\n");
+		else
+			SSDFS_DBG("valid magic is not detected\n");
+
+		return -ENODATA;
+	}
+
+	if (__is_ssdfs_segment_header_magic_valid(magic)) {
+		hdr = SSDFS_SEG_HDR(buf);
+
+		err = ssdfs_check_segment_header(fsi, hdr, silent);
+		if (unlikely(err)) {
+			if (!silent) {
+				SSDFS_ERR("segment header is corrupted: "
+					  "peb_id %llu, pages_off %u, err %d\n",
+					  peb_id, pages_off, err);
+			} else {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("segment header is corrupted: "
+					  "peb_id %llu, pages_off %u, err %d\n",
+					  peb_id, pages_off, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+			}
+
+			return err;
+		}
+	} else if (is_ssdfs_partial_log_header_magic_valid(magic)) {
+		pl_hdr = SSDFS_PLH(buf);
+
+		err = ssdfs_check_partial_log_header(fsi, pl_hdr, silent);
+		if (unlikely(err)) {
+			if (!silent) {
+				SSDFS_ERR("partial log header is corrupted: "
+					  "peb_id %llu, pages_off %u\n",
+					  peb_id, pages_off);
+			} else {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("partial log header is corrupted: "
+					  "peb_id %llu, pages_off %u\n",
+					  peb_id, pages_off);
+#endif /* CONFIG_SSDFS_DEBUG */
+			}
+
+			return err;
+		}
+	} else {
+		if (!silent) {
+			SSDFS_ERR("log header is corrupted: "
+				  "peb_id %llu, pages_off %u\n",
+				  peb_id, pages_off);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("log header is corrupted: "
+				  "peb_id %llu, pages_off %u\n",
+				  peb_id, pages_off);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+
+		return -EIO;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_create_volume_header() - initialize volume header from scratch
+ * @fsi: pointer to shared file system object
+ * @vh: volume header
+ */
+void ssdfs_create_volume_header(struct ssdfs_fs_info *fsi,
+				struct ssdfs_volume_header *vh)
+{
+	u64 erase_size;
+	u32 megabytes_per_peb;
+	u32 flags;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !vh);
+
+	SSDFS_DBG("fsi %p, vh %p\n", fsi, vh);
+	SSDFS_DBG("fsi->log_pagesize %u, fsi->log_erasesize %u, "
+		  "fsi->log_segsize %u, fsi->log_pebs_per_seg %u\n",
+		  fsi->log_pagesize,
+		  fsi->log_erasesize,
+		  fsi->log_segsize,
+		  fsi->log_pebs_per_seg);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	vh->magic.common = cpu_to_le32(SSDFS_SUPER_MAGIC);
+	vh->magic.key = cpu_to_le16(SSDFS_SEGMENT_HDR_MAGIC);
+	vh->magic.version.major = SSDFS_MAJOR_REVISION;
+	vh->magic.version.minor = SSDFS_MINOR_REVISION;
+
+	vh->log_pagesize = fsi->log_pagesize;
+	vh->log_erasesize = fsi->log_erasesize;
+	vh->log_segsize = fsi->log_segsize;
+	vh->log_pebs_per_seg = fsi->log_pebs_per_seg;
+
+	megabytes_per_peb = fsi->erasesize / SSDFS_1MB;
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(megabytes_per_peb >= U16_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+	vh->megabytes_per_peb = cpu_to_le16((u16)megabytes_per_peb);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(fsi->pebs_per_seg >= U16_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+	vh->pebs_per_seg = cpu_to_le16((u16)fsi->pebs_per_seg);
+
+	vh->create_time = cpu_to_le64(fsi->fs_ctime);
+	vh->create_cno = cpu_to_le64(fsi->fs_cno);
+
+	vh->lebs_per_peb_index = cpu_to_le32(fsi->lebs_per_peb_index);
+	vh->create_threads_per_seg = cpu_to_le16(fsi->create_threads_per_seg);
+
+	vh->flags = cpu_to_le32(0);
+
+	if (fsi->is_zns_device) {
+		flags = le32_to_cpu(vh->flags);
+		flags |= SSDFS_VH_ZNS_BASED_VOLUME;
+
+		erase_size = (u64)1 << fsi->log_erasesize;
+		if (erase_size != fsi->zone_size)
+			flags |= SSDFS_VH_UNALIGNED_ZONE;
+
+		vh->flags = cpu_to_le32(flags);
+	}
+
+	vh->sb_seg_log_pages = cpu_to_le16(fsi->sb_seg_log_pages);
+	vh->segbmap_log_pages = cpu_to_le16(fsi->segbmap_log_pages);
+	vh->maptbl_log_pages = cpu_to_le16(fsi->maptbl_log_pages);
+	vh->lnodes_seg_log_pages = cpu_to_le16(fsi->lnodes_seg_log_pages);
+	vh->hnodes_seg_log_pages = cpu_to_le16(fsi->hnodes_seg_log_pages);
+	vh->inodes_seg_log_pages = cpu_to_le16(fsi->inodes_seg_log_pages);
+	vh->user_data_log_pages = cpu_to_le16(fsi->user_data_log_pages);
+
+	ssdfs_memcpy(&vh->segbmap,
+		     0, sizeof(struct ssdfs_segbmap_sb_header),
+		     &fsi->vh->segbmap,
+		     0, sizeof(struct ssdfs_segbmap_sb_header),
+		     sizeof(struct ssdfs_segbmap_sb_header));
+	ssdfs_memcpy(&vh->maptbl,
+		     0, sizeof(struct ssdfs_maptbl_sb_header),
+		     &fsi->vh->maptbl,
+		     0, sizeof(struct ssdfs_maptbl_sb_header),
+		     sizeof(struct ssdfs_maptbl_sb_header));
+	ssdfs_memcpy(&vh->dentries_btree,
+		     0, sizeof(struct ssdfs_dentries_btree_descriptor),
+		     &fsi->vh->dentries_btree,
+		     0, sizeof(struct ssdfs_dentries_btree_descriptor),
+		     sizeof(struct ssdfs_dentries_btree_descriptor));
+	ssdfs_memcpy(&vh->extents_btree,
+		     0, sizeof(struct ssdfs_extents_btree_descriptor),
+		     &fsi->vh->extents_btree,
+		     0, sizeof(struct ssdfs_extents_btree_descriptor),
+		     sizeof(struct ssdfs_extents_btree_descriptor));
+	ssdfs_memcpy(&vh->xattr_btree,
+		     0, sizeof(struct ssdfs_xattr_btree_descriptor),
+		     &fsi->vh->xattr_btree,
+		     0, sizeof(struct ssdfs_xattr_btree_descriptor),
+		     sizeof(struct ssdfs_xattr_btree_descriptor));
+	ssdfs_memcpy(&vh->invextree,
+		     0, sizeof(struct ssdfs_invalidated_extents_btree),
+		     &fsi->vh->invextree,
+		     0, sizeof(struct ssdfs_invalidated_extents_btree),
+		     sizeof(struct ssdfs_invalidated_extents_btree));
+}
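The ZNS-related flag logic at the end of ssdfs_create_volume_header() can be sketched in isolation. The flag values below are illustrative placeholders, not the real SSDFS_VH_* constants from the on-disk layout header.

```c
#include <stdint.h>

/* placeholder flag values for the sketch */
#define MODEL_VH_ZNS_BASED_VOLUME	0x1
#define MODEL_VH_UNALIGNED_ZONE		0x2

/* A ZNS-backed volume is always marked as such; an extra flag records
 * that the configured erase block size differs from the physical zone
 * size, so mount-time code knows the two are not interchangeable. */
static uint32_t model_zone_flags(int is_zns_device, uint32_t log_erasesize,
				 uint64_t zone_size)
{
	uint32_t flags = 0;

	if (is_zns_device) {
		flags |= MODEL_VH_ZNS_BASED_VOLUME;

		if ((1ULL << log_erasesize) != zone_size)
			flags |= MODEL_VH_UNALIGNED_ZONE;
	}

	return flags;
}
```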
+
+/*
+ * ssdfs_store_sb_segs_array() - store sb segments array
+ * @fsi: pointer to shared file system object
+ * @vh: volume header
+ */
+static inline
+void ssdfs_store_sb_segs_array(struct ssdfs_fs_info *fsi,
+				struct ssdfs_volume_header *vh)
+{
+	int i, j;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("fsi %p, vh %p\n", fsi, vh);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	down_read(&fsi->sb_segs_sem);
+
+	for (i = SSDFS_CUR_SB_SEG; i < SSDFS_SB_CHAIN_MAX; i++) {
+		for (j = SSDFS_MAIN_SB_SEG; j < SSDFS_SB_SEG_COPY_MAX; j++) {
+			vh->sb_pebs[i][j].leb_id =
+				cpu_to_le64(fsi->sb_lebs[i][j]);
+			vh->sb_pebs[i][j].peb_id =
+				cpu_to_le64(fsi->sb_pebs[i][j]);
+		}
+	}
+
+	up_read(&fsi->sb_segs_sem);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb_lebs[CUR][MAIN] %llu, sb_pebs[CUR][MAIN] %llu\n",
+		  fsi->sb_lebs[SSDFS_CUR_SB_SEG][SSDFS_MAIN_SB_SEG],
+		  fsi->sb_pebs[SSDFS_CUR_SB_SEG][SSDFS_MAIN_SB_SEG]);
+	SSDFS_DBG("sb_lebs[CUR][COPY] %llu, sb_pebs[CUR][COPY] %llu\n",
+		  fsi->sb_lebs[SSDFS_CUR_SB_SEG][SSDFS_COPY_SB_SEG],
+		  fsi->sb_pebs[SSDFS_CUR_SB_SEG][SSDFS_COPY_SB_SEG]);
+	SSDFS_DBG("sb_lebs[NEXT][MAIN] %llu, sb_pebs[NEXT][MAIN] %llu\n",
+		  fsi->sb_lebs[SSDFS_NEXT_SB_SEG][SSDFS_MAIN_SB_SEG],
+		  fsi->sb_pebs[SSDFS_NEXT_SB_SEG][SSDFS_MAIN_SB_SEG]);
+	SSDFS_DBG("sb_lebs[NEXT][COPY] %llu, sb_pebs[NEXT][COPY] %llu\n",
+		  fsi->sb_lebs[SSDFS_NEXT_SB_SEG][SSDFS_COPY_SB_SEG],
+		  fsi->sb_pebs[SSDFS_NEXT_SB_SEG][SSDFS_COPY_SB_SEG]);
+	SSDFS_DBG("sb_lebs[RESERVED][MAIN] %llu, sb_pebs[RESERVED][MAIN] %llu\n",
+		  fsi->sb_lebs[SSDFS_RESERVED_SB_SEG][SSDFS_MAIN_SB_SEG],
+		  fsi->sb_pebs[SSDFS_RESERVED_SB_SEG][SSDFS_MAIN_SB_SEG]);
+	SSDFS_DBG("sb_lebs[RESERVED][COPY] %llu, sb_pebs[RESERVED][COPY] %llu\n",
+		  fsi->sb_lebs[SSDFS_RESERVED_SB_SEG][SSDFS_COPY_SB_SEG],
+		  fsi->sb_pebs[SSDFS_RESERVED_SB_SEG][SSDFS_COPY_SB_SEG]);
+	SSDFS_DBG("sb_lebs[PREV][MAIN] %llu, sb_pebs[PREV][MAIN] %llu\n",
+		  fsi->sb_lebs[SSDFS_PREV_SB_SEG][SSDFS_MAIN_SB_SEG],
+		  fsi->sb_pebs[SSDFS_PREV_SB_SEG][SSDFS_MAIN_SB_SEG]);
+	SSDFS_DBG("sb_lebs[PREV][COPY] %llu, sb_pebs[PREV][COPY] %llu\n",
+		  fsi->sb_lebs[SSDFS_PREV_SB_SEG][SSDFS_COPY_SB_SEG],
+		  fsi->sb_pebs[SSDFS_PREV_SB_SEG][SSDFS_COPY_SB_SEG]);
+#endif /* CONFIG_SSDFS_DEBUG */
+}
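The nested loop above serializes a small 2D table of superblock segment locations: four chain slots (current, next, reserved, previous), each kept in a main and a mirror copy. A minimal userspace model of that layout, with enum names that echo the SSDFS ones but whose values are assumptions for the sketch:

```c
#include <stdint.h>

enum model_sb_chain {
	MODEL_CUR_SB_SEG,
	MODEL_NEXT_SB_SEG,
	MODEL_RESERVED_SB_SEG,
	MODEL_PREV_SB_SEG,
	MODEL_SB_CHAIN_MAX
};

enum model_sb_copy {
	MODEL_MAIN_SB_SEG,
	MODEL_COPY_SB_SEG,
	MODEL_SB_SEG_COPY_MAX
};

struct model_sb_peb_pair {
	uint64_t leb_id;
	uint64_t peb_id;
};

/* copy the in-memory LEB/PEB tables into the header-style pair array,
 * walking every chain slot and both redundant copies */
static void
model_store_sb_segs(struct model_sb_peb_pair dst[][MODEL_SB_SEG_COPY_MAX],
		    uint64_t lebs[][MODEL_SB_SEG_COPY_MAX],
		    uint64_t pebs[][MODEL_SB_SEG_COPY_MAX])
{
	int i, j;

	for (i = MODEL_CUR_SB_SEG; i < MODEL_SB_CHAIN_MAX; i++) {
		for (j = MODEL_MAIN_SB_SEG; j < MODEL_SB_SEG_COPY_MAX; j++) {
			dst[i][j].leb_id = lebs[i][j];
			dst[i][j].peb_id = pebs[i][j];
		}
	}
}
```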
+
+/*
+ * ssdfs_prepare_volume_header_for_commit() - prepare volume header for commit
+ * @fsi: pointer to shared file system object
+ * @vh: volume header
+ */
+int ssdfs_prepare_volume_header_for_commit(struct ssdfs_fs_info *fsi,
+					   struct ssdfs_volume_header *vh)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	struct super_block *sb = fsi->sb;
+	u64 dev_size;
+
+	SSDFS_DBG("fsi %p, vh %p\n", fsi, vh);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_store_sb_segs_array(fsi, vh);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	dev_size = fsi->devops->device_size(sb);
+	if (!is_ssdfs_volume_header_consistent(fsi, vh, dev_size)) {
+		SSDFS_ERR("volume header is inconsistent\n");
+		return -EIO;
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+/*
+ * ssdfs_prepare_segment_header_for_commit() - prepare segment header
+ * @fsi: pointer to shared file system object
+ * @log_pages: full log pages count
+ * @seg_type: segment type
+ * @seg_flags: segment flags
+ * @last_log_time: log creation time
+ * @last_log_cno: log checkpoint
+ * @hdr: segment header [out]
+ */
+int ssdfs_prepare_segment_header_for_commit(struct ssdfs_fs_info *fsi,
+					    u32 log_pages,
+					    u16 seg_type,
+					    u32 seg_flags,
+					    u64 last_log_time,
+					    u64 last_log_cno,
+					    struct ssdfs_segment_header *hdr)
+{
+	u16 data_size = sizeof(struct ssdfs_segment_header);
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("fsi %p, hdr %p, "
+		  "log_pages %u, seg_type %#x, seg_flags %#x\n",
+		  fsi, hdr, log_pages, seg_type, seg_flags);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	hdr->timestamp = cpu_to_le64(last_log_time);
+	hdr->cno = cpu_to_le64(last_log_cno);
+
+	if (log_pages > fsi->pages_per_seg || log_pages > U16_MAX) {
+		SSDFS_ERR("invalid value of log_pages %u\n", log_pages);
+		return -EINVAL;
+	}
+
+	hdr->log_pages = cpu_to_le16((u16)log_pages);
+
+	if (seg_type == SSDFS_UNKNOWN_SEG_TYPE ||
+	    seg_type > SSDFS_LAST_KNOWN_SEG_TYPE) {
+		SSDFS_ERR("invalid value of seg_type %#x\n", seg_type);
+		return -EINVAL;
+	}
+
+	hdr->seg_type = cpu_to_le16(seg_type);
+	hdr->seg_flags = cpu_to_le32(seg_flags);
+
+	hdr->volume_hdr.check.bytes = cpu_to_le16(data_size);
+	hdr->volume_hdr.check.flags = cpu_to_le16(SSDFS_CRC32);
+
+	err = ssdfs_calculate_csum(&hdr->volume_hdr.check,
+				   hdr, data_size);
+	if (unlikely(err)) {
+		SSDFS_ERR("unable to calculate checksum: err %d\n", err);
+		return err;
+	}
+
+	return 0;
+}
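The parameter validation in ssdfs_prepare_segment_header_for_commit() can be summarized as a pure function: the log page count must fit both the segment and the on-disk 16-bit field, and the segment type must be a known, non-zero value. The constants below are placeholders, not the real SSDFS definitions.

```c
#include <stdint.h>

#define MODEL_UNKNOWN_SEG_TYPE		0
#define MODEL_LAST_KNOWN_SEG_TYPE	6	/* illustrative upper bound */

/* returns 0 on valid parameters, -1 as an -EINVAL analogue */
static int model_validate_commit_params(uint32_t log_pages,
					uint32_t pages_per_seg,
					uint16_t seg_type)
{
	/* must fit the segment and the on-disk u16 log_pages field */
	if (log_pages > pages_per_seg || log_pages > UINT16_MAX)
		return -1;

	/* segment type must be known and non-zero */
	if (seg_type == MODEL_UNKNOWN_SEG_TYPE ||
	    seg_type > MODEL_LAST_KNOWN_SEG_TYPE)
		return -1;

	return 0;
}
```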
+
+/*
+ * ssdfs_prepare_partial_log_header_for_commit() - prepare partial log header
+ * @fsi: pointer to shared file system object
+ * @sequence_id: sequence ID of the partial log inside the full log
+ * @log_pages: log pages count
+ * @seg_type: segment type
+ * @pl_flags: partial log's flags
+ * @last_log_time: log creation time
+ * @last_log_cno: log checkpoint
+ * @hdr: partial log's header [out]
+ */
+int ssdfs_prepare_partial_log_header_for_commit(struct ssdfs_fs_info *fsi,
+					int sequence_id,
+					u32 log_pages,
+					u16 seg_type,
+					u32 pl_flags,
+					u64 last_log_time,
+					u64 last_log_cno,
+					struct ssdfs_partial_log_header *hdr)
+{
+	u16 data_size = sizeof(struct ssdfs_partial_log_header);
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("fsi %p, hdr %p, sequence_id %d, "
+		  "log_pages %u, seg_type %#x, pl_flags %#x\n",
+		  fsi, hdr, sequence_id, log_pages, seg_type, pl_flags);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	hdr->magic.common = cpu_to_le32(SSDFS_SUPER_MAGIC);
+	hdr->magic.key = cpu_to_le16(SSDFS_PARTIAL_LOG_HDR_MAGIC);
+	hdr->magic.version.major = SSDFS_MAJOR_REVISION;
+	hdr->magic.version.minor = SSDFS_MINOR_REVISION;
+
+	hdr->timestamp = cpu_to_le64(last_log_time);
+	hdr->cno = cpu_to_le64(last_log_cno);
+
+	if (log_pages > fsi->pages_per_seg || log_pages > U16_MAX) {
+		SSDFS_ERR("invalid value of log_pages %u\n", log_pages);
+		return -EINVAL;
+	}
+
+	hdr->log_pages = cpu_to_le16((u16)log_pages);
+	hdr->log_bytes = cpu_to_le32(log_pages << fsi->log_pagesize);
+
+	if (seg_type == SSDFS_UNKNOWN_SEG_TYPE ||
+	    seg_type > SSDFS_LAST_KNOWN_SEG_TYPE) {
+		SSDFS_ERR("invalid value of seg_type %#x\n", seg_type);
+		return -EINVAL;
+	}
+
+	hdr->seg_type = cpu_to_le16(seg_type);
+	hdr->pl_flags = cpu_to_le32(pl_flags);
+
+	spin_lock(&fsi->volume_state_lock);
+	hdr->free_pages = cpu_to_le64(fsi->free_pages);
+	hdr->flags = cpu_to_le32(fsi->fs_flags);
+	spin_unlock(&fsi->volume_state_lock);
+
+	mutex_lock(&fsi->resize_mutex);
+	hdr->nsegs = cpu_to_le64(fsi->nsegs);
+	mutex_unlock(&fsi->resize_mutex);
+
+	ssdfs_memcpy(&hdr->root_folder,
+		     0, sizeof(struct ssdfs_inode),
+		     &fsi->vs->root_folder,
+		     0, sizeof(struct ssdfs_inode),
+		     sizeof(struct ssdfs_inode));
+
+	ssdfs_memcpy(&hdr->inodes_btree,
+		     0, sizeof(struct ssdfs_inodes_btree),
+		     &fsi->vs->inodes_btree,
+		     0, sizeof(struct ssdfs_inodes_btree),
+		     sizeof(struct ssdfs_inodes_btree));
+	ssdfs_memcpy(&hdr->shared_extents_btree,
+		     0, sizeof(struct ssdfs_shared_extents_btree),
+		     &fsi->vs->shared_extents_btree,
+		     0, sizeof(struct ssdfs_shared_extents_btree),
+		     sizeof(struct ssdfs_shared_extents_btree));
+	ssdfs_memcpy(&hdr->shared_dict_btree,
+		     0, sizeof(struct ssdfs_shared_dictionary_btree),
+		     &fsi->vs->shared_dict_btree,
+		     0, sizeof(struct ssdfs_shared_dictionary_btree),
+		     sizeof(struct ssdfs_shared_dictionary_btree));
+	ssdfs_memcpy(&hdr->snapshots_btree,
+		     0, sizeof(struct ssdfs_snapshots_btree),
+		     &fsi->vs->snapshots_btree,
+		     0, sizeof(struct ssdfs_snapshots_btree),
+		     sizeof(struct ssdfs_snapshots_btree));
+	ssdfs_memcpy(&hdr->invextree,
+		     0, sizeof(struct ssdfs_invalidated_extents_btree),
+		     &fsi->vh->invextree,
+		     0, sizeof(struct ssdfs_invalidated_extents_btree),
+		     sizeof(struct ssdfs_invalidated_extents_btree));
+
+	hdr->sequence_id = cpu_to_le32(sequence_id);
+
+	hdr->log_pagesize = fsi->log_pagesize;
+	hdr->log_erasesize = fsi->log_erasesize;
+	hdr->log_segsize = fsi->log_segsize;
+	hdr->log_pebs_per_seg = fsi->log_pebs_per_seg;
+	hdr->lebs_per_peb_index = cpu_to_le32(fsi->lebs_per_peb_index);
+	hdr->create_threads_per_seg = cpu_to_le16(fsi->create_threads_per_seg);
+
+	hdr->open_zones = cpu_to_le32(atomic_read(&fsi->open_zones));
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("open_zones %d\n",
+		  atomic_read(&fsi->open_zones));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	hdr->check.bytes = cpu_to_le16(data_size);
+	hdr->check.flags = cpu_to_le16(SSDFS_CRC32);
+
+	err = ssdfs_calculate_csum(&hdr->check,
+				   hdr, data_size);
+	if (unlikely(err)) {
+		SSDFS_ERR("unable to calculate checksum: err %d\n", err);
+		return err;
+	}
+
+	return 0;
+}
-- 
2.34.1



* [RFC PATCH 07/76] ssdfs: basic mount logic implementation
  2023-02-25  1:08 [RFC PATCH 00/76] SSDFS: flash-friendly LFS file system for ZNS SSD Viacheslav Dubeyko
                   ` (5 preceding siblings ...)
  2023-02-25  1:08 ` [RFC PATCH 06/76] ssdfs: segment header + log footer operations Viacheslav Dubeyko
@ 2023-02-25  1:08 ` Viacheslav Dubeyko
  2023-02-25  1:08 ` [RFC PATCH 08/76] ssdfs: search last actual superblock Viacheslav Dubeyko
                   ` (69 subsequent siblings)
  76 siblings, 0 replies; 82+ messages in thread
From: Viacheslav Dubeyko @ 2023-02-25  1:08 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: viacheslav.dubeyko, luka.perkov, bruno.banelli, Viacheslav Dubeyko

This patch implements the logic for searching for and recovering
the last actual superblock.

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
CC: Viacheslav Dubeyko <viacheslav.dubeyko@bytedance.com>
CC: Luka Perkov <luka.perkov@sartura.hr>
CC: Bruno Banelli <bruno.banelli@sartura.hr>
---
 fs/ssdfs/recovery.c | 3144 +++++++++++++++++++++++++++++++++++++++++++
 fs/ssdfs/recovery.h |  446 ++++++
 2 files changed, 3590 insertions(+)
 create mode 100644 fs/ssdfs/recovery.c
 create mode 100644 fs/ssdfs/recovery.h

diff --git a/fs/ssdfs/recovery.c b/fs/ssdfs/recovery.c
new file mode 100644
index 000000000000..dcb56ac0d682
--- /dev/null
+++ b/fs/ssdfs/recovery.c
@@ -0,0 +1,3144 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/recovery.c - searching actual state and recovery on mount code.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include <linux/slab.h>
+#include <linux/pagevec.h>
+#include <linux/blkdev.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+#include "page_array.h"
+#include "page_vector.h"
+#include "peb.h"
+#include "offset_translation_table.h"
+#include "segment_bitmap.h"
+#include "peb_mapping_table.h"
+#include "recovery.h"
+
+#include <trace/events/ssdfs.h>
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+atomic64_t ssdfs_recovery_page_leaks;
+atomic64_t ssdfs_recovery_memory_leaks;
+atomic64_t ssdfs_recovery_cache_leaks;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+/*
+ * void ssdfs_recovery_cache_leaks_increment(void *kaddr)
+ * void ssdfs_recovery_cache_leaks_decrement(void *kaddr)
+ * void *ssdfs_recovery_kmalloc(size_t size, gfp_t flags)
+ * void *ssdfs_recovery_kzalloc(size_t size, gfp_t flags)
+ * void *ssdfs_recovery_kcalloc(size_t n, size_t size, gfp_t flags)
+ * void ssdfs_recovery_kfree(void *kaddr)
+ * struct page *ssdfs_recovery_alloc_page(gfp_t gfp_mask)
+ * struct page *ssdfs_recovery_add_pagevec_page(struct pagevec *pvec)
+ * void ssdfs_recovery_free_page(struct page *page)
+ * void ssdfs_recovery_pagevec_release(struct pagevec *pvec)
+ */
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	SSDFS_MEMORY_LEAKS_CHECKER_FNS(recovery)
+#else
+	SSDFS_MEMORY_ALLOCATOR_FNS(recovery)
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+void ssdfs_recovery_memory_leaks_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_recovery_page_leaks, 0);
+	atomic64_set(&ssdfs_recovery_memory_leaks, 0);
+	atomic64_set(&ssdfs_recovery_cache_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+void ssdfs_recovery_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&ssdfs_recovery_page_leaks) != 0) {
+		SSDFS_ERR("RECOVERY: "
+			  "memory leaks include %lld pages\n",
+			  atomic64_read(&ssdfs_recovery_page_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_recovery_memory_leaks) != 0) {
+		SSDFS_ERR("RECOVERY: "
+			  "memory allocator suffers from %lld leaks\n",
+			  atomic64_read(&ssdfs_recovery_memory_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_recovery_cache_leaks) != 0) {
+		SSDFS_ERR("RECOVERY: "
+			  "caches suffer from %lld leaks\n",
+			  atomic64_read(&ssdfs_recovery_cache_leaks));
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+int ssdfs_init_sb_info(struct ssdfs_fs_info *fsi,
+			struct ssdfs_sb_info *sbi)
+{
+	void *vh_buf = NULL;
+	void *vs_buf = NULL;
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+	size_t footer_size = sizeof(struct ssdfs_log_footer);
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sbi %p, hdr_size %zu, footer_size %zu\n",
+		  sbi, hdr_size, footer_size);
+
+	BUG_ON(!sbi);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	sbi->vh_buf = NULL;
+	sbi->vs_buf = NULL;
+
+	hdr_size = max_t(size_t, hdr_size, (size_t)SSDFS_4KB);
+	sbi->vh_buf_size = hdr_size;
+	footer_size = max_t(size_t, footer_size, (size_t)SSDFS_4KB);
+	sbi->vs_buf_size = footer_size;
+
+	vh_buf = ssdfs_recovery_kzalloc(sbi->vh_buf_size, GFP_KERNEL);
+	vs_buf = ssdfs_recovery_kzalloc(sbi->vs_buf_size, GFP_KERNEL);
+	if (unlikely(!vh_buf || !vs_buf)) {
+		SSDFS_ERR("unable to allocate superblock buffers\n");
+		err = -ENOMEM;
+		goto free_buf;
+	}
+
+	sbi->vh_buf = vh_buf;
+	sbi->vs_buf = vs_buf;
+
+	return 0;
+
+free_buf:
+	ssdfs_recovery_kfree(vh_buf);
+	ssdfs_recovery_kfree(vs_buf);
+	return err;
+}
+
+void ssdfs_destruct_sb_info(struct ssdfs_sb_info *sbi)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!sbi);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!sbi->vh_buf || !sbi->vs_buf)
+		return;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sbi %p, sbi->vh_buf %p, sbi->vs_buf %p, "
+		  "sbi->last_log.leb_id %llu, sbi->last_log.peb_id %llu, "
+		  "sbi->last_log.page_offset %u, "
+		  "sbi->last_log.pages_count %u\n",
+		  sbi, sbi->vh_buf, sbi->vs_buf, sbi->last_log.leb_id,
+		  sbi->last_log.peb_id, sbi->last_log.page_offset,
+		  sbi->last_log.pages_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_recovery_kfree(sbi->vh_buf);
+	ssdfs_recovery_kfree(sbi->vs_buf);
+	sbi->vh_buf = NULL;
+	sbi->vh_buf_size = 0;
+	sbi->vs_buf = NULL;
+	sbi->vs_buf_size = 0;
+	memset(&sbi->last_log, 0, sizeof(struct ssdfs_peb_extent));
+}
+
+void ssdfs_backup_sb_info(struct ssdfs_fs_info *fsi)
+{
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+	size_t footer_size = sizeof(struct ssdfs_log_footer);
+	size_t extent_size = sizeof(struct ssdfs_peb_extent);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi);
+	BUG_ON(!fsi->sbi.vh_buf || !fsi->sbi.vs_buf);
+	BUG_ON(!fsi->sbi_backup.vh_buf || !fsi->sbi_backup.vs_buf);
+
+	SSDFS_DBG("last_log: leb_id %llu, peb_id %llu, "
+		  "page_offset %u, pages_count %u, "
+		  "volume state: free_pages %llu, timestamp %#llx, "
+		  "cno %#llx, fs_state %#x\n",
+		  fsi->sbi.last_log.leb_id,
+		  fsi->sbi.last_log.peb_id,
+		  fsi->sbi.last_log.page_offset,
+		  fsi->sbi.last_log.pages_count,
+		  le64_to_cpu(SSDFS_VS(fsi->sbi.vs_buf)->free_pages),
+		  le64_to_cpu(SSDFS_VS(fsi->sbi.vs_buf)->timestamp),
+		  le64_to_cpu(SSDFS_VS(fsi->sbi.vs_buf)->cno),
+		  le16_to_cpu(SSDFS_VS(fsi->sbi.vs_buf)->state));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_memcpy(fsi->sbi_backup.vh_buf, 0, hdr_size,
+		     fsi->sbi.vh_buf, 0, hdr_size,
+		     hdr_size);
+	ssdfs_memcpy(fsi->sbi_backup.vs_buf, 0, footer_size,
+		     fsi->sbi.vs_buf, 0, footer_size,
+		     footer_size);
+	ssdfs_memcpy(&fsi->sbi_backup.last_log, 0, extent_size,
+		     &fsi->sbi.last_log, 0, extent_size,
+		     extent_size);
+}
+
+void ssdfs_copy_sb_info(struct ssdfs_fs_info *fsi,
+			struct ssdfs_recovery_env *env)
+{
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+	size_t vhdr_size = sizeof(struct ssdfs_volume_header);
+	size_t footer_size = sizeof(struct ssdfs_log_footer);
+	size_t extent_size = sizeof(struct ssdfs_peb_extent);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi);
+	BUG_ON(!fsi->sbi.vh_buf || !fsi->sbi.vs_buf);
+	BUG_ON(!fsi->sbi_backup.vh_buf || !fsi->sbi_backup.vs_buf);
+	BUG_ON(!env);
+	BUG_ON(!env->sbi.vh_buf || !env->sbi.vs_buf);
+	BUG_ON(!env->sbi_backup.vh_buf || !env->sbi_backup.vs_buf);
+
+	SSDFS_DBG("last_log: leb_id %llu, peb_id %llu, "
+		  "page_offset %u, pages_count %u, "
+		  "volume state: free_pages %llu, timestamp %#llx, "
+		  "cno %#llx, fs_state %#x\n",
+		  env->sbi.last_log.leb_id,
+		  env->sbi.last_log.peb_id,
+		  env->sbi.last_log.page_offset,
+		  env->sbi.last_log.pages_count,
+		  le64_to_cpu(SSDFS_VS(env->sbi.vs_buf)->free_pages),
+		  le64_to_cpu(SSDFS_VS(env->sbi.vs_buf)->timestamp),
+		  le64_to_cpu(SSDFS_VS(env->sbi.vs_buf)->cno),
+		  le16_to_cpu(SSDFS_VS(env->sbi.vs_buf)->state));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_memcpy(fsi->sbi.vh_buf, 0, hdr_size,
+		     env->sbi.vh_buf, 0, hdr_size,
+		     hdr_size);
+	ssdfs_memcpy(fsi->sbi.vs_buf, 0, footer_size,
+		     env->sbi.vs_buf, 0, footer_size,
+		     footer_size);
+	ssdfs_memcpy(&fsi->sbi.last_log, 0, extent_size,
+		     &env->sbi.last_log, 0, extent_size,
+		     extent_size);
+	ssdfs_memcpy(fsi->sbi_backup.vh_buf, 0, hdr_size,
+		     env->sbi_backup.vh_buf, 0, hdr_size,
+		     hdr_size);
+	ssdfs_memcpy(fsi->sbi_backup.vs_buf, 0, footer_size,
+		     env->sbi_backup.vs_buf, 0, footer_size,
+		     footer_size);
+	ssdfs_memcpy(&fsi->sbi_backup.last_log, 0, extent_size,
+		     &env->sbi_backup.last_log, 0, extent_size,
+		     extent_size);
+	ssdfs_memcpy(&fsi->last_vh, 0, vhdr_size,
+		     &env->last_vh, 0, vhdr_size,
+		     vhdr_size);
+}
+
+void ssdfs_restore_sb_info(struct ssdfs_fs_info *fsi)
+{
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+	size_t footer_size = sizeof(struct ssdfs_log_footer);
+	size_t extent_size = sizeof(struct ssdfs_peb_extent);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi);
+	BUG_ON(!fsi->sbi.vh_buf || !fsi->sbi.vs_buf);
+	BUG_ON(!fsi->sbi_backup.vh_buf || !fsi->sbi_backup.vs_buf);
+
+	SSDFS_DBG("last_log: leb_id %llu, peb_id %llu, "
+		  "page_offset %u, pages_count %u, "
+		  "volume state: free_pages %llu, timestamp %#llx, "
+		  "cno %#llx, fs_state %#x\n",
+		  fsi->sbi.last_log.leb_id,
+		  fsi->sbi.last_log.peb_id,
+		  fsi->sbi.last_log.page_offset,
+		  fsi->sbi.last_log.pages_count,
+		  le64_to_cpu(SSDFS_VS(fsi->sbi.vs_buf)->free_pages),
+		  le64_to_cpu(SSDFS_VS(fsi->sbi.vs_buf)->timestamp),
+		  le64_to_cpu(SSDFS_VS(fsi->sbi.vs_buf)->cno),
+		  le16_to_cpu(SSDFS_VS(fsi->sbi.vs_buf)->state));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_memcpy(fsi->sbi.vh_buf, 0, hdr_size,
+		     fsi->sbi_backup.vh_buf, 0, hdr_size,
+		     hdr_size);
+	ssdfs_memcpy(fsi->sbi.vs_buf, 0, footer_size,
+		     fsi->sbi_backup.vs_buf, 0, footer_size,
+		     footer_size);
+	ssdfs_memcpy(&fsi->sbi.last_log, 0, extent_size,
+		     &fsi->sbi_backup.last_log, 0, extent_size,
+		     extent_size);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("last_log: leb_id %llu, peb_id %llu, "
+		  "page_offset %u, pages_count %u, "
+		  "volume state: free_pages %llu, timestamp %#llx, "
+		  "cno %#llx, fs_state %#x\n",
+		  fsi->sbi.last_log.leb_id,
+		  fsi->sbi.last_log.peb_id,
+		  fsi->sbi.last_log.page_offset,
+		  fsi->sbi.last_log.pages_count,
+		  le64_to_cpu(SSDFS_VS(fsi->sbi.vs_buf)->free_pages),
+		  le64_to_cpu(SSDFS_VS(fsi->sbi.vs_buf)->timestamp),
+		  le64_to_cpu(SSDFS_VS(fsi->sbi.vs_buf)->cno),
+		  le16_to_cpu(SSDFS_VS(fsi->sbi.vs_buf)->state));
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+static int find_seg_with_valid_start_peb(struct ssdfs_fs_info *fsi,
+					 size_t seg_size,
+					 loff_t *offset,
+					 u64 threshold,
+					 int silent,
+					 int op_type)
+{
+	struct super_block *sb = fsi->sb;
+	loff_t off;
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+	struct ssdfs_volume_header *vh;
+	bool magic_valid = false;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("fsi %p, seg_size %zu, start_offset %llu, "
+		  "threshold %llu, silent %#x, op_type %#x\n",
+		  fsi, seg_size, (unsigned long long)*offset,
+		  threshold, silent, op_type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (op_type) {
+	case SSDFS_USE_PEB_ISBAD_OP:
+		if (!fsi->devops->peb_isbad) {
+			SSDFS_ERR("unable to detect bad PEB\n");
+			return -EOPNOTSUPP;
+		}
+		break;
+
+	case SSDFS_USE_READ_OP:
+		if (!fsi->devops->read) {
+			SSDFS_ERR("unable to read from device\n");
+			return -EOPNOTSUPP;
+		}
+		break;
+
+	default:
+		BUG();
+	}
+
+	if (*offset != SSDFS_RESERVED_VBR_SIZE)
+		off = (*offset / seg_size) * seg_size;
+	else
+		off = *offset;
+
+	while (off < threshold) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("off %llu\n", (u64)off);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		switch (op_type) {
+		case SSDFS_USE_PEB_ISBAD_OP:
+			err = fsi->devops->peb_isbad(sb, off);
+			magic_valid = true;
+			break;
+
+		case SSDFS_USE_READ_OP:
+			err = fsi->devops->read(sb, off, hdr_size,
+						fsi->sbi.vh_buf);
+			vh = SSDFS_VH(fsi->sbi.vh_buf);
+			magic_valid = is_ssdfs_magic_valid(&vh->magic);
+			break;
+
+		default:
+			BUG();
+		}
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("HEADER DUMP: magic_valid %#x, err %d\n",
+			  magic_valid, err);
+		print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+				     fsi->sbi.vh_buf, hdr_size);
+		SSDFS_DBG("\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (!err) {
+			if (magic_valid) {
+				*offset = off;
+				return 0;
+			}
+		} else if (!silent) {
+			SSDFS_NOTICE("offset %llu is in bad PEB\n",
+					(unsigned long long)off);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("offset %llu is in bad PEB\n",
+				  (unsigned long long)off);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+
+		off += 2 * seg_size;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("unable to find valid PEB\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return -ENODATA;
+}
+
+static int ssdfs_find_any_valid_volume_header(struct ssdfs_fs_info *fsi,
+						loff_t offset,
+						int silent)
+{
+	struct super_block *sb;
+	size_t seg_size = SSDFS_128KB;
+	loff_t start_offset = offset;
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+	u64 dev_size;
+	u64 threshold;
+	struct ssdfs_volume_header *vh;
+	bool magic_valid, crc_valid;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi);
+	BUG_ON(!fsi->sbi.vh_buf);
+	BUG_ON(!fsi->devops->read);
+
+	SSDFS_DBG("fsi %p, fsi->sbi.vh_buf %p, silent %#x\n",
+		  fsi, fsi->sbi.vh_buf, silent);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	sb = fsi->sb;
+	dev_size = fsi->devops->device_size(sb);
+
+try_seg_size:
+	threshold = SSDFS_MAPTBL_PROTECTION_STEP;
+	threshold *= SSDFS_MAPTBL_PROTECTION_RANGE;
+	threshold *= seg_size;
+	threshold = min_t(u64, dev_size, threshold + offset);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("offset %llu, dev_size %llu, threshold %llu\n",
+		  offset, dev_size, threshold);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (fsi->devops->peb_isbad) {
+		err = fsi->devops->peb_isbad(sb, offset);
+		if (err) {
+			if (!silent) {
+				SSDFS_NOTICE("offset %llu is in bad PEB\n",
+						(unsigned long long)offset);
+			} else {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("offset %llu is in bad PEB\n",
+					  (unsigned long long)offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+			}
+
+			offset += seg_size;
+			err = find_seg_with_valid_start_peb(fsi, seg_size,
+							&offset, threshold,
+							silent,
+							SSDFS_USE_PEB_ISBAD_OP);
+			if (err) {
+				switch (seg_size) {
+				case SSDFS_128KB:
+					offset = start_offset;
+					seg_size = SSDFS_256KB;
+					goto try_seg_size;
+
+				case SSDFS_256KB:
+					offset = start_offset;
+					seg_size = SSDFS_512KB;
+					goto try_seg_size;
+
+				case SSDFS_512KB:
+					offset = start_offset;
+					seg_size = SSDFS_2MB;
+					goto try_seg_size;
+
+				case SSDFS_2MB:
+					offset = start_offset;
+					seg_size = SSDFS_8MB;
+					goto try_seg_size;
+
+				default:
+					/* finish search */
+					break;
+				}
+
+				SSDFS_NOTICE("unable to find valid start PEB: "
+					     "err %d\n", err);
+				return err;
+			}
+		}
+	}
+
+	err = find_seg_with_valid_start_peb(fsi, seg_size, &offset,
+					    threshold, silent,
+					    SSDFS_USE_READ_OP);
+	if (unlikely(err)) {
+		switch (seg_size) {
+		case SSDFS_128KB:
+			offset = start_offset;
+			seg_size = SSDFS_256KB;
+			goto try_seg_size;
+
+		case SSDFS_256KB:
+			offset = start_offset;
+			seg_size = SSDFS_512KB;
+			goto try_seg_size;
+
+		case SSDFS_512KB:
+			offset = start_offset;
+			seg_size = SSDFS_2MB;
+			goto try_seg_size;
+
+		case SSDFS_2MB:
+			offset = start_offset;
+			seg_size = SSDFS_8MB;
+			goto try_seg_size;
+
+		default:
+			/* finish search */
+			break;
+		}
+
+		SSDFS_NOTICE("unable to find valid start PEB\n");
+		return err;
+	}
+
+	vh = SSDFS_VH(fsi->sbi.vh_buf);
+
+	seg_size = 1 << vh->log_segsize;
+
+	magic_valid = is_ssdfs_magic_valid(&vh->magic);
+	crc_valid = is_ssdfs_volume_header_csum_valid(fsi->sbi.vh_buf,
+							hdr_size);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("magic_valid %#x, crc_valid %#x\n",
+		  magic_valid, crc_valid);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!magic_valid && !crc_valid) {
+		if (!silent)
+			SSDFS_NOTICE("valid magic is not detected\n");
+		else
+			SSDFS_DBG("valid magic is not detected\n");
+		return -ENOENT;
+	} else if ((magic_valid && !crc_valid) || (!magic_valid && crc_valid)) {
+		loff_t start_off;
+
+try_again:
+		start_off = offset;
+		if (offset >= (threshold - seg_size)) {
+			if (!silent)
+				SSDFS_NOTICE("valid magic is not detected\n");
+			else
+				SSDFS_DBG("valid magic is not detected\n");
+			return -ENOENT;
+		}
+
+		if (fsi->devops->peb_isbad) {
+			err = find_seg_with_valid_start_peb(fsi, seg_size,
+							&offset, threshold,
+							silent,
+							SSDFS_USE_PEB_ISBAD_OP);
+			if (err) {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("unable to find valid start PEB: "
+					  "err %d\n", err);
+#endif /* CONFIG_SSDFS_DEBUG */
+				return err;
+			}
+		}
+
+		if (start_off == offset)
+			offset += seg_size;
+
+		err = find_seg_with_valid_start_peb(fsi, seg_size, &offset,
+						    threshold, silent,
+						    SSDFS_USE_READ_OP);
+		if (unlikely(err)) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to find valid start PEB: "
+				  "err %d\n", err);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return err;
+		}
+
+		magic_valid = is_ssdfs_magic_valid(&vh->magic);
+		crc_valid = is_ssdfs_volume_header_csum_valid(fsi->sbi.vh_buf,
+								hdr_size);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("magic_valid %#x, crc_valid %#x\n",
+			  magic_valid, crc_valid);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (!(magic_valid && crc_valid)) {
+			if (!silent)
+				SSDFS_NOTICE("valid magic is not detected\n");
+			else
+				SSDFS_DBG("valid magic is not detected\n");
+			return -ENOENT;
+		}
+	}
+
+	if (!is_ssdfs_volume_header_consistent(fsi, vh, dev_size))
+		goto try_again;
+
+	fsi->pagesize = 1 << vh->log_pagesize;
+
+	if (fsi->is_zns_device) {
+		fsi->erasesize = fsi->zone_size;
+		fsi->segsize = fsi->erasesize * le16_to_cpu(vh->pebs_per_seg);
+	} else {
+		fsi->erasesize = 1 << vh->log_erasesize;
+		fsi->segsize = 1 << vh->log_segsize;
+	}
+
+	fsi->pages_per_seg = fsi->segsize / fsi->pagesize;
+	fsi->pages_per_peb = fsi->erasesize / fsi->pagesize;
+	fsi->pebs_per_seg = 1 << vh->log_pebs_per_seg;
+
+	return 0;
+}
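For reference, the retry ladder above (falling back from a 128KB candidate segment size through 8MB until a header probe succeeds) can be modeled by a plain loop over the candidates. This is an illustrative sketch only: `try_sizes[]`, `probe()` and `detect_seg_size()` are hypothetical stand-ins for the `switch` chain and `find_seg_with_valid_start_peb()`, not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Candidate segment sizes, smallest first (mirrors SSDFS_128KB..SSDFS_8MB). */
static const size_t try_sizes[] = {
	128 * 1024, 256 * 1024, 512 * 1024,
	2 * 1024 * 1024, 8 * 1024 * 1024
};

/*
 * Stand-in for find_seg_with_valid_start_peb(): in this model a probe
 * succeeds only when the guessed size matches the volume's real one.
 */
static bool probe(size_t seg_size, size_t real_seg_size)
{
	return seg_size == real_seg_size;
}

/* Try every candidate in order; return the detected size or 0 on failure. */
static size_t detect_seg_size(size_t real_seg_size)
{
	size_t i;

	for (i = 0; i < sizeof(try_sizes) / sizeof(try_sizes[0]); i++) {
		if (probe(try_sizes[i], real_seg_size))
			return try_sizes[i];
	}
	return 0;
}
```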
+
+static int ssdfs_read_checked_sb_info(struct ssdfs_fs_info *fsi, u64 peb_id,
+				      u32 pages_off, bool silent)
+{
+	u32 lf_off;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi);
+
+	SSDFS_DBG("fsi %p, peb_id %llu, pages_off %u, silent %#x\n",
+		  fsi, peb_id, pages_off, silent);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_read_checked_segment_header(fsi, peb_id, pages_off,
+						fsi->sbi.vh_buf, silent);
+	if (err) {
+		if (!silent) {
+			SSDFS_ERR("volume header is corrupted: "
+				  "peb_id %llu, offset %u, err %d\n",
+				  peb_id, pages_off, err);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("volume header is corrupted: "
+				  "peb_id %llu, offset %u, err %d\n",
+				  peb_id, pages_off, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return err;
+	}
+
+	lf_off = SSDFS_LOG_FOOTER_OFF(fsi->sbi.vh_buf);
+
+	err = ssdfs_read_checked_log_footer(fsi, SSDFS_SEG_HDR(fsi->sbi.vh_buf),
+					    peb_id, lf_off, fsi->sbi.vs_buf,
+					    silent);
+	if (err) {
+		if (!silent) {
+			SSDFS_ERR("log footer is corrupted: "
+				  "peb_id %llu, offset %u, err %d\n",
+				  peb_id, lf_off, err);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("log footer is corrupted: "
+				  "peb_id %llu, offset %u, err %d\n",
+				  peb_id, lf_off, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return err;
+	}
+
+	return 0;
+}
+
+static int ssdfs_read_checked_sb_info2(struct ssdfs_fs_info *fsi, u64 peb_id,
+					u32 pages_off, bool silent,
+					u32 *cur_off)
+{
+	u32 bytes_off;
+	u32 log_pages;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi);
+
+	SSDFS_DBG("fsi %p, peb_id %llu, pages_off %u, silent %#x\n",
+		  fsi, peb_id, pages_off, silent);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	bytes_off = pages_off * fsi->pagesize;
+
+	err = ssdfs_read_unchecked_log_footer(fsi, peb_id, bytes_off,
+					      fsi->sbi.vs_buf, silent,
+					      &log_pages);
+	if (err) {
+		if (!silent) {
+			SSDFS_ERR("fail to read the log footer: "
+				  "peb_id %llu, offset %u, err %d\n",
+				  peb_id, bytes_off, err);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("fail to read the log footer: "
+				  "peb_id %llu, offset %u, err %d\n",
+				  peb_id, bytes_off, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return err;
+	}
+
+	if (log_pages == 0 ||
+	    log_pages > fsi->pages_per_peb ||
+	    pages_off < log_pages) {
+		if (!silent) {
+			SSDFS_ERR("invalid log_pages %u\n", log_pages);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("invalid log_pages %u\n", log_pages);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -ERANGE;
+	}
+
+	pages_off -= log_pages - 1;
+	*cur_off -= log_pages - 1;
+
+	err = ssdfs_read_checked_segment_header(fsi, peb_id, pages_off,
+						fsi->sbi.vh_buf, silent);
+	if (err) {
+		if (!silent) {
+			SSDFS_ERR("volume header is corrupted: "
+				  "peb_id %llu, offset %u, err %d\n",
+				  peb_id, pages_off, err);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("volume header is corrupted: "
+				  "peb_id %llu, offset %u, err %d\n",
+				  peb_id, pages_off, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return err;
+	}
+
+	err = ssdfs_check_log_footer(fsi,
+				     SSDFS_SEG_HDR(fsi->sbi.vh_buf),
+				     SSDFS_LF(fsi->sbi.vs_buf),
+				     silent);
+	if (err) {
+		if (!silent) {
+			SSDFS_ERR("log footer is corrupted: "
+				  "peb_id %llu, bytes_off %u, err %d\n",
+				  peb_id, bytes_off, err);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("log footer is corrupted: "
+				  "peb_id %llu, bytes_off %u, err %d\n",
+				  peb_id, bytes_off, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return err;
+	}
+
+	return 0;
+}
+
+static int ssdfs_find_any_valid_sb_segment(struct ssdfs_fs_info *fsi,
+					   u64 start_peb_id)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+#endif /* CONFIG_SSDFS_DEBUG */
+	size_t vh_size = sizeof(struct ssdfs_volume_header);
+	struct ssdfs_volume_header *vh;
+	struct ssdfs_segment_header *seg_hdr;
+	u64 dev_size;
+	loff_t offset = start_peb_id * fsi->erasesize;
+	loff_t step = SSDFS_RESERVED_SB_SEGS * SSDFS_128KB;
+	u64 last_cno, cno;
+	__le64 peb1, peb2;
+	__le64 leb1, leb2;
+	u64 checked_pebs[SSDFS_SB_CHAIN_MAX][SSDFS_SB_SEG_COPY_MAX];
+	int i, j;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi);
+	BUG_ON(!fsi->sbi.vh_buf);
+	BUG_ON(!fsi->devops->read);
+	BUG_ON(!is_ssdfs_magic_valid(&SSDFS_VH(fsi->sbi.vh_buf)->magic));
+	BUG_ON(!is_ssdfs_volume_header_csum_valid(fsi->sbi.vh_buf, hdr_size));
+
+	SSDFS_DBG("fsi %p, fsi->sbi.vh_buf %p, start_peb_id %llu\n",
+		  fsi, fsi->sbi.vh_buf, start_peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	i = SSDFS_SB_CHAIN_MAX;
+	dev_size = fsi->devops->device_size(fsi->sb);
+	memset(checked_pebs, 0xFF, sizeof(checked_pebs));
+
+try_next_volume_portion:
+	ssdfs_memcpy(&fsi->last_vh, 0, vh_size,
+		     fsi->sbi.vh_buf, 0, vh_size,
+		     vh_size);
+	last_cno = le64_to_cpu(SSDFS_SEG_HDR(fsi->sbi.vh_buf)->cno);
+
+try_again:
+	switch (i) {
+	case SSDFS_SB_CHAIN_MAX:
+		i = SSDFS_CUR_SB_SEG;
+		break;
+
+	case SSDFS_CUR_SB_SEG:
+		i = SSDFS_NEXT_SB_SEG;
+		break;
+
+	case SSDFS_NEXT_SB_SEG:
+		i = SSDFS_RESERVED_SB_SEG;
+		break;
+
+	default:
+		offset += step;
+
+		if (offset >= dev_size)
+			goto fail_find_sb_seg;
+
+		err = ssdfs_find_any_valid_volume_header(fsi, offset, true);
+		if (err)
+			goto fail_find_sb_seg;
+
+		i = SSDFS_SB_CHAIN_MAX;
+		goto try_next_volume_portion;
+	}
+
+	err = -ENODATA;
+
+	for (j = SSDFS_MAIN_SB_SEG; j < SSDFS_SB_SEG_COPY_MAX; j++) {
+		u64 leb_id = le64_to_cpu(fsi->last_vh.sb_pebs[i][j].leb_id);
+		u64 peb_id = le64_to_cpu(fsi->last_vh.sb_pebs[i][j].peb_id);
+		u16 seg_type;
+
+		if (peb_id == U64_MAX || leb_id == U64_MAX) {
+			err = -ERANGE;
+			SSDFS_ERR("invalid leb_id %llu, peb_id %llu\n",
+				  leb_id, peb_id);
+			goto fail_find_sb_seg;
+		}
+
+		if (start_peb_id > peb_id)
+			continue;
+
+		if (checked_pebs[i][j] == peb_id)
+			continue;
+		else
+			checked_pebs[i][j] = peb_id;
+
+		if ((peb_id * fsi->erasesize) < dev_size)
+			offset = peb_id * fsi->erasesize;
+
+		err = ssdfs_read_checked_sb_info(fsi, peb_id,
+						 0, true);
+		if (err) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("peb_id %llu is corrupted: err %d\n",
+				  peb_id, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+			continue;
+		}
+
+		fsi->sbi.last_log.leb_id = leb_id;
+		fsi->sbi.last_log.peb_id = peb_id;
+		fsi->sbi.last_log.page_offset = 0;
+		fsi->sbi.last_log.pages_count =
+			SSDFS_LOG_PAGES(fsi->sbi.vh_buf);
+
+		seg_hdr = SSDFS_SEG_HDR(fsi->sbi.vh_buf);
+		seg_type = SSDFS_SEG_TYPE(seg_hdr);
+
+		if (seg_type == SSDFS_SB_SEG_TYPE)
+			return 0;
+
+		err = -EIO;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("PEB %llu is not sb segment\n",
+			  peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (!err)
+			goto compare_vh_info;
+	}
+
+	if (err) {
+		ssdfs_memcpy(fsi->sbi.vh_buf, 0, vh_size,
+			     &fsi->last_vh, 0, vh_size,
+			     vh_size);
+		goto try_again;
+	}
+
+compare_vh_info:
+	vh = SSDFS_VH(fsi->sbi.vh_buf);
+	seg_hdr = SSDFS_SEG_HDR(fsi->sbi.vh_buf);
+	leb1 = fsi->last_vh.sb_pebs[SSDFS_CUR_SB_SEG][SSDFS_MAIN_SB_SEG].leb_id;
+	leb2 = vh->sb_pebs[SSDFS_CUR_SB_SEG][SSDFS_MAIN_SB_SEG].leb_id;
+	peb1 = fsi->last_vh.sb_pebs[SSDFS_CUR_SB_SEG][SSDFS_MAIN_SB_SEG].peb_id;
+	peb2 = vh->sb_pebs[SSDFS_CUR_SB_SEG][SSDFS_MAIN_SB_SEG].peb_id;
+	cno = le64_to_cpu(seg_hdr->cno);
+
+	if (cno > last_cno && (leb1 != leb2 || peb1 != peb2)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("cno %llu, last_cno %llu, "
+			  "leb1 %llu, leb2 %llu, "
+			  "peb1 %llu, peb2 %llu\n",
+			  cno, last_cno,
+			  le64_to_cpu(leb1), le64_to_cpu(leb2),
+			  le64_to_cpu(peb1), le64_to_cpu(peb2));
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto try_again;
+	}
+
+fail_find_sb_seg:
+	SSDFS_CRIT("unable to find any valid segment in the superblock chain\n");
+	return -EIO;
+}
+
+static inline bool is_sb_peb_exhausted2(struct ssdfs_fs_info *fsi,
+					u64 leb_id, u64 peb_id)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+#endif /* CONFIG_SSDFS_DEBUG */
+	struct ssdfs_peb_extent checking_page;
+	u64 pages_per_peb;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi);
+	BUG_ON(!fsi->sbi.vh_buf);
+	BUG_ON(!fsi->devops->read);
+	BUG_ON(!is_ssdfs_magic_valid(&SSDFS_VH(fsi->sbi.vh_buf)->magic));
+	BUG_ON(!is_ssdfs_volume_header_csum_valid(fsi->sbi.vh_buf, hdr_size));
+
+	SSDFS_DBG("fsi %p, fsi->sbi.vh_buf %p, "
+		  "leb_id %llu, peb_id %llu\n",
+		  fsi, fsi->sbi.vh_buf,
+		  leb_id, peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!fsi->devops->can_write_page) {
+		SSDFS_CRIT("fail to find latest valid sb info: "
+			   "can_write_page is not supported\n");
+		return true;
+	}
+
+	if (leb_id >= U64_MAX || peb_id >= U64_MAX) {
+		SSDFS_ERR("invalid leb_id %llu or peb_id %llu\n",
+			  leb_id, peb_id);
+		return true;
+	}
+
+	checking_page.leb_id = leb_id;
+	checking_page.peb_id = peb_id;
+
+	if (fsi->is_zns_device) {
+		pages_per_peb = div64_u64(fsi->zone_capacity, fsi->pagesize);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(pages_per_peb >= U32_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		checking_page.page_offset = (u32)pages_per_peb - 2;
+	} else {
+		checking_page.page_offset = fsi->pages_per_peb - 2;
+	}
+
+	checking_page.pages_count = 1;
+
+	err = ssdfs_can_write_sb_log(fsi->sb, &checking_page);
+	if (!err)
+		return false;
+
+	return true;
+}
+
+static inline bool is_cur_main_sb_peb_exhausted2(struct ssdfs_fs_info *fsi)
+{
+	u64 leb_id;
+	u64 peb_id;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi);
+	BUG_ON(!fsi->sbi.vh_buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	leb_id = SSDFS_MAIN_SB_LEB(SSDFS_VH(fsi->sbi.vh_buf),
+				   SSDFS_CUR_SB_SEG);
+	peb_id = SSDFS_MAIN_SB_PEB(SSDFS_VH(fsi->sbi.vh_buf),
+				   SSDFS_CUR_SB_SEG);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("fsi %p, fsi->sbi.vh_buf %p, "
+		  "leb_id %llu, peb_id %llu\n",
+		  fsi, fsi->sbi.vh_buf,
+		  leb_id, peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return is_sb_peb_exhausted2(fsi, leb_id, peb_id);
+}
+
+static inline bool is_cur_copy_sb_peb_exhausted2(struct ssdfs_fs_info *fsi)
+{
+	u64 leb_id;
+	u64 peb_id;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi);
+	BUG_ON(!fsi->sbi.vh_buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	leb_id = SSDFS_COPY_SB_LEB(SSDFS_VH(fsi->sbi.vh_buf),
+				   SSDFS_CUR_SB_SEG);
+	peb_id = SSDFS_COPY_SB_PEB(SSDFS_VH(fsi->sbi.vh_buf),
+				   SSDFS_CUR_SB_SEG);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("fsi %p, fsi->sbi.vh_buf %p, "
+		  "leb_id %llu, peb_id %llu\n",
+		  fsi, fsi->sbi.vh_buf,
+		  leb_id, peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return is_sb_peb_exhausted2(fsi, leb_id, peb_id);
+}
+
+static int ssdfs_find_latest_valid_sb_segment(struct ssdfs_fs_info *fsi)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+#endif /* CONFIG_SSDFS_DEBUG */
+	struct ssdfs_volume_header *last_vh;
+	u64 cur_main_sb_peb, cur_copy_sb_peb;
+	u64 cno1, cno2;
+	u64 cur_peb, next_peb, prev_peb;
+	u64 cur_leb, next_leb, prev_leb;
+	u16 seg_type;
+	loff_t offset;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi);
+	BUG_ON(!fsi->sbi.vh_buf);
+	BUG_ON(!fsi->devops->read);
+	BUG_ON(!is_ssdfs_magic_valid(&SSDFS_VH(fsi->sbi.vh_buf)->magic));
+	BUG_ON(!is_ssdfs_volume_header_csum_valid(fsi->sbi.vh_buf, hdr_size));
+
+	SSDFS_DBG("fsi %p, fsi->sbi.vh_buf %p\n", fsi, fsi->sbi.vh_buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+try_next_peb:
+	last_vh = SSDFS_VH(fsi->sbi.vh_buf);
+	cur_main_sb_peb = SSDFS_MAIN_SB_PEB(last_vh, SSDFS_CUR_SB_SEG);
+	cur_copy_sb_peb = SSDFS_COPY_SB_PEB(last_vh, SSDFS_CUR_SB_SEG);
+
+	if (cur_main_sb_peb != fsi->sbi.last_log.peb_id &&
+	    cur_copy_sb_peb != fsi->sbi.last_log.peb_id) {
+		SSDFS_ERR("volume header is corrupted\n");
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("cur_main_sb_peb %llu, cur_copy_sb_peb %llu, "
+			  "read PEB %llu\n",
+			  cur_main_sb_peb, cur_copy_sb_peb,
+			  fsi->sbi.last_log.peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		err = -EIO;
+		goto end_search;
+	}
+
+	if (cur_main_sb_peb == fsi->sbi.last_log.peb_id) {
+		if (!is_cur_main_sb_peb_exhausted2(fsi))
+			goto end_search;
+	} else {
+		if (!is_cur_copy_sb_peb_exhausted2(fsi))
+			goto end_search;
+	}
+
+	ssdfs_backup_sb_info(fsi);
+
+	next_leb = SSDFS_MAIN_SB_LEB(SSDFS_VH(fsi->sbi.vh_buf),
+					SSDFS_NEXT_SB_SEG);
+	next_peb = SSDFS_MAIN_SB_PEB(SSDFS_VH(fsi->sbi.vh_buf),
+					SSDFS_NEXT_SB_SEG);
+	if (next_leb == U64_MAX || next_peb == U64_MAX) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid next_leb %llu, next_peb %llu\n",
+			  next_leb, next_peb);
+		goto end_search;
+	}
+
+	err = ssdfs_read_checked_sb_info(fsi, next_peb, 0, true);
+	if (!err) {
+		fsi->sbi.last_log.leb_id = next_leb;
+		fsi->sbi.last_log.peb_id = next_peb;
+		fsi->sbi.last_log.page_offset = 0;
+		fsi->sbi.last_log.pages_count =
+				SSDFS_LOG_PAGES(fsi->sbi.vh_buf);
+		goto check_volume_header;
+	} else {
+		ssdfs_restore_sb_info(fsi);
+		err = 0; /* try to read the backup copy */
+	}
+
+	next_leb = SSDFS_COPY_SB_LEB(SSDFS_VH(fsi->sbi.vh_buf),
+					SSDFS_NEXT_SB_SEG);
+	next_peb = SSDFS_COPY_SB_PEB(SSDFS_VH(fsi->sbi.vh_buf),
+					SSDFS_NEXT_SB_SEG);
+	if (next_leb == U64_MAX || next_peb == U64_MAX) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid next_leb %llu, next_peb %llu\n",
+			  next_leb, next_peb);
+		goto end_search;
+	}
+
+	err = ssdfs_read_checked_sb_info(fsi, next_peb, 0, true);
+	if (err) {
+		if (err == -EIO) {
+			/* next sb segments are corrupted */
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("next sb PEB %llu is corrupted\n",
+				  next_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+		} else {
+			/* next sb segments are invalid */
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("next sb PEB %llu is invalid\n",
+				  next_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+
+		ssdfs_restore_sb_info(fsi);
+
+		offset = next_peb * fsi->erasesize;
+
+		err = ssdfs_find_any_valid_volume_header(fsi, offset, true);
+		if (err) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to find any valid header: "
+				  "peb_id %llu\n",
+				  next_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+			err = 0;
+			goto rollback_valid_vh;
+		}
+
+		err = ssdfs_find_any_valid_sb_segment(fsi, next_peb);
+		if (err) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to find any valid sb seg: "
+				  "peb_id %llu\n",
+				  next_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+			err = 0;
+			goto rollback_valid_vh;
+		} else {
+			goto try_next_peb;
+		}
+	}
+
+	fsi->sbi.last_log.leb_id = next_leb;
+	fsi->sbi.last_log.peb_id = next_peb;
+	fsi->sbi.last_log.page_offset = 0;
+	fsi->sbi.last_log.pages_count = SSDFS_LOG_PAGES(fsi->sbi.vh_buf);
+
+check_volume_header:
+	seg_type = SSDFS_SEG_TYPE(SSDFS_SEG_HDR(fsi->sbi.vh_buf));
+	if (seg_type != SSDFS_SB_SEG_TYPE) {
+		SSDFS_DBG("invalid segment type %#x\n", seg_type);
+		err = 0;
+		goto mount_fs_read_only;
+	}
+
+	cno1 = SSDFS_SEG_CNO(fsi->sbi_backup.vh_buf);
+	cno2 = SSDFS_SEG_CNO(fsi->sbi.vh_buf);
+	if (cno1 >= cno2) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("last cno %llu is not less than read cno %llu\n",
+			  cno1, cno2);
+#endif /* CONFIG_SSDFS_DEBUG */
+		err = 0;
+		goto mount_fs_read_only;
+	}
+
+	next_peb = SSDFS_MAIN_SB_PEB(SSDFS_VH(fsi->sbi_backup.vh_buf),
+					SSDFS_NEXT_SB_SEG);
+	cur_peb = SSDFS_MAIN_SB_PEB(SSDFS_VH(fsi->sbi.vh_buf),
+					SSDFS_CUR_SB_SEG);
+	if (next_peb != cur_peb) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("next_peb %llu doesn't match cur_peb %llu\n",
+			  next_peb, cur_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+		err = 0;
+		goto mount_fs_read_only;
+	}
+
+	prev_peb = SSDFS_MAIN_SB_PEB(SSDFS_VH(fsi->sbi.vh_buf),
+					SSDFS_PREV_SB_SEG);
+	cur_peb = SSDFS_MAIN_SB_PEB(SSDFS_VH(fsi->sbi_backup.vh_buf),
+					SSDFS_CUR_SB_SEG);
+	if (prev_peb != cur_peb) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("prev_peb %llu doesn't match cur_peb %llu\n",
+			  prev_peb, cur_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+		err = 0;
+		goto mount_fs_read_only;
+	}
+
+	next_leb = SSDFS_MAIN_SB_LEB(SSDFS_VH(fsi->sbi_backup.vh_buf),
+					SSDFS_NEXT_SB_SEG);
+	cur_leb = SSDFS_MAIN_SB_LEB(SSDFS_VH(fsi->sbi.vh_buf),
+					SSDFS_CUR_SB_SEG);
+	if (next_leb != cur_leb) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("next_leb %llu doesn't match cur_leb %llu\n",
+			  next_leb, cur_leb);
+#endif /* CONFIG_SSDFS_DEBUG */
+		err = 0;
+		goto mount_fs_read_only;
+	}
+
+	prev_leb = SSDFS_MAIN_SB_LEB(SSDFS_VH(fsi->sbi.vh_buf),
+					SSDFS_PREV_SB_SEG);
+	cur_leb = SSDFS_MAIN_SB_LEB(SSDFS_VH(fsi->sbi_backup.vh_buf),
+					SSDFS_CUR_SB_SEG);
+	if (prev_leb != cur_leb) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("prev_leb %llu doesn't match cur_leb %llu\n",
+			  prev_leb, cur_leb);
+#endif /* CONFIG_SSDFS_DEBUG */
+		err = 0;
+		goto mount_fs_read_only;
+	}
+
+	next_peb = SSDFS_COPY_SB_PEB(SSDFS_VH(fsi->sbi_backup.vh_buf),
+					SSDFS_NEXT_SB_SEG);
+	cur_peb = SSDFS_COPY_SB_PEB(SSDFS_VH(fsi->sbi.vh_buf),
+					SSDFS_CUR_SB_SEG);
+	if (next_peb != cur_peb) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("next_peb %llu doesn't match cur_peb %llu\n",
+			  next_peb, cur_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+		err = 0;
+		goto mount_fs_read_only;
+	}
+
+	prev_peb = SSDFS_COPY_SB_PEB(SSDFS_VH(fsi->sbi.vh_buf),
+					SSDFS_PREV_SB_SEG);
+	cur_peb = SSDFS_COPY_SB_PEB(SSDFS_VH(fsi->sbi_backup.vh_buf),
+					SSDFS_CUR_SB_SEG);
+	if (prev_peb != cur_peb) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("prev_peb %llu doesn't match cur_peb %llu\n",
+			  prev_peb, cur_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+		err = 0;
+		goto mount_fs_read_only;
+	}
+
+	next_leb = SSDFS_COPY_SB_LEB(SSDFS_VH(fsi->sbi_backup.vh_buf),
+					SSDFS_NEXT_SB_SEG);
+	cur_leb = SSDFS_COPY_SB_LEB(SSDFS_VH(fsi->sbi.vh_buf),
+					SSDFS_CUR_SB_SEG);
+	if (next_leb != cur_leb) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("next_leb %llu doesn't match cur_leb %llu\n",
+			  next_leb, cur_leb);
+#endif /* CONFIG_SSDFS_DEBUG */
+		err = 0;
+		goto mount_fs_read_only;
+	}
+
+	prev_leb = SSDFS_COPY_SB_LEB(SSDFS_VH(fsi->sbi.vh_buf),
+					SSDFS_PREV_SB_SEG);
+	cur_leb = SSDFS_COPY_SB_LEB(SSDFS_VH(fsi->sbi_backup.vh_buf),
+					SSDFS_CUR_SB_SEG);
+	if (prev_leb != cur_leb) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("prev_leb %llu doesn't match cur_leb %llu\n",
+			  prev_leb, cur_leb);
+#endif /* CONFIG_SSDFS_DEBUG */
+		err = 0;
+		goto mount_fs_read_only;
+	}
+
+	goto try_next_peb;
+
+mount_fs_read_only:
+	SSDFS_NOTICE("unable to mount in RW mode: "
+		     "chain of superblock segments is broken\n");
+	fsi->sb->s_flags |= SB_RDONLY;
+
+rollback_valid_vh:
+	ssdfs_restore_sb_info(fsi);
+
+end_search:
+	return err;
+}
+
+static inline
+u64 ssdfs_swap_current_sb_peb(struct ssdfs_volume_header *vh, u64 peb)
+{
+	if (peb == SSDFS_MAIN_SB_PEB(vh, SSDFS_CUR_SB_SEG))
+		return SSDFS_COPY_SB_PEB(vh, SSDFS_CUR_SB_SEG);
+	else if (peb == SSDFS_COPY_SB_PEB(vh, SSDFS_CUR_SB_SEG))
+		return SSDFS_MAIN_SB_PEB(vh, SSDFS_CUR_SB_SEG);
+
+	BUG();
+	return U64_MAX;
+}
+
+static inline
+u64 ssdfs_swap_current_sb_leb(struct ssdfs_volume_header *vh, u64 leb)
+{
+	if (leb == SSDFS_MAIN_SB_LEB(vh, SSDFS_CUR_SB_SEG))
+		return SSDFS_COPY_SB_LEB(vh, SSDFS_CUR_SB_SEG);
+	else if (leb == SSDFS_COPY_SB_LEB(vh, SSDFS_CUR_SB_SEG))
+		return SSDFS_MAIN_SB_LEB(vh, SSDFS_CUR_SB_SEG);
+
+	BUG();
+	return U64_MAX;
+}
+
+/*
+ * This method expects that the first volume header and log footer
+ * have already been checked and found valid.
+ */
+static int ssdfs_find_latest_valid_sb_info(struct ssdfs_fs_info *fsi)
+{
+	struct ssdfs_segment_header *last_seg_hdr;
+	u64 leb, peb;
+	u32 cur_off, low_off, high_off;
+	u32 log_pages;
+	u64 pages_per_peb;
+	int err = 0;
+#ifdef CONFIG_SSDFS_DEBUG
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi);
+	BUG_ON(!fsi->sbi.vh_buf);
+	BUG_ON(!fsi->devops->read);
+	BUG_ON(!is_ssdfs_magic_valid(&SSDFS_VH(fsi->sbi.vh_buf)->magic));
+	BUG_ON(!is_ssdfs_volume_header_csum_valid(fsi->sbi.vh_buf, hdr_size));
+
+	SSDFS_DBG("fsi %p, fsi->sbi.vh_buf %p\n", fsi, fsi->sbi.vh_buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_backup_sb_info(fsi);
+	last_seg_hdr = SSDFS_SEG_HDR(fsi->sbi.vh_buf);
+	leb = fsi->sbi.last_log.leb_id;
+	peb = fsi->sbi.last_log.peb_id;
+	log_pages = SSDFS_LOG_PAGES(last_seg_hdr);
+
+	if (fsi->is_zns_device)
+		pages_per_peb = div64_u64(fsi->zone_capacity, fsi->pagesize);
+	else
+		pages_per_peb = fsi->pages_per_peb;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(pages_per_peb >= U32_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	low_off = fsi->sbi.last_log.page_offset;
+	high_off = (u32)pages_per_peb;
+	cur_off = low_off + log_pages;
+
+	do {
+		u32 diff_pages, diff_logs;
+		u64 cno1, cno2;
+		u64 copy_leb, copy_peb;
+		u32 peb_pages_off;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(cur_off >= pages_per_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		peb_pages_off = cur_off % (u32)pages_per_peb;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(peb_pages_off > U16_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (leb == U64_MAX || peb == U64_MAX) {
+			err = -ENODATA;
+			break;
+		}
+
+		err = ssdfs_read_checked_sb_info(fsi, peb,
+						 peb_pages_off, true);
+		cno1 = SSDFS_SEG_CNO(fsi->sbi_backup.vh_buf);
+		cno2 = SSDFS_SEG_CNO(fsi->sbi.vh_buf);
+		if (err == -EIO || cno1 >= cno2) {
+			struct ssdfs_volume_header *buf =
+					SSDFS_VH(fsi->sbi_backup.vh_buf);
+
+			copy_peb = ssdfs_swap_current_sb_peb(buf, peb);
+			copy_leb = ssdfs_swap_current_sb_leb(buf, leb);
+			if (copy_leb == U64_MAX || copy_peb == U64_MAX) {
+				err = -ERANGE;
+				break;
+			}
+
+			err = ssdfs_read_checked_sb_info(fsi, copy_peb,
+							 peb_pages_off, true);
+			cno1 = SSDFS_SEG_CNO(fsi->sbi_backup.vh_buf);
+			cno2 = SSDFS_SEG_CNO(fsi->sbi.vh_buf);
+			if (!err) {
+				peb = copy_peb;
+				leb = copy_leb;
+				fsi->sbi.last_log.leb_id = leb;
+				fsi->sbi.last_log.peb_id = peb;
+				fsi->sbi.last_log.page_offset = cur_off;
+				fsi->sbi.last_log.pages_count =
+					SSDFS_LOG_PAGES(fsi->sbi.vh_buf);
+			}
+		} else {
+			fsi->sbi.last_log.leb_id = leb;
+			fsi->sbi.last_log.peb_id = peb;
+			fsi->sbi.last_log.page_offset = cur_off;
+			fsi->sbi.last_log.pages_count =
+				SSDFS_LOG_PAGES(fsi->sbi.vh_buf);
+		}
+
+		if (err == -ENODATA || err == -EIO || cno1 >= cno2) {
+			err = !err ? -EIO : err;
+			high_off = cur_off;
+		} else if (err) {
+			/* we have internal error */
+			break;
+		} else {
+			ssdfs_backup_sb_info(fsi);
+			low_off = cur_off;
+		}
+
+		diff_pages = high_off - low_off;
+		diff_logs = (diff_pages / log_pages) / 2;
+		cur_off = low_off + (diff_logs * log_pages);
+	} while (cur_off > low_off && cur_off < high_off);
+
+	if (err) {
+		if (err == -ENODATA || err == -EIO) {
+			/* previous read log was valid */
+			err = 0;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("cur_off %u, low_off %u, high_off %u\n",
+				  cur_off, low_off, high_off);
+#endif /* CONFIG_SSDFS_DEBUG */
+		} else {
+			SSDFS_ERR("fail to find valid volume header: err %d\n",
+				  err);
+		}
+
+		ssdfs_restore_sb_info(fsi);
+	}
+
+	return err;
+}
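The do/while loop above halves the [low_off, high_off) interval until it pins down the last valid log. Reduced to its core, that is a binary search for the boundary between written and still-clean pages, assuming a PEB is always written as a prefix. The sketch below is illustrative only: `struct peb_model` and `can_write_page()` are hypothetical stand-ins for the device state and `ssdfs_can_write_sb_log()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of a PEB: pages [0, written) hold logs, [written, total) are clean. */
struct peb_model {
	uint32_t written;
	uint32_t total;
};

/* Stand-in for ssdfs_can_write_sb_log(): only clean pages are writable. */
static bool can_write_page(const struct peb_model *peb, uint32_t off)
{
	return off >= peb->written;
}

/*
 * Binary search for the first writable page, i.e. the offset right
 * after the last written log: O(log(total)) probes instead of a scan.
 */
static uint32_t find_write_boundary(const struct peb_model *peb)
{
	uint32_t low = 0, high = peb->total;

	while (low < high) {
		uint32_t mid = low + (high - low) / 2;

		if (can_write_page(peb, mid))
			high = mid;	/* clean: boundary is at or before mid */
		else
			low = mid + 1;	/* written: boundary is beyond mid */
	}
	return low;
}
```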
+
+/*
+ * This method expects that the first volume header and log footer
+ * have already been checked and found valid.
+ */
+static int ssdfs_find_latest_valid_sb_info2(struct ssdfs_fs_info *fsi)
+{
+	struct ssdfs_segment_header *last_seg_hdr;
+	struct ssdfs_peb_extent checking_page;
+	u64 leb, peb;
+	u32 cur_off, low_off, high_off;
+	u32 log_pages;
+	u32 start_offset;
+	u32 found_log_off;
+	u64 cno1, cno2;
+	u64 copy_leb, copy_peb;
+	u32 peb_pages_off;
+	u64 pages_per_peb;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi);
+	BUG_ON(!fsi->sbi.vh_buf);
+
+	SSDFS_DBG("fsi %p, fsi->sbi.vh_buf %p\n", fsi, fsi->sbi.vh_buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!fsi->devops->can_write_page) {
+		SSDFS_CRIT("fail to find latest valid sb info: "
+			   "can_write_page is not supported\n");
+		return -EOPNOTSUPP;
+	}
+
+	ssdfs_backup_sb_info(fsi);
+	last_seg_hdr = SSDFS_SEG_HDR(fsi->sbi.vh_buf);
+	leb = fsi->sbi.last_log.leb_id;
+	peb = fsi->sbi.last_log.peb_id;
+
+	if (leb == U64_MAX || peb == U64_MAX) {
+		ssdfs_restore_sb_info(fsi);
+		SSDFS_ERR("invalid leb_id %llu or peb_id %llu\n",
+			  leb, peb);
+		return -ERANGE;
+	}
+
+	if (fsi->is_zns_device)
+		pages_per_peb = div64_u64(fsi->zone_capacity, fsi->pagesize);
+	else
+		pages_per_peb = fsi->pages_per_peb;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(pages_per_peb >= U32_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	log_pages = SSDFS_LOG_PAGES(last_seg_hdr);
+	start_offset = fsi->sbi.last_log.page_offset + log_pages;
+	low_off = start_offset;
+	high_off = (u32)pages_per_peb;
+	cur_off = low_off;
+
+	checking_page.leb_id = leb;
+	checking_page.peb_id = peb;
+	checking_page.page_offset = cur_off;
+	checking_page.pages_count = 1;
+
+	err = ssdfs_can_write_sb_log(fsi->sb, &checking_page);
+	if (err == -EIO) {
+		/* correct low bound */
+		err = 0;
+		low_off++;
+	} else if (err) {
+		SSDFS_ERR("fail to check whether PEB %llu is writable\n",
+			  peb);
+		return err;
+	} else {
+		ssdfs_restore_sb_info(fsi);
+
+		/* previous read log was valid */
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("cur_off %u, low_off %u, high_off %u\n",
+			  cur_off, low_off, high_off);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return 0;
+	}
+
+	cur_off = high_off - 1;
+
+	do {
+		u32 diff_pages;
+
+		checking_page.leb_id = leb;
+		checking_page.peb_id = peb;
+		checking_page.page_offset = cur_off;
+		checking_page.pages_count = 1;
+
+		err = ssdfs_can_write_sb_log(fsi->sb, &checking_page);
+		if (err == -EIO) {
+			/* correct low bound */
+			err = 0;
+			low_off = cur_off;
+		} else if (err) {
+			SSDFS_ERR("fail to check for write PEB %llu\n",
+				  peb);
+			return err;
+		} else {
+			/* correct upper bound */
+			high_off = cur_off;
+		}
+
+		diff_pages = (high_off - low_off) / 2;
+		cur_off = low_off + diff_pages;
+	} while (cur_off > low_off && cur_off < high_off);
+
+	peb_pages_off = cur_off % (u32)pages_per_peb;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(peb_pages_off > U16_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	found_log_off = cur_off;
+	err = ssdfs_read_checked_sb_info2(fsi, peb, peb_pages_off, true,
+					  &found_log_off);
+	cno1 = SSDFS_SEG_CNO(fsi->sbi_backup.vh_buf);
+	cno2 = SSDFS_SEG_CNO(fsi->sbi.vh_buf);
+
+	if (err == -EIO || cno1 >= cno2) {
+		void *buf = fsi->sbi_backup.vh_buf;
+
+		copy_peb = ssdfs_swap_current_sb_peb(buf, peb);
+		copy_leb = ssdfs_swap_current_sb_leb(buf, leb);
+		if (copy_leb == U64_MAX || copy_peb == U64_MAX) {
+			err = -ERANGE;
+			goto finish_find_latest_sb_info;
+		}
+
+		found_log_off = cur_off;
+		err = ssdfs_read_checked_sb_info2(fsi, copy_peb,
+						  peb_pages_off, true,
+						  &found_log_off);
+		cno1 = SSDFS_SEG_CNO(fsi->sbi_backup.vh_buf);
+		cno2 = SSDFS_SEG_CNO(fsi->sbi.vh_buf);
+		if (!err) {
+			peb = copy_peb;
+			leb = copy_leb;
+			fsi->sbi.last_log.leb_id = leb;
+			fsi->sbi.last_log.peb_id = peb;
+			fsi->sbi.last_log.page_offset = found_log_off;
+			fsi->sbi.last_log.pages_count =
+				SSDFS_LOG_PAGES(fsi->sbi.vh_buf);
+		}
+	} else {
+		fsi->sbi.last_log.leb_id = leb;
+		fsi->sbi.last_log.peb_id = peb;
+		fsi->sbi.last_log.page_offset = found_log_off;
+		fsi->sbi.last_log.pages_count =
+			SSDFS_LOG_PAGES(fsi->sbi.vh_buf);
+	}
+
+finish_find_latest_sb_info:
+	if (err) {
+		if (err == -ENODATA || err == -EIO) {
+			/* previous read log was valid */
+			err = 0;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("cur_off %u, low_off %u, high_off %u\n",
+				  cur_off, low_off, high_off);
+#endif /* CONFIG_SSDFS_DEBUG */
+		} else {
+			SSDFS_ERR("fail to find valid volume header: err %d\n",
+				  err);
+		}
+
+		ssdfs_restore_sb_info(fsi);
+	}
+
+	return err;
+}
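The loop above bisects a PEB's page range using the writability of each page as the search predicate: a page that fails the `can_write` check belongs to an existing log, a writable page is still erased, so the search converges on the last written page, where the newest log candidate lives. A minimal user-space sketch of that idea, with a hypothetical mock predicate standing in for `ssdfs_can_write_sb_log()` (names and the page-layout model are assumptions for illustration, not SSDFS API):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Mock model: pages [0, written) of a PEB hold valid logs,
 * pages [written, total) are still erased.  can_write() mirrors
 * the role of ssdfs_can_write_sb_log(): it reports false for a
 * page that already belongs to a log.
 */
static bool can_write(unsigned written, unsigned page)
{
	return page >= written;
}

/*
 * Bisect [low, high) and return the offset of the last written
 * page, i.e. the candidate start of the newest log.
 */
static unsigned find_log_boundary(unsigned written, unsigned low, unsigned high)
{
	unsigned cur = high - 1;

	do {
		if (!can_write(written, cur))
			low = cur;	/* page is used: raise lower bound */
		else
			high = cur;	/* page is free: drop upper bound */
		cur = low + (high - low) / 2;
	} while (cur > low && cur < high);

	return cur;
}
```

With 5 written pages out of 8, the search settles on offset 4, the last used page; the kernel code then reads and validates the log header at that offset.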
+
+static int ssdfs_check_fs_state(struct ssdfs_fs_info *fsi)
+{
+	if (fsi->sb->s_flags & SB_RDONLY)
+		return 0;
+
+	switch (fsi->fs_state) {
+	case SSDFS_MOUNTED_FS:
+		SSDFS_NOTICE("unable to mount in RW mode: "
+			     "file system wasn't unmounted cleanly; "
+			     "please run the fsck utility\n");
+		fsi->sb->s_flags |= SB_RDONLY;
+		return -EROFS;
+
+	case SSDFS_ERROR_FS:
+		if (!ssdfs_test_opt(fsi->mount_opts, IGNORE_FS_STATE)) {
+			SSDFS_NOTICE("unable to mount in RW mode: "
+				     "file system contains errors; "
+				     "please run the fsck utility\n");
+			fsi->sb->s_flags |= SB_RDONLY;
+			return -EROFS;
+		}
+		break;
+	}
+
+	return 0;
+}
+
+static int ssdfs_check_feature_compatibility(struct ssdfs_fs_info *fsi)
+{
+	u64 features;
+
+	features = fsi->fs_feature_incompat & ~SSDFS_FEATURE_INCOMPAT_SUPP;
+	if (features) {
+		SSDFS_NOTICE("unable to mount: "
+			     "unsupported incompatible features %llu\n",
+			     features);
+		return -EOPNOTSUPP;
+	}
+
+	features = fsi->fs_feature_compat_ro & ~SSDFS_FEATURE_COMPAT_RO_SUPP;
+	if (!(fsi->sb->s_flags & SB_RDONLY) && features) {
+		SSDFS_NOTICE("unable to mount in RW mode: "
+			     "unsupported RO compatible features %llu\n",
+			     features);
+		fsi->sb->s_flags |= SB_RDONLY;
+		return -EROFS;
+	}
+
+	features = fsi->fs_feature_compat & ~SSDFS_FEATURE_COMPAT_SUPP;
+	if (features)
+		SSDFS_WARN("unknown compatible features %llu\n", features);
+
+	return 0;
+}
+
+static inline void ssdfs_init_sb_segs_array(struct ssdfs_fs_info *fsi)
+{
+	int i, j;
+
+	for (i = SSDFS_CUR_SB_SEG; i < SSDFS_SB_CHAIN_MAX; i++) {
+		for (j = SSDFS_MAIN_SB_SEG; j < SSDFS_SB_SEG_COPY_MAX; j++) {
+			fsi->sb_lebs[i][j] =
+				le64_to_cpu(fsi->vh->sb_pebs[i][j].leb_id);
+			fsi->sb_pebs[i][j] =
+				le64_to_cpu(fsi->vh->sb_pebs[i][j].peb_id);
+		}
+	}
+}
+
+static int ssdfs_initialize_fs_info(struct ssdfs_fs_info *fsi)
+{
+	int err;
+
+	init_rwsem(&fsi->volume_sem);
+
+	fsi->vh = SSDFS_VH(fsi->sbi.vh_buf);
+	fsi->vs = SSDFS_VS(fsi->sbi.vs_buf);
+
+	fsi->sb_seg_log_pages = le16_to_cpu(fsi->vh->sb_seg_log_pages);
+	fsi->segbmap_log_pages = le16_to_cpu(fsi->vh->segbmap_log_pages);
+	fsi->maptbl_log_pages = le16_to_cpu(fsi->vh->maptbl_log_pages);
+	fsi->lnodes_seg_log_pages = le16_to_cpu(fsi->vh->lnodes_seg_log_pages);
+	fsi->hnodes_seg_log_pages = le16_to_cpu(fsi->vh->hnodes_seg_log_pages);
+	fsi->inodes_seg_log_pages = le16_to_cpu(fsi->vh->inodes_seg_log_pages);
+	fsi->user_data_log_pages = le16_to_cpu(fsi->vh->user_data_log_pages);
+
+	/* Static volume information */
+	fsi->log_pagesize = fsi->vh->log_pagesize;
+	fsi->pagesize = 1 << fsi->vh->log_pagesize;
+	fsi->log_erasesize = fsi->vh->log_erasesize;
+	fsi->log_segsize = fsi->vh->log_segsize;
+	fsi->segsize = 1 << fsi->vh->log_segsize;
+	fsi->log_pebs_per_seg = fsi->vh->log_pebs_per_seg;
+	fsi->pebs_per_seg = 1 << fsi->vh->log_pebs_per_seg;
+	fsi->pages_per_peb = fsi->erasesize / fsi->pagesize;
+	fsi->pages_per_seg = fsi->segsize / fsi->pagesize;
+	fsi->lebs_per_peb_index = le32_to_cpu(fsi->vh->lebs_per_peb_index);
+
+	if (fsi->is_zns_device) {
+		u64 peb_pages_capacity =
+			fsi->zone_capacity >> fsi->vh->log_pagesize;
+
+		fsi->erasesize = fsi->zone_size;
+		fsi->segsize = fsi->erasesize *
+				le16_to_cpu(fsi->vh->pebs_per_seg);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(peb_pages_capacity >= U32_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		fsi->peb_pages_capacity = (u32)peb_pages_capacity;
+		atomic_set(&fsi->open_zones, le32_to_cpu(fsi->vs->open_zones));
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("open_zones %d\n",
+			  atomic_read(&fsi->open_zones));
+#endif /* CONFIG_SSDFS_DEBUG */
+	} else {
+		fsi->erasesize = 1 << fsi->vh->log_erasesize;
+		fsi->segsize = 1 << fsi->vh->log_segsize;
+		fsi->peb_pages_capacity = fsi->pages_per_peb;
+	}
+
+	if (fsi->pages_per_peb > U16_MAX)
+		fsi->leb_pages_capacity = U16_MAX;
+	else
+		fsi->leb_pages_capacity = fsi->pages_per_peb;
+
+	fsi->fs_ctime = le64_to_cpu(fsi->vh->create_time);
+	fsi->fs_cno = le64_to_cpu(fsi->vh->create_cno);
+	fsi->raw_inode_size = le16_to_cpu(fsi->vs->inodes_btree.desc.item_size);
+	fsi->create_threads_per_seg =
+				le16_to_cpu(fsi->vh->create_threads_per_seg);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("STATIC VOLUME INFO:\n");
+	SSDFS_DBG("pagesize %u, erasesize %u, segsize %u\n",
+		  fsi->pagesize, fsi->erasesize, fsi->segsize);
+	SSDFS_DBG("pebs_per_seg %u, pages_per_peb %u, "
+		  "pages_per_seg %u, lebs_per_peb_index %u\n",
+		  fsi->pebs_per_seg, fsi->pages_per_peb,
+		  fsi->pages_per_seg, fsi->lebs_per_peb_index);
+	SSDFS_DBG("zone_size %llu, zone_capacity %llu, "
+		  "leb_pages_capacity %u, peb_pages_capacity %u, "
+		  "open_zones %d\n",
+		  fsi->zone_size, fsi->zone_capacity,
+		  fsi->leb_pages_capacity, fsi->peb_pages_capacity,
+		  atomic_read(&fsi->open_zones));
+	SSDFS_DBG("fs_ctime %llu, fs_cno %llu, "
+		  "raw_inode_size %u, create_threads_per_seg %u\n",
+		  (u64)fsi->fs_ctime, (u64)fsi->fs_cno,
+		  fsi->raw_inode_size,
+		  fsi->create_threads_per_seg);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	/* Mutable volume info */
+	init_rwsem(&fsi->sb_segs_sem);
+	ssdfs_init_sb_segs_array(fsi);
+
+	mutex_init(&fsi->resize_mutex);
+	fsi->nsegs = le64_to_cpu(fsi->vs->nsegs);
+
+	spin_lock_init(&fsi->volume_state_lock);
+
+	fsi->free_pages = 0;
+	fsi->reserved_new_user_data_pages = 0;
+	fsi->updated_user_data_pages = 0;
+	fsi->flushing_user_data_requests = 0;
+	fsi->fs_mount_time = ssdfs_current_timestamp();
+	fsi->fs_mod_time = le64_to_cpu(fsi->vs->timestamp);
+	ssdfs_init_boot_vs_mount_timediff(fsi);
+	fsi->fs_mount_cno = le64_to_cpu(fsi->vs->cno);
+	fsi->fs_flags = le32_to_cpu(fsi->vs->flags);
+	fsi->fs_state = le16_to_cpu(fsi->vs->state);
+
+	fsi->fs_errors = le16_to_cpu(fsi->vs->errors);
+	ssdfs_initialize_fs_errors_option(fsi);
+
+	fsi->fs_feature_compat = le64_to_cpu(fsi->vs->feature_compat);
+	fsi->fs_feature_compat_ro = le64_to_cpu(fsi->vs->feature_compat_ro);
+	fsi->fs_feature_incompat = le64_to_cpu(fsi->vs->feature_incompat);
+
+	ssdfs_memcpy(fsi->fs_uuid, 0, SSDFS_UUID_SIZE,
+		     fsi->vs->uuid, 0, SSDFS_UUID_SIZE,
+		     SSDFS_UUID_SIZE);
+	ssdfs_memcpy(fsi->fs_label, 0, SSDFS_VOLUME_LABEL_MAX,
+		     fsi->vs->label, 0, SSDFS_VOLUME_LABEL_MAX,
+		     SSDFS_VOLUME_LABEL_MAX);
+
+	fsi->metadata_options.blk_bmap.flags =
+				le16_to_cpu(fsi->vs->blkbmap.flags);
+	fsi->metadata_options.blk_bmap.compression =
+					fsi->vs->blkbmap.compression;
+	fsi->metadata_options.blk2off_tbl.flags =
+				le16_to_cpu(fsi->vs->blk2off_tbl.flags);
+	fsi->metadata_options.blk2off_tbl.compression =
+					fsi->vs->blk2off_tbl.compression;
+	fsi->metadata_options.user_data.flags =
+				le16_to_cpu(fsi->vs->user_data.flags);
+	fsi->metadata_options.user_data.compression =
+					fsi->vs->user_data.compression;
+	fsi->metadata_options.user_data.migration_threshold =
+			le16_to_cpu(fsi->vs->user_data.migration_threshold);
+
+	fsi->migration_threshold = le16_to_cpu(fsi->vs->migration_threshold);
+	if (fsi->migration_threshold == 0 ||
+	    fsi->migration_threshold >= U16_MAX) {
+		/* use default value */
+		fsi->migration_threshold = fsi->pebs_per_seg;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("MUTABLE VOLUME INFO:\n");
+	SSDFS_DBG("sb_lebs[CUR][MAIN] %llu, sb_pebs[CUR][MAIN] %llu\n",
+		  fsi->sb_lebs[SSDFS_CUR_SB_SEG][SSDFS_MAIN_SB_SEG],
+		  fsi->sb_pebs[SSDFS_CUR_SB_SEG][SSDFS_MAIN_SB_SEG]);
+	SSDFS_DBG("sb_lebs[CUR][COPY] %llu, sb_pebs[CUR][COPY] %llu\n",
+		  fsi->sb_lebs[SSDFS_CUR_SB_SEG][SSDFS_COPY_SB_SEG],
+		  fsi->sb_pebs[SSDFS_CUR_SB_SEG][SSDFS_COPY_SB_SEG]);
+	SSDFS_DBG("sb_lebs[NEXT][MAIN] %llu, sb_pebs[NEXT][MAIN] %llu\n",
+		  fsi->sb_lebs[SSDFS_NEXT_SB_SEG][SSDFS_MAIN_SB_SEG],
+		  fsi->sb_pebs[SSDFS_NEXT_SB_SEG][SSDFS_MAIN_SB_SEG]);
+	SSDFS_DBG("sb_lebs[NEXT][COPY] %llu, sb_pebs[NEXT][COPY] %llu\n",
+		  fsi->sb_lebs[SSDFS_NEXT_SB_SEG][SSDFS_COPY_SB_SEG],
+		  fsi->sb_pebs[SSDFS_NEXT_SB_SEG][SSDFS_COPY_SB_SEG]);
+	SSDFS_DBG("sb_lebs[RESERVED][MAIN] %llu, sb_pebs[RESERVED][MAIN] %llu\n",
+		  fsi->sb_lebs[SSDFS_RESERVED_SB_SEG][SSDFS_MAIN_SB_SEG],
+		  fsi->sb_pebs[SSDFS_RESERVED_SB_SEG][SSDFS_MAIN_SB_SEG]);
+	SSDFS_DBG("sb_lebs[RESERVED][COPY] %llu, sb_pebs[RESERVED][COPY] %llu\n",
+		  fsi->sb_lebs[SSDFS_RESERVED_SB_SEG][SSDFS_COPY_SB_SEG],
+		  fsi->sb_pebs[SSDFS_RESERVED_SB_SEG][SSDFS_COPY_SB_SEG]);
+	SSDFS_DBG("sb_lebs[PREV][MAIN] %llu, sb_pebs[PREV][MAIN] %llu\n",
+		  fsi->sb_lebs[SSDFS_PREV_SB_SEG][SSDFS_MAIN_SB_SEG],
+		  fsi->sb_pebs[SSDFS_PREV_SB_SEG][SSDFS_MAIN_SB_SEG]);
+	SSDFS_DBG("sb_lebs[PREV][COPY] %llu, sb_pebs[PREV][COPY] %llu\n",
+		  fsi->sb_lebs[SSDFS_PREV_SB_SEG][SSDFS_COPY_SB_SEG],
+		  fsi->sb_pebs[SSDFS_PREV_SB_SEG][SSDFS_COPY_SB_SEG]);
+	SSDFS_DBG("nsegs %llu, free_pages %llu\n",
+		  fsi->nsegs, fsi->free_pages);
+	SSDFS_DBG("fs_mount_time %llu, fs_mod_time %llu, fs_mount_cno %llu\n",
+		  fsi->fs_mount_time, fsi->fs_mod_time, fsi->fs_mount_cno);
+	SSDFS_DBG("fs_flags %#x, fs_state %#x, fs_errors %#x\n",
+		  fsi->fs_flags, fsi->fs_state, fsi->fs_errors);
+	SSDFS_DBG("fs_feature_compat %llu, fs_feature_compat_ro %llu, "
+		  "fs_feature_incompat %llu\n",
+		  fsi->fs_feature_compat, fsi->fs_feature_compat_ro,
+		  fsi->fs_feature_incompat);
+	SSDFS_DBG("migration_threshold %u\n",
+		  fsi->migration_threshold);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi->sb->s_blocksize = fsi->pagesize;
+	fsi->sb->s_blocksize_bits = blksize_bits(fsi->pagesize);
+
+	ssdfs_maptbl_cache_init(&fsi->maptbl_cache);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("VOLUME HEADER DUMP\n");
+	print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+			     fsi->vh, fsi->pagesize);
+	SSDFS_DBG("END\n");
+
+	SSDFS_DBG("VOLUME STATE DUMP\n");
+	print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+			     fsi->vs, fsi->pagesize);
+	SSDFS_DBG("END\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_check_fs_state(fsi);
+	if (err && err != -EROFS)
+		return err;
+
+	err = ssdfs_check_feature_compatibility(fsi);
+	if (err)
+		return err;
+
+	if (fsi->leb_pages_capacity >= U16_MAX) {
+#ifdef CONFIG_SSDFS_TESTING
+		SSDFS_DBG("Continue in testing mode: "
+			  "leb_pages_capacity %u, peb_pages_capacity %u\n",
+			  fsi->leb_pages_capacity,
+			  fsi->peb_pages_capacity);
+		return 0;
+#else
+		SSDFS_NOTICE("unable to mount in RW mode: "
+			     "please format the volume with a bigger logical block size\n");
+		SSDFS_NOTICE("STATIC VOLUME INFO:\n");
+		SSDFS_NOTICE("pagesize %u, erasesize %u, segsize %u\n",
+			     fsi->pagesize, fsi->erasesize, fsi->segsize);
+		SSDFS_NOTICE("pebs_per_seg %u, pages_per_peb %u, "
+			     "pages_per_seg %u\n",
+			     fsi->pebs_per_seg, fsi->pages_per_peb,
+			     fsi->pages_per_seg);
+		SSDFS_NOTICE("zone_size %llu, zone_capacity %llu, "
+			     "leb_pages_capacity %u, peb_pages_capacity %u\n",
+			     fsi->zone_size, fsi->zone_capacity,
+			     fsi->leb_pages_capacity, fsi->peb_pages_capacity);
+
+		fsi->sb->s_flags |= SB_RDONLY;
+		return -EROFS;
+#endif /* CONFIG_SSDFS_TESTING */
+	}
+
+	return 0;
+}
+
+static
+int ssdfs_check_maptbl_cache_header(struct ssdfs_maptbl_cache_header *hdr,
+				    u16 sequence_id,
+				    u64 prev_end_leb)
+{
+	size_t bytes_count, calculated;
+	u64 start_leb, end_leb;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!hdr);
+
+	SSDFS_DBG("maptbl_cache_hdr %p\n", hdr);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (hdr->magic.common != cpu_to_le32(SSDFS_SUPER_MAGIC) ||
+	    hdr->magic.key != cpu_to_le16(SSDFS_MAPTBL_CACHE_MAGIC)) {
+		SSDFS_ERR("invalid maptbl cache magic signature\n");
+		return -EIO;
+	}
+
+	if (le16_to_cpu(hdr->sequence_id) != sequence_id) {
+		SSDFS_ERR("invalid sequence_id\n");
+		return -EIO;
+	}
+
+	bytes_count = le16_to_cpu(hdr->bytes_count);
+
+	if (bytes_count > PAGE_SIZE) {
+		SSDFS_ERR("invalid bytes_count %zu\n",
+			  bytes_count);
+		return -EIO;
+	}
+
+	calculated = le16_to_cpu(hdr->items_count) *
+			sizeof(struct ssdfs_leb2peb_pair);
+
+	if (bytes_count < calculated) {
+		SSDFS_ERR("bytes_count %zu < calculated %zu\n",
+			  bytes_count, calculated);
+		return -EIO;
+	}
+
+	start_leb = le64_to_cpu(hdr->start_leb);
+	end_leb = le64_to_cpu(hdr->end_leb);
+
+	if (start_leb > end_leb ||
+	    (prev_end_leb != U64_MAX && prev_end_leb >= start_leb)) {
+		SSDFS_ERR("invalid LEB range: start_leb %llu, "
+			  "end_leb %llu, prev_end_leb %llu\n",
+			  start_leb, end_leb, prev_end_leb);
+		return -EIO;
+	}
+
+	return 0;
+}
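The last check above enforces the maptbl cache's ordering invariant: each fragment carries a `[start_leb, end_leb]` range, the range must not be inverted, and consecutive fragments must cover strictly increasing, non-overlapping LEB ranges (`U64_MAX` marking "no previous fragment"). A hypothetical user-space sketch of just that invariant (the struct and function names are illustrative, not SSDFS definitions):

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

/* Illustrative stand-in for the per-fragment LEB range. */
struct leb_range { uint64_t start; uint64_t end; };

/*
 * Validate a sequence of fragments the same way the loop over
 * maptbl cache headers does: no inverted ranges, and each
 * fragment's start must lie strictly above the previous end.
 */
static bool leb_ranges_valid(const struct leb_range *r, int count)
{
	uint64_t prev_end = UINT64_MAX;	/* no previous fragment yet */
	int i;

	for (i = 0; i < count; i++) {
		if (r[i].start > r[i].end)
			return false;	/* inverted range */
		if (prev_end != UINT64_MAX && prev_end >= r[i].start)
			return false;	/* overlaps previous fragment */
		prev_end = r[i].end;
	}
	return true;
}
```

Note the comparison is `prev_end >= start`, so even ranges that merely touch (e.g. `[0,10]` followed by `[10,19]`) are rejected as corruption.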
+
+static int ssdfs_read_maptbl_cache(struct ssdfs_fs_info *fsi)
+{
+	struct ssdfs_segment_header *seg_hdr;
+	struct ssdfs_metadata_descriptor *meta_desc;
+	struct ssdfs_maptbl_cache_header *maptbl_cache_hdr;
+	u32 read_off;
+	u32 read_bytes = 0;
+	u32 bytes_count;
+	u32 pages_count;
+	u64 peb_id;
+	struct page *page;
+	void *kaddr;
+	u64 prev_end_leb;
+	u32 csum = ~0;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi);
+	BUG_ON(!fsi->devops->read);
+
+	SSDFS_DBG("fsi %p\n", fsi);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	seg_hdr = SSDFS_SEG_HDR(fsi->sbi.vh_buf);
+
+	if (!ssdfs_log_has_maptbl_cache(seg_hdr)) {
+		SSDFS_ERR("sb segment has no maptbl cache\n");
+		return -EIO;
+	}
+
+	down_write(&fsi->maptbl_cache.lock);
+
+	meta_desc = &seg_hdr->desc_array[SSDFS_MAPTBL_CACHE_INDEX];
+	read_off = le32_to_cpu(meta_desc->offset);
+	bytes_count = le32_to_cpu(meta_desc->size);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(bytes_count >= INT_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	peb_id = fsi->sbi.last_log.peb_id;
+
+	pages_count = (bytes_count + PAGE_SIZE - 1) >> PAGE_SHIFT;
+
+	for (i = 0; i < pages_count; i++) {
+		struct ssdfs_maptbl_cache *cache = &fsi->maptbl_cache;
+		size_t size;
+
+		size = min_t(size_t, (size_t)PAGE_SIZE,
+				(size_t)(bytes_count - read_bytes));
+
+		page = ssdfs_maptbl_cache_add_pagevec_page(cache);
+		if (unlikely(IS_ERR_OR_NULL(page))) {
+			err = !page ? -ENOMEM : PTR_ERR(page);
+			SSDFS_ERR("fail to add pagevec page: err %d\n",
+				  err);
+			goto finish_read_maptbl_cache;
+		}
+
+		ssdfs_lock_page(page);
+
+		kaddr = kmap_local_page(page);
+		err = ssdfs_unaligned_read_buffer(fsi, peb_id,
+						  read_off, kaddr, size);
+		flush_dcache_page(page);
+		kunmap_local(kaddr);
+
+		if (unlikely(err)) {
+			ssdfs_unlock_page(page);
+			SSDFS_ERR("fail to read page: "
+				  "peb %llu, offset %u, size %zu, err %d\n",
+				  peb_id, read_off, size, err);
+			goto finish_read_maptbl_cache;
+		}
+
+		ssdfs_unlock_page(page);
+
+		read_off += size;
+		read_bytes += size;
+	}
+
+	prev_end_leb = U64_MAX;
+
+	for (i = 0; i < pages_count; i++) {
+		page = fsi->maptbl_cache.pvec.pages[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(i >= U16_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		ssdfs_lock_page(page);
+		kaddr = kmap_local_page(page);
+
+		maptbl_cache_hdr = SSDFS_MAPTBL_CACHE_HDR(kaddr);
+
+		err = ssdfs_check_maptbl_cache_header(maptbl_cache_hdr,
+						      (u16)i,
+						      prev_end_leb);
+		if (unlikely(err)) {
+			SSDFS_ERR("invalid maptbl cache header: "
+				  "page_index %d, err %d\n",
+				  i, err);
+			goto unlock_cur_page;
+		}
+
+		prev_end_leb = le64_to_cpu(maptbl_cache_hdr->end_leb);
+
+		csum = crc32(csum, kaddr,
+			     le16_to_cpu(maptbl_cache_hdr->bytes_count));
+
+unlock_cur_page:
+		kunmap_local(kaddr);
+		ssdfs_unlock_page(page);
+
+		if (unlikely(err))
+			goto finish_read_maptbl_cache;
+	}
+
+	if (csum != le32_to_cpu(meta_desc->check.csum)) {
+		err = -EIO;
+		SSDFS_ERR("invalid checksum\n");
+		goto finish_read_maptbl_cache;
+	}
+
+	if (bytes_count < PAGE_SIZE)
+		bytes_count = PAGE_SIZE;
+
+	atomic_set(&fsi->maptbl_cache.bytes_count, (int)bytes_count);
+
+finish_read_maptbl_cache:
+	up_write(&fsi->maptbl_cache.lock);
+
+	return err;
+}
+
+static inline bool is_ssdfs_snapshot_rules_exist(struct ssdfs_fs_info *fsi)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return ssdfs_log_footer_has_snapshot_rules(SSDFS_LF(fsi->vs));
+}
+
+static inline
+int ssdfs_check_snapshot_rules_header(struct ssdfs_snapshot_rules_header *hdr)
+{
+	size_t item_size = sizeof(struct ssdfs_snapshot_rule_info);
+	u16 items_count;
+	u16 items_capacity;
+	u32 area_size;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!hdr);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (le32_to_cpu(hdr->magic) != SSDFS_SNAPSHOT_RULES_MAGIC) {
+		SSDFS_ERR("invalid snapshot rules magic %#x\n",
+			  le32_to_cpu(hdr->magic));
+		return -EIO;
+	}
+
+	if (le16_to_cpu(hdr->item_size) != item_size) {
+		SSDFS_ERR("invalid item size %u\n",
+			  le16_to_cpu(hdr->item_size));
+		return -EIO;
+	}
+
+	items_count = le16_to_cpu(hdr->items_count);
+	items_capacity = le16_to_cpu(hdr->items_capacity);
+
+	if (items_count > items_capacity) {
+		SSDFS_ERR("corrupted header: "
+			  "items_count %u > items_capacity %u\n",
+			  items_count, items_capacity);
+		return -EIO;
+	}
+
+	area_size = le32_to_cpu(hdr->area_size);
+
+	if (area_size != ((u32)items_capacity * item_size)) {
+		SSDFS_ERR("corrupted header: "
+			  "area_size %u, items_capacity %u, "
+			  "item_size %zu\n",
+			  area_size, items_capacity, item_size);
+		return -EIO;
+	}
+
+	return 0;
+}
+
+static inline int ssdfs_read_snapshot_rules(struct ssdfs_fs_info *fsi)
+{
+	struct ssdfs_log_footer *footer;
+	struct ssdfs_snapshot_rules_list *rules_list;
+	struct ssdfs_metadata_descriptor *meta_desc;
+	struct ssdfs_snapshot_rules_header snap_rules_hdr;
+	size_t sr_hdr_size = sizeof(struct ssdfs_snapshot_rules_header);
+	struct ssdfs_snapshot_rule_info info;
+	size_t rule_size = sizeof(struct ssdfs_snapshot_rule_info);
+	struct pagevec pvec;
+	u32 read_off;
+	u32 read_bytes = 0;
+	u32 bytes_count;
+	u32 pages_count;
+	u64 peb_id;
+	struct page *page;
+	void *kaddr;
+	u32 csum = ~0;
+	u16 items_count;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi);
+	BUG_ON(!fsi->devops->read);
+
+	SSDFS_DBG("fsi %p\n", fsi);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	footer = SSDFS_LF(fsi->sbi.vs_buf);
+	rules_list = &fsi->snapshots.rules_list;
+
+	if (!ssdfs_log_footer_has_snapshot_rules(footer)) {
+		SSDFS_ERR("footer has no snapshot rules table\n");
+		return -EIO;
+	}
+
+	meta_desc = &footer->desc_array[SSDFS_SNAPSHOT_RULES_AREA_INDEX];
+	read_off = le32_to_cpu(meta_desc->offset);
+	bytes_count = le32_to_cpu(meta_desc->size);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(bytes_count >= INT_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	peb_id = fsi->sbi.last_log.peb_id;
+
+	pages_count = (bytes_count + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	pagevec_init(&pvec);
+
+	for (i = 0; i < pages_count; i++) {
+		size_t size;
+
+		size = min_t(size_t, (size_t)PAGE_SIZE,
+				(size_t)(bytes_count - read_bytes));
+
+		page = ssdfs_snapshot_rules_add_pagevec_page(&pvec);
+		if (unlikely(IS_ERR_OR_NULL(page))) {
+			err = !page ? -ENOMEM : PTR_ERR(page);
+			SSDFS_ERR("fail to add pagevec page: err %d\n",
+				  err);
+			goto finish_read_snapshot_rules;
+		}
+
+		ssdfs_lock_page(page);
+
+		kaddr = kmap_local_page(page);
+		err = ssdfs_unaligned_read_buffer(fsi, peb_id,
+						  read_off, kaddr, size);
+		flush_dcache_page(page);
+		kunmap_local(kaddr);
+
+		if (unlikely(err)) {
+			ssdfs_unlock_page(page);
+			SSDFS_ERR("fail to read page: "
+				  "peb %llu, offset %u, size %zu, err %d\n",
+				  peb_id, read_off, size, err);
+			goto finish_read_snapshot_rules;
+		}
+
+		ssdfs_unlock_page(page);
+
+		read_off += size;
+		read_bytes += size;
+	}
+
+	page = pvec.pages[0];
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!page);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_lock_page(page);
+	ssdfs_memcpy_from_page(&snap_rules_hdr, 0, sr_hdr_size,
+				page, 0, PAGE_SIZE,
+				sr_hdr_size);
+	ssdfs_unlock_page(page);
+
+	err = ssdfs_check_snapshot_rules_header(&snap_rules_hdr);
+	if (unlikely(err)) {
+		SSDFS_ERR("invalid snapshot rules header: "
+			  "err %d\n", err);
+		goto finish_read_snapshot_rules;
+	}
+
+	for (i = 0; i < pages_count; i++) {
+		page = pvec.pages[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(i >= U16_MAX);
+		BUG_ON(!page);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		ssdfs_lock_page(page);
+		kaddr = kmap_local_page(page);
+		csum = crc32(csum, kaddr, le16_to_cpu(meta_desc->check.bytes));
+		kunmap_local(kaddr);
+		ssdfs_unlock_page(page);
+	}
+
+	if (csum != le32_to_cpu(meta_desc->check.csum)) {
+		err = -EIO;
+		SSDFS_ERR("invalid checksum\n");
+		goto finish_read_snapshot_rules;
+	}
+
+	items_count = le16_to_cpu(snap_rules_hdr.items_count);
+	read_off = sr_hdr_size;
+
+	for (i = 0; i < items_count; i++) {
+		struct ssdfs_snapshot_rule_item *ptr;
+
+		err = ssdfs_unaligned_read_pagevec(&pvec, read_off,
+						   rule_size, &info);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to read a snapshot rule: "
+				  "read_off %u, index %d, err %d\n",
+				  read_off, i, err);
+			goto finish_read_snapshot_rules;
+		}
+
+		ptr = ssdfs_snapshot_rule_alloc();
+		if (!ptr) {
+			err = -ENOMEM;
+			SSDFS_ERR("fail to allocate rule item\n");
+			goto finish_read_snapshot_rules;
+		}
+
+		ssdfs_memcpy(&ptr->rule, 0, rule_size,
+			     &info, 0, rule_size,
+			     rule_size);
+
+		ssdfs_snapshot_rules_list_add_tail(rules_list, ptr);
+
+		read_off += rule_size;
+	}
+
+finish_read_snapshot_rules:
+	ssdfs_snapshot_rules_pagevec_release(&pvec);
+	return err;
+}
+
+static int ssdfs_init_recovery_environment(struct ssdfs_fs_info *fsi,
+					   struct ssdfs_volume_header *vh,
+					   u64 pebs_per_volume,
+					   struct ssdfs_recovery_env *env)
+{
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !vh || !env);
+
+	SSDFS_DBG("fsi %p, vh %p, env %p\n", fsi, vh, env);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	env->found = NULL;
+	env->err = 0;
+	env->fsi = fsi;
+	env->pebs_per_volume = pebs_per_volume;
+
+	atomic_set(&env->state, SSDFS_RECOVERY_UNKNOWN_STATE);
+
+	err = ssdfs_init_sb_info(fsi, &env->sbi);
+	if (likely(!err))
+		err = ssdfs_init_sb_info(fsi, &env->sbi_backup);
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to prepare sb info: err %d\n", err);
+		return err;
+	}
+
+	return 0;
+}
+
+static inline bool has_thread_finished(struct ssdfs_recovery_env *env)
+{
+	switch (atomic_read(&env->state)) {
+	case SSDFS_RECOVERY_FAILED:
+	case SSDFS_RECOVERY_FINISHED:
+		return true;
+
+	case SSDFS_START_RECOVERY:
+		return false;
+	}
+
+	return true;
+}
+
+static inline u16 ssdfs_get_pebs_per_stripe(u64 pebs_per_volume,
+					    u64 processed_pebs,
+					    u32 fragments_count,
+					    u16 pebs_per_fragment,
+					    u16 stripes_per_fragment,
+					    u16 pebs_per_stripe)
+{
+	u64 fragment_index;
+	u64 pebs_per_aligned_fragments;
+	u64 pebs_per_last_fragment;
+	u64 calculated = U16_MAX;
+	u32 remainder;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("pebs_per_volume %llu, processed_pebs %llu, "
+		  "fragments_count %u, pebs_per_fragment %u, "
+		  "stripes_per_fragment %u, pebs_per_stripe %u\n",
+		  pebs_per_volume, processed_pebs,
+		  fragments_count, pebs_per_fragment,
+		  stripes_per_fragment, pebs_per_stripe);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (fragments_count == 0) {
+		SSDFS_WARN("invalid fragments_count %u\n",
+			   fragments_count);
+		return pebs_per_stripe;
+	}
+
+	fragment_index = processed_pebs / pebs_per_fragment;
+
+	if (fragment_index >= fragments_count) {
+		SSDFS_WARN("fragment_index %llu >= fragments_count %u\n",
+			   fragment_index, fragments_count);
+		return pebs_per_stripe;
+	}
+
+	if ((fragment_index + 1) < fragments_count)
+		calculated = pebs_per_stripe;
+	else {
+		pebs_per_aligned_fragments = fragments_count - 1;
+		pebs_per_aligned_fragments *= pebs_per_fragment;
+
+		if (pebs_per_aligned_fragments >= pebs_per_volume) {
+			SSDFS_WARN("calculated %llu >= pebs_per_volume %llu\n",
+				   pebs_per_aligned_fragments,
+				   pebs_per_volume);
+			return 0;
+		}
+
+		pebs_per_last_fragment = pebs_per_volume -
+						pebs_per_aligned_fragments;
+		calculated = pebs_per_last_fragment / stripes_per_fragment;
+
+		div_u64_rem(pebs_per_last_fragment,
+			    (u64)stripes_per_fragment, &remainder);
+
+		if (remainder != 0)
+			calculated++;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("fragment_index %llu, calculated %llu\n",
+		  fragment_index, calculated);
+
+	BUG_ON(calculated > pebs_per_stripe);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return (u16)calculated;
+}
+
+static inline
+void ssdfs_init_found_pebs_details(struct ssdfs_found_protected_pebs *ptr)
+{
+	int i, j;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!ptr);
+
+	SSDFS_DBG("ptr %p\n", ptr);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ptr->start_peb = U64_MAX;
+	ptr->pebs_count = U32_MAX;
+	ptr->lower_offset = U64_MAX;
+	ptr->middle_offset = U64_MAX;
+	ptr->upper_offset = U64_MAX;
+	ptr->current_offset = U64_MAX;
+	ptr->search_phase = SSDFS_RECOVERY_NO_SEARCH;
+
+	for (i = 0; i < SSDFS_PROTECTED_PEB_CHAIN_MAX; i++) {
+		struct ssdfs_found_protected_peb *cur_peb;
+
+		cur_peb = &ptr->array[i];
+
+		cur_peb->peb.peb_id = U64_MAX;
+		cur_peb->peb.is_superblock_peb = false;
+		cur_peb->peb.state = SSDFS_PEB_NOT_CHECKED;
+
+		for (j = 0; j < SSDFS_SB_CHAIN_MAX; j++) {
+			struct ssdfs_superblock_pebs_pair *cur_pair;
+			struct ssdfs_found_peb *cur_sb_peb;
+
+			cur_pair = &cur_peb->found.sb_pebs[j];
+
+			cur_sb_peb = &cur_pair->pair[SSDFS_MAIN_SB_SEG];
+			cur_sb_peb->peb_id = U64_MAX;
+			cur_sb_peb->is_superblock_peb = false;
+			cur_sb_peb->state = SSDFS_PEB_NOT_CHECKED;
+
+			cur_sb_peb = &cur_pair->pair[SSDFS_COPY_SB_SEG];
+			cur_sb_peb->peb_id = U64_MAX;
+			cur_sb_peb->is_superblock_peb = false;
+			cur_sb_peb->state = SSDFS_PEB_NOT_CHECKED;
+		}
+	}
+}
+
+static inline
+int ssdfs_start_recovery_thread_activity(struct ssdfs_recovery_env *env,
+				struct ssdfs_found_protected_pebs *found,
+				u64 start_peb, u32 pebs_count, int search_phase)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || env->found || !found);
+
+	SSDFS_DBG("env %p, found %p, start_peb %llu, "
+		  "pebs_count %u, search_phase %#x\n",
+		  env, found, start_peb,
+		  pebs_count, search_phase);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	env->found = found;
+	env->err = 0;
+
+	if (search_phase == SSDFS_RECOVERY_FAST_SEARCH) {
+		env->found->start_peb = start_peb;
+		env->found->pebs_count = pebs_count;
+	} else if (search_phase == SSDFS_RECOVERY_SLOW_SEARCH) {
+		struct ssdfs_found_protected_peb *protected;
+		u64 lower_peb_id;
+		u64 upper_peb_id;
+		u64 last_cno_peb_id;
+
+		if (env->found->start_peb != start_peb ||
+		    env->found->pebs_count != pebs_count) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("ignore search in fragment: "
+				  "found (start_peb %llu, pebs_count %u), "
+				  "start_peb %llu, pebs_count %u\n",
+				  env->found->start_peb,
+				  env->found->pebs_count,
+				  start_peb, pebs_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+			env->err = -ENODATA;
+			atomic_set(&env->state, SSDFS_RECOVERY_FAILED);
+			return -ENODATA;
+		}
+
+		protected = &env->found->array[SSDFS_LOWER_PEB_INDEX];
+		lower_peb_id = protected->peb.peb_id;
+
+		protected = &env->found->array[SSDFS_UPPER_PEB_INDEX];
+		upper_peb_id = protected->peb.peb_id;
+
+		protected = &env->found->array[SSDFS_LAST_CNO_PEB_INDEX];
+		last_cno_peb_id = protected->peb.peb_id;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("protected PEBs: "
+			  "lower %llu, upper %llu, last_cno_peb %llu\n",
+			  lower_peb_id, upper_peb_id, last_cno_peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (lower_peb_id >= U64_MAX) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("ignore search in fragment: "
+				  "found (start_peb %llu, pebs_count %u), "
+				  "start_peb %llu, pebs_count %u, "
+				  "lower %llu, upper %llu, "
+				  "last_cno_peb %llu\n",
+				  env->found->start_peb,
+				  env->found->pebs_count,
+				  start_peb, pebs_count,
+				  lower_peb_id, upper_peb_id,
+				  last_cno_peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+			env->err = -ENODATA;
+			atomic_set(&env->state, SSDFS_RECOVERY_FAILED);
+			return -ENODATA;
+		} else if (lower_peb_id == env->found->start_peb &&
+			   upper_peb_id >= U64_MAX) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("ignore search in fragment: "
+				  "found (start_peb %llu, pebs_count %u), "
+				  "start_peb %llu, pebs_count %u, "
+				  "lower %llu, upper %llu, "
+				  "last_cno_peb %llu\n",
+				  env->found->start_peb,
+				  env->found->pebs_count,
+				  start_peb, pebs_count,
+				  lower_peb_id, upper_peb_id,
+				  last_cno_peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+			env->err = -ENODATA;
+			atomic_set(&env->state, SSDFS_RECOVERY_FAILED);
+			return -ENODATA;
+		}
+	} else {
+		SSDFS_ERR("unexpected search phase %#x\n",
+			  search_phase);
+		return -ERANGE;
+	}
+
+	env->found->search_phase = search_phase;
+	atomic_set(&env->state, SSDFS_START_RECOVERY);
+	wake_up(&env->request_wait_queue);
+
+	return 0;
+}
+
+static inline
+int ssdfs_wait_recovery_thread_finish(struct ssdfs_fs_info *fsi,
+				       struct ssdfs_recovery_env *env,
+				       u32 stripe_id,
+				       bool *has_sb_peb_found)
+{
+	struct ssdfs_segment_header *seg_hdr;
+	wait_queue_head_t *wq;
+	u64 cno1, cno2;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !has_sb_peb_found);
+
+	SSDFS_DBG("env %p, has_sb_peb_found %p, stripe_id %u\n",
+		  env, has_sb_peb_found, stripe_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	/*
+	 * Do not change has_sb_peb_found
+	 * if nothing has been found!!!!
+	 */
+
+	wq = &env->result_wait_queue;
+
+	wait_event_interruptible_timeout(*wq,
+			has_thread_finished(env),
+			SSDFS_DEFAULT_TIMEOUT);
+
+	switch (atomic_read(&env->state)) {
+	case SSDFS_RECOVERY_FINISHED:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("stripe %u has SB segment\n",
+			  stripe_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		seg_hdr = SSDFS_SEG_HDR(fsi->sbi.vh_buf);
+		cno1 = le64_to_cpu(seg_hdr->cno);
+		seg_hdr = SSDFS_SEG_HDR(env->sbi.vh_buf);
+		cno2 = le64_to_cpu(seg_hdr->cno);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("cno1 %llu, cno2 %llu\n",
+			  cno1, cno2);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (cno1 <= cno2) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("copy sb info: "
+				  "stripe_id %u\n",
+				  stripe_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+			ssdfs_copy_sb_info(fsi, env);
+			*has_sb_peb_found = true;
+		}
+		break;
+
+	case SSDFS_RECOVERY_FAILED:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("stripe %u has nothing\n",
+			  stripe_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		break;
+
+	case SSDFS_START_RECOVERY:
+		err = -ERANGE;
+		SSDFS_WARN("thread is working too long: "
+			   "stripe %u\n",
+			   stripe_id);
+		atomic_set(&env->state, SSDFS_RECOVERY_FAILED);
+		break;
+
+	default:
+		BUG();
+	}
+
+	env->found = NULL;
+	return err;
+}
+
+int ssdfs_gather_superblock_info(struct ssdfs_fs_info *fsi, int silent)
+{
+	struct ssdfs_volume_header *vh;
+	struct ssdfs_recovery_env *array = NULL;
+	struct ssdfs_found_protected_pebs *found_pebs = NULL;
+	u64 dev_size;
+	u32 erasesize;
+	u64 pebs_per_volume;
+	u32 fragments_count = 0;
+	u16 pebs_per_fragment = 0;
+	u16 stripes_per_fragment = 0;
+	u16 pebs_per_stripe = 0;
+	u32 stripes_count = 0;
+	u32 threads_count;
+	u32 jobs_count;
+	u32 processed_stripes = 0;
+	u64 processed_pebs = 0;
+	bool has_sb_peb_found1, has_sb_peb_found2;
+	bool has_iteration_succeeded;
+	u16 calculated;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("fsi %p, silent %#x\n", fsi, silent);
+#else
+	SSDFS_DBG("fsi %p, silent %#x\n", fsi, silent);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	err = ssdfs_init_sb_info(fsi, &fsi->sbi);
+	if (likely(!err)) {
+		err = ssdfs_init_sb_info(fsi, &fsi->sbi_backup);
+	}
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to prepare sb info: err %d\n", err);
+		goto free_buf;
+	}
+
+	err = ssdfs_find_any_valid_volume_header(fsi,
+						 SSDFS_RESERVED_VBR_SIZE,
+						 silent);
+	if (err)
+		goto forget_buf;
+
+	vh = SSDFS_VH(fsi->sbi.vh_buf);
+	fragments_count = le32_to_cpu(vh->maptbl.fragments_count);
+	pebs_per_fragment = le16_to_cpu(vh->maptbl.pebs_per_fragment);
+	pebs_per_stripe = le16_to_cpu(vh->maptbl.pebs_per_stripe);
+	stripes_per_fragment = le16_to_cpu(vh->maptbl.stripes_per_fragment);
+
+	dev_size = fsi->devops->device_size(fsi->sb);
+	erasesize = 1 << vh->log_erasesize;
+	pebs_per_volume = div_u64(dev_size, erasesize);
+
+	stripes_count = fragments_count * stripes_per_fragment;
+	threads_count = min_t(u32, SSDFS_RECOVERY_THREADS, stripes_count);
+
+	has_sb_peb_found1 = false;
+	has_sb_peb_found2 = false;
+
+	found_pebs = ssdfs_recovery_kcalloc(stripes_count,
+				sizeof(struct ssdfs_found_protected_pebs),
+				GFP_KERNEL);
+	if (!found_pebs) {
+		err = -ENOMEM;
+		SSDFS_ERR("fail to allocate the PEBs details array\n");
+		goto free_environment;
+	}
+
+	for (i = 0; i < stripes_count; i++)
+		ssdfs_init_found_pebs_details(&found_pebs[i]);
+
+	array = ssdfs_recovery_kcalloc(threads_count,
+				sizeof(struct ssdfs_recovery_env),
+				GFP_KERNEL);
+	if (!array) {
+		err = -ENOMEM;
+		SSDFS_ERR("fail to allocate the environment\n");
+		goto free_environment;
+	}
+
+	for (i = 0; i < threads_count; i++) {
+		err = ssdfs_init_recovery_environment(fsi, vh,
+					pebs_per_volume, &array[i]);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to prepare sb info: err %d\n", err);
+
+			for (; i >= 0; i--) {
+				ssdfs_destruct_sb_info(&array[i].sbi);
+				ssdfs_destruct_sb_info(&array[i].sbi_backup);
+			}
+
+			goto free_environment;
+		}
+	}
+
+	for (i = 0; i < threads_count; i++) {
+		err = ssdfs_recovery_start_thread(&array[i], i);
+		if (unlikely(err)) {
+			if (err == -EINTR) {
+				/*
+				 * Ignore this error.
+				 */
+			} else {
+				SSDFS_ERR("fail to start thread: "
+					  "id %u, err %d\n",
+					  i, err);
+			}
+
+			for (; i >= 0; i--)
+				ssdfs_recovery_stop_thread(&array[i]);
+
+			goto destruct_sb_info;
+		}
+	}
+
+	jobs_count = 1;
+
+	processed_stripes = 0;
+	processed_pebs = 0;
+
+	while (processed_pebs < pebs_per_volume) {
+		/* Fast search phase */
+		has_iteration_succeeded = false;
+
+		if (processed_stripes >= stripes_count) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("processed_stripes %u >= stripes_count %u\n",
+				  processed_stripes, stripes_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto try_slow_search;
+		}
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("FAST_SEARCH: jobs_count %u\n", jobs_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		for (i = 0; i < jobs_count; i++) {
+			calculated =
+				ssdfs_get_pebs_per_stripe(pebs_per_volume,
+							  processed_pebs,
+							  fragments_count,
+							  pebs_per_fragment,
+							  stripes_per_fragment,
+							  pebs_per_stripe);
+
+			if ((processed_pebs + calculated) > pebs_per_volume)
+				calculated = pebs_per_volume - processed_pebs;
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("i %d, start_peb %llu, pebs_count %u\n",
+				  i, processed_pebs, calculated);
+			SSDFS_DBG("pebs_per_volume %llu, processed_pebs %llu\n",
+				  pebs_per_volume, processed_pebs);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			err = ssdfs_start_recovery_thread_activity(&array[i],
+					&found_pebs[processed_stripes + i],
+					processed_pebs, calculated,
+					SSDFS_RECOVERY_FAST_SEARCH);
+			if (err) {
+				SSDFS_ERR("fail to start thread's activity: "
+					  "err %d\n", err);
+				goto finish_sb_peb_search;
+			}
+
+			processed_pebs += calculated;
+		}
+
+		for (i = 0; i < jobs_count; i++) {
+			err = ssdfs_wait_recovery_thread_finish(fsi,
+						&array[i],
+						processed_stripes + i,
+						&has_iteration_succeeded);
+			if (unlikely(err)) {
+				has_sb_peb_found1 = false;
+				goto finish_sb_peb_search;
+			}
+
+			switch (array[i].err) {
+			case 0:
+				/* SB PEB has been found */
+				/* continue logic */
+				break;
+
+			case -ENODATA:
+			case -ENOENT:
+			case -EAGAIN:
+			case -E2BIG:
+				/* SB PEB has not been found */
+				/* continue logic */
+				break;
+
+			default:
+				/* Something is going wrong */
+				/* stop execution */
+				err = array[i].err;
+				has_sb_peb_found1 = false;
+				SSDFS_ERR("fail to find valid SB PEB: "
+					  "err %d\n", err);
+				goto finish_sb_peb_search;
+			}
+		}
+
+		if (has_iteration_succeeded) {
+			has_sb_peb_found1 = true;
+			goto finish_sb_peb_search;
+		}
+
+		processed_stripes += jobs_count;
+
+		jobs_count <<= 1;
+		jobs_count = min_t(u32, jobs_count, threads_count);
+		jobs_count = min_t(u32, jobs_count,
+				   stripes_count - processed_stripes);
+	}
+
+try_slow_search:
+	jobs_count = 1;
+
+	processed_stripes = 0;
+	processed_pebs = 0;
+
+	while (processed_pebs < pebs_per_volume) {
+		/* Slow search phase */
+		has_iteration_succeeded = false;
+
+		if (processed_stripes >= stripes_count) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("processed_stripes %u >= stripes_count %u\n",
+				  processed_stripes, stripes_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto finish_sb_peb_search;
+		}
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("SLOW_SEARCH: jobs_count %u\n", jobs_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		for (i = 0; i < jobs_count; i++) {
+			calculated =
+				ssdfs_get_pebs_per_stripe(pebs_per_volume,
+							  processed_pebs,
+							  fragments_count,
+							  pebs_per_fragment,
+							  stripes_per_fragment,
+							  pebs_per_stripe);
+
+			if ((processed_pebs + calculated) > pebs_per_volume)
+				calculated = pebs_per_volume - processed_pebs;
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("i %d, start_peb %llu, pebs_count %u\n",
+				  i, processed_pebs, calculated);
+			SSDFS_DBG("pebs_per_volume %llu, processed_pebs %llu\n",
+				  pebs_per_volume, processed_pebs);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			err = ssdfs_start_recovery_thread_activity(&array[i],
+					&found_pebs[processed_stripes + i],
+					processed_pebs, calculated,
+					SSDFS_RECOVERY_SLOW_SEARCH);
+			if (err == -ENODATA) {
+				/* thread continues to sleep */
+				/* continue logic */
+			} else if (err) {
+				SSDFS_ERR("fail to start thread's activity: "
+					  "err %d\n", err);
+				goto finish_sb_peb_search;
+			}
+
+			processed_pebs += calculated;
+		}
+
+		for (i = 0; i < jobs_count; i++) {
+			err = ssdfs_wait_recovery_thread_finish(fsi,
+						&array[i],
+						processed_stripes + i,
+						&has_iteration_succeeded);
+			if (unlikely(err)) {
+				has_sb_peb_found2 = false;
+				goto finish_sb_peb_search;
+			}
+
+			switch (array[i].err) {
+			case 0:
+				/* SB PEB has been found */
+				/* continue logic */
+				break;
+
+			case -ENODATA:
+			case -ENOENT:
+			case -EAGAIN:
+			case -E2BIG:
+				/* SB PEB has not been found */
+				/* continue logic */
+				break;
+
+			default:
+				/* Something is going wrong */
+				/* stop execution */
+				err = array[i].err;
+				has_sb_peb_found2 = false;
+				SSDFS_ERR("fail to find valid SB PEB: "
+					  "err %d\n", err);
+				goto finish_sb_peb_search;
+			}
+		}
+
+		if (has_iteration_succeeded) {
+			has_sb_peb_found2 = true;
+			goto finish_sb_peb_search;
+		}
+
+		processed_stripes += jobs_count;
+
+		jobs_count <<= 1;
+		jobs_count = min_t(u32, jobs_count, threads_count);
+		jobs_count = min_t(u32, jobs_count,
+				   stripes_count - processed_stripes);
+	}
+
+finish_sb_peb_search:
+	for (i = 0; i < threads_count; i++)
+		ssdfs_recovery_stop_thread(&array[i]);
+
+destruct_sb_info:
+	for (i = 0; i < threads_count; i++) {
+		ssdfs_destruct_sb_info(&array[i].sbi);
+		ssdfs_destruct_sb_info(&array[i].sbi_backup);
+	}
+
+free_environment:
+	if (found_pebs) {
+		ssdfs_recovery_kfree(found_pebs);
+		found_pebs = NULL;
+	}
+
+	if (array) {
+		ssdfs_recovery_kfree(array);
+		array = NULL;
+	}
+
+	switch (err) {
+	case 0:
+		/* SB PEB has been found */
+		/* continue logic */
+		break;
+
+	case -ENODATA:
+	case -ENOENT:
+	case -EAGAIN:
+	case -E2BIG:
+		/* SB PEB has not been found */
+		/* continue logic */
+		break;
+
+	default:
+		/* Something is going wrong */
+		/* stop execution */
+		SSDFS_ERR("fail to find valid SB PEB: err %d\n", err);
+		goto forget_buf;
+	}
+
+	if (has_sb_peb_found1)
+		SSDFS_DBG("FAST_SEARCH: found SB seg\n");
+	else if (has_sb_peb_found2)
+		SSDFS_DBG("SLOW_SEARCH: found SB seg\n");
+
+	if (!has_sb_peb_found1 && !has_sb_peb_found2) {
+		SSDFS_ERR("unable to find latest valid sb segment: "
+			  "trying old algorithm\n");
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG();
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = ssdfs_find_any_valid_sb_segment(fsi, 0);
+		if (err)
+			goto forget_buf;
+
+		err = ssdfs_find_latest_valid_sb_segment(fsi);
+		if (err)
+			goto forget_buf;
+	}
+
+	err = ssdfs_find_latest_valid_sb_info2(fsi);
+	if (err) {
+		SSDFS_ERR("unable to find latest valid sb info: "
+			  "trying old algorithm!!!\n");
+
+		err = ssdfs_find_latest_valid_sb_info(fsi);
+		if (err)
+			goto forget_buf;
+	}
+
+	err = ssdfs_initialize_fs_info(fsi);
+	if (err && err != -EROFS)
+		goto forget_buf;
+
+	err = ssdfs_read_maptbl_cache(fsi);
+	if (err)
+		goto forget_buf;
+
+	if (is_ssdfs_snapshot_rules_exist(fsi)) {
+		err = ssdfs_read_snapshot_rules(fsi);
+		if (err)
+			goto forget_buf;
+	}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("DONE: gather superblock info\n");
+#else
+	SSDFS_DBG("DONE: gather superblock info\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	return 0;
+
+forget_buf:
+	fsi->vh = NULL;
+	fsi->vs = NULL;
+
+free_buf:
+	ssdfs_destruct_sb_info(&fsi->sbi);
+	ssdfs_destruct_sb_info(&fsi->sbi_backup);
+	return err;
+}
diff --git a/fs/ssdfs/recovery.h b/fs/ssdfs/recovery.h
new file mode 100644
index 000000000000..aead1ebe29e6
--- /dev/null
+++ b/fs/ssdfs/recovery.h
@@ -0,0 +1,446 @@
+/* SPDX-License-Identifier: BSD-3-Clause-Clear */
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/recovery.h - recovery logic declarations.
+ *
+ * Copyright (c) 2019-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ */
+
+#ifndef _SSDFS_RECOVERY_H
+#define _SSDFS_RECOVERY_H
+
+#define SSDFS_RESERVED_SB_SEGS		(6)
+#define SSDFS_RECOVERY_THREADS		(12)
+
+/*
+ * struct ssdfs_found_peb - found PEB details
+ * @peb_id: PEB's ID
+ * @cno: PEB's starting checkpoint
+ * @is_superblock_peb: has superblock PEB been found?
+ * @state: PEB's state
+ */
+struct ssdfs_found_peb {
+	u64 peb_id;
+	u64 cno;
+	bool is_superblock_peb;
+	int state;
+};
+
+/*
+ * States of found PEB
+ */
+enum {
+	SSDFS_PEB_NOT_CHECKED,
+	SSDFS_FOUND_PEB_VALID,
+	SSDFS_FOUND_PEB_INVALID,
+	SSDFS_FOUND_PEB_STATE_MAX
+};
+
+/*
+ * struct ssdfs_superblock_pebs_pair - pair of superblock PEBs
+ * @pair: main and copy superblock PEBs
+ */
+struct ssdfs_superblock_pebs_pair {
+	struct ssdfs_found_peb pair[SSDFS_SB_SEG_COPY_MAX];
+};
+
+/*
+ * struct ssdfs_found_superblock_pebs - found superblock PEBs
+ * @sb_pebs: array of superblock PEB details
+ */
+struct ssdfs_found_superblock_pebs {
+	struct ssdfs_superblock_pebs_pair sb_pebs[SSDFS_SB_CHAIN_MAX];
+};
+
+/*
+ * struct ssdfs_found_protected_peb - protected PEB details
+ * @peb: protected PEB details
+ * @found: superblock PEBs details
+ */
+struct ssdfs_found_protected_peb {
+	struct ssdfs_found_peb peb;
+	struct ssdfs_found_superblock_pebs found;
+};
+
+/*
+ * struct ssdfs_found_protected_pebs - found protected PEBs
+ * @start_peb: starting PEB ID in fragment
+ * @pebs_count: PEBs count in fragment
+ * @lower_offset: lower offset bound
+ * @middle_offset: middle offset
+ * @upper_offset: upper offset bound
+ * @current_offset: current position of the search
+ * @search_phase: current search phase
+ * @array: array of protected PEB details
+ */
+struct ssdfs_found_protected_pebs {
+	u64 start_peb;
+	u32 pebs_count;
+
+	u64 lower_offset;
+	u64 middle_offset;
+	u64 upper_offset;
+	u64 current_offset;
+	int search_phase;
+
+#define SSDFS_LOWER_PEB_INDEX			(0)
+#define SSDFS_UPPER_PEB_INDEX			(1)
+#define SSDFS_LAST_CNO_PEB_INDEX		(2)
+#define SSDFS_PROTECTED_PEB_CHAIN_MAX		(3)
+	struct ssdfs_found_protected_peb array[SSDFS_PROTECTED_PEB_CHAIN_MAX];
+};
+
+/*
+ * struct ssdfs_recovery_env - recovery environment
+ * @found: found PEBs' details
+ * @err: result of the search
+ * @state: recovery thread's state
+ * @pebs_per_volume: PEBs number per volume
+ * @last_vh: buffer for last valid volume header
+ * @sbi: superblock info
+ * @sbi_backup: backup copy of superblock info
+ * @request_wait_queue: request wait queue of recovery thread
+ * @result_wait_queue: result wait queue of recovery thread
+ * @thread: descriptor of recovery thread
+ * @fsi: file system info object
+ */
+struct ssdfs_recovery_env {
+	struct ssdfs_found_protected_pebs *found;
+
+	int err;
+	atomic_t state;
+	u64 pebs_per_volume;
+
+	struct ssdfs_volume_header last_vh;
+	struct ssdfs_sb_info sbi;
+	struct ssdfs_sb_info sbi_backup;
+
+	wait_queue_head_t request_wait_queue;
+	wait_queue_head_t result_wait_queue;
+	struct ssdfs_thread_info thread;
+	struct ssdfs_fs_info *fsi;
+};
+
+/*
+ * Search phases
+ */
+enum {
+	SSDFS_RECOVERY_NO_SEARCH,
+	SSDFS_RECOVERY_FAST_SEARCH,
+	SSDFS_RECOVERY_SLOW_SEARCH,
+	SSDFS_RECOVERY_FIRST_SLOW_TRY,
+	SSDFS_RECOVERY_SECOND_SLOW_TRY,
+	SSDFS_RECOVERY_THIRD_SLOW_TRY,
+	SSDFS_RECOVERY_SEARCH_PHASES_MAX
+};
+
+/*
+ * Recovery thread's state
+ */
+enum {
+	SSDFS_RECOVERY_UNKNOWN_STATE,
+	SSDFS_START_RECOVERY,
+	SSDFS_RECOVERY_FAILED,
+	SSDFS_RECOVERY_FINISHED,
+	SSDFS_RECOVERY_STATE_MAX
+};
+
+/*
+ * Operation types
+ */
+enum {
+	SSDFS_USE_PEB_ISBAD_OP,
+	SSDFS_USE_READ_OP,
+};
+
+/*
+ * Inline functions
+ */
+
+static inline
+struct ssdfs_found_peb *
+CUR_MAIN_SB_PEB(struct ssdfs_found_superblock_pebs *ptr)
+{
+	return &ptr->sb_pebs[SSDFS_CUR_SB_SEG].pair[SSDFS_MAIN_SB_SEG];
+}
+
+static inline
+struct ssdfs_found_peb *
+CUR_COPY_SB_PEB(struct ssdfs_found_superblock_pebs *ptr)
+{
+	return &ptr->sb_pebs[SSDFS_CUR_SB_SEG].pair[SSDFS_COPY_SB_SEG];
+}
+
+static inline
+struct ssdfs_found_peb *
+NEXT_MAIN_SB_PEB(struct ssdfs_found_superblock_pebs *ptr)
+{
+	return &ptr->sb_pebs[SSDFS_NEXT_SB_SEG].pair[SSDFS_MAIN_SB_SEG];
+}
+
+static inline
+struct ssdfs_found_peb *
+NEXT_COPY_SB_PEB(struct ssdfs_found_superblock_pebs *ptr)
+{
+	return &ptr->sb_pebs[SSDFS_NEXT_SB_SEG].pair[SSDFS_COPY_SB_SEG];
+}
+
+static inline
+struct ssdfs_found_peb *
+RESERVED_MAIN_SB_PEB(struct ssdfs_found_superblock_pebs *ptr)
+{
+	return &ptr->sb_pebs[SSDFS_RESERVED_SB_SEG].pair[SSDFS_MAIN_SB_SEG];
+}
+
+static inline
+struct ssdfs_found_peb *
+RESERVED_COPY_SB_PEB(struct ssdfs_found_superblock_pebs *ptr)
+{
+	return &ptr->sb_pebs[SSDFS_RESERVED_SB_SEG].pair[SSDFS_COPY_SB_SEG];
+}
+
+static inline
+struct ssdfs_found_peb *
+PREV_MAIN_SB_PEB(struct ssdfs_found_superblock_pebs *ptr)
+{
+	return &ptr->sb_pebs[SSDFS_PREV_SB_SEG].pair[SSDFS_MAIN_SB_SEG];
+}
+
+static inline
+struct ssdfs_found_peb *
+PREV_COPY_SB_PEB(struct ssdfs_found_superblock_pebs *ptr)
+{
+	return &ptr->sb_pebs[SSDFS_PREV_SB_SEG].pair[SSDFS_COPY_SB_SEG];
+}
+
+static inline
+bool IS_INSIDE_STRIPE(struct ssdfs_found_protected_pebs *ptr,
+		      struct ssdfs_found_peb *found)
+{
+	return found->peb_id >= ptr->start_peb &&
+		found->peb_id < (ptr->start_peb + ptr->pebs_count);
+}
+
+static inline
+u64 SSDFS_RECOVERY_LOW_OFF(struct ssdfs_recovery_env *env)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->fsi || !env->found);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (env->found->search_phase) {
+	case SSDFS_RECOVERY_FAST_SEARCH:
+		return env->found->lower_offset;
+
+	case SSDFS_RECOVERY_SLOW_SEARCH:
+	case SSDFS_RECOVERY_FIRST_SLOW_TRY:
+		return env->found->middle_offset;
+
+	case SSDFS_RECOVERY_SECOND_SLOW_TRY:
+		return env->found->lower_offset;
+
+	case SSDFS_RECOVERY_THIRD_SLOW_TRY:
+		if (env->found->start_peb == 0)
+			return SSDFS_RESERVED_VBR_SIZE;
+		else
+			return env->found->start_peb * env->fsi->erasesize;
+	}
+
+	return U64_MAX;
+}
+
+static inline
+u64 SSDFS_RECOVERY_UPPER_OFF(struct ssdfs_recovery_env *env)
+{
+	u64 calculated_peb;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->fsi || !env->found);
+	BUG_ON(env->pebs_per_volume == 0);
+	BUG_ON(env->pebs_per_volume >= U64_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (env->found->search_phase) {
+	case SSDFS_RECOVERY_FAST_SEARCH:
+		calculated_peb = div_u64(env->found->middle_offset,
+					 env->fsi->erasesize);
+		calculated_peb += SSDFS_MAPTBL_PROTECTION_STEP - 1;
+		if (calculated_peb >= env->pebs_per_volume)
+			calculated_peb = env->pebs_per_volume - 1;
+
+		return calculated_peb * env->fsi->erasesize;
+
+	case SSDFS_RECOVERY_SLOW_SEARCH:
+	case SSDFS_RECOVERY_FIRST_SLOW_TRY:
+		return env->found->upper_offset;
+
+	case SSDFS_RECOVERY_SECOND_SLOW_TRY:
+		return env->found->middle_offset;
+
+	case SSDFS_RECOVERY_THIRD_SLOW_TRY:
+		return env->found->lower_offset;
+	}
+
+	return U64_MAX;
+}
+
+static inline
+u64 *SSDFS_RECOVERY_CUR_OFF_PTR(struct ssdfs_recovery_env *env)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->found);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return &env->found->current_offset;
+}
+
+static inline
+void SSDFS_RECOVERY_SET_FAST_SEARCH_TRY(struct ssdfs_recovery_env *env)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->found);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*SSDFS_RECOVERY_CUR_OFF_PTR(env) = env->found->lower_offset;
+	env->found->search_phase = SSDFS_RECOVERY_FAST_SEARCH;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("lower_offset %llu, "
+		  "middle_offset %llu, "
+		  "upper_offset %llu, "
+		  "current_offset %llu, "
+		  "search_phase %#x\n",
+		  env->found->lower_offset,
+		  env->found->middle_offset,
+		  env->found->upper_offset,
+		  env->found->current_offset,
+		  env->found->search_phase);
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+static inline
+void SSDFS_RECOVERY_SET_FIRST_SLOW_TRY(struct ssdfs_recovery_env *env)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->found);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*SSDFS_RECOVERY_CUR_OFF_PTR(env) = env->found->middle_offset;
+	env->found->search_phase = SSDFS_RECOVERY_FIRST_SLOW_TRY;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("lower_offset %llu, "
+		  "middle_offset %llu, "
+		  "upper_offset %llu, "
+		  "current_offset %llu, "
+		  "search_phase %#x\n",
+		  env->found->lower_offset,
+		  env->found->middle_offset,
+		  env->found->upper_offset,
+		  env->found->current_offset,
+		  env->found->search_phase);
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+static inline
+bool is_second_slow_try_possible(struct ssdfs_recovery_env *env)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->found);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return env->found->lower_offset < env->found->middle_offset;
+}
+
+static inline
+void SSDFS_RECOVERY_SET_SECOND_SLOW_TRY(struct ssdfs_recovery_env *env)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->found);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*SSDFS_RECOVERY_CUR_OFF_PTR(env) = env->found->lower_offset;
+	env->found->search_phase = SSDFS_RECOVERY_SECOND_SLOW_TRY;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("lower_offset %llu, "
+		  "middle_offset %llu, "
+		  "upper_offset %llu, "
+		  "current_offset %llu, "
+		  "search_phase %#x\n",
+		  env->found->lower_offset,
+		  env->found->middle_offset,
+		  env->found->upper_offset,
+		  env->found->current_offset,
+		  env->found->search_phase);
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+static inline
+bool is_third_slow_try_possible(struct ssdfs_recovery_env *env)
+{
+	u64 offset;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->fsi || !env->found);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	offset = env->found->start_peb * env->fsi->erasesize;
+	return offset < env->found->lower_offset;
+}
+
+static inline
+void SSDFS_RECOVERY_SET_THIRD_SLOW_TRY(struct ssdfs_recovery_env *env)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->fsi || !env->found);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*SSDFS_RECOVERY_CUR_OFF_PTR(env) = env->found->lower_offset;
+	env->found->search_phase = SSDFS_RECOVERY_THIRD_SLOW_TRY;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("lower_offset %llu, "
+		  "middle_offset %llu, "
+		  "upper_offset %llu, "
+		  "current_offset %llu, "
+		  "search_phase %#x\n",
+		  env->found->lower_offset,
+		  env->found->middle_offset,
+		  env->found->upper_offset,
+		  env->found->current_offset,
+		  env->found->search_phase);
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+/*
+ * Recovery API
+ */
+int ssdfs_recovery_start_thread(struct ssdfs_recovery_env *env,
+				u32 id);
+int ssdfs_recovery_stop_thread(struct ssdfs_recovery_env *env);
+void ssdfs_backup_sb_info2(struct ssdfs_recovery_env *env);
+void ssdfs_restore_sb_info2(struct ssdfs_recovery_env *env);
+int ssdfs_read_checked_sb_info3(struct ssdfs_recovery_env *env,
+				u64 peb_id, u32 pages_off);
+int __ssdfs_find_any_valid_volume_header2(struct ssdfs_recovery_env *env,
+					  u64 start_offset,
+					  u64 end_offset,
+					  u64 step);
+int ssdfs_find_any_valid_sb_segment2(struct ssdfs_recovery_env *env,
+				     u64 threshold_peb);
+bool is_cur_main_sb_peb_exhausted(struct ssdfs_recovery_env *env);
+bool is_cur_copy_sb_peb_exhausted(struct ssdfs_recovery_env *env);
+int ssdfs_check_next_sb_pebs_pair(struct ssdfs_recovery_env *env);
+int ssdfs_check_reserved_sb_pebs_pair(struct ssdfs_recovery_env *env);
+int ssdfs_find_latest_valid_sb_segment2(struct ssdfs_recovery_env *env);
+int ssdfs_find_last_sb_seg_outside_fragment(struct ssdfs_recovery_env *env);
+int ssdfs_recovery_try_fast_search(struct ssdfs_recovery_env *env);
+int ssdfs_recovery_try_slow_search(struct ssdfs_recovery_env *env);
+
+#endif /* _SSDFS_RECOVERY_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [RFC PATCH 08/76] ssdfs: search last actual superblock
  2023-02-25  1:08 [RFC PATCH 00/76] SSDFS: flash-friendly LFS file system for ZNS SSD Viacheslav Dubeyko
                   ` (6 preceding siblings ...)
  2023-02-25  1:08 ` [RFC PATCH 07/76] ssdfs: basic mount logic implementation Viacheslav Dubeyko
@ 2023-02-25  1:08 ` Viacheslav Dubeyko
  2023-02-25  1:08 ` [RFC PATCH 09/76] ssdfs: internal array/sequence primitives Viacheslav Dubeyko
                   ` (68 subsequent siblings)
  76 siblings, 0 replies; 82+ messages in thread
From: Viacheslav Dubeyko @ 2023-02-25  1:08 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: viacheslav.dubeyko, luka.perkov, bruno.banelli, Viacheslav Dubeyko

SSDFS is a pure LFS file system, which means that there is no fixed
superblock position on the volume. SSDFS keeps superblock information
in every segment header and log footer; effectively, every log contains
a copy of the superblock. However, the driver needs to find the
specialized superblock segment and the last actual superblock state
for proper initialization of the file system instance.

The search logic is split into several steps:
(1) find any valid segment header and extract information about
    the current, next, and reserved superblock segment locations,
(2) find the latest valid superblock segment,
(3) find the latest valid superblock state in the superblock segment.

The search logic splits the file system volume into several portions.
It starts searching in the first portion by using the fast search
algorithm, which checks every 50th erase block in a portion. If the
first portion does not contain the last superblock segment, then the
search logic starts several threads that look for the last actual and
valid superblock segment by means of the same fast search logic.
Finally, if the fast search algorithm is unable to find the last
actual superblock segment, the file system driver repeats the search
by means of the slow search algorithm, which simply checks every erase
block in a portion. Usually, the fast search algorithm is enough, but
if the volume is corrupted, the slow search logic can be used to find
a consistent superblock state and to try to recover the volume.

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
CC: Viacheslav Dubeyko <viacheslav.dubeyko@bytedance.com>
CC: Luka Perkov <luka.perkov@sartura.hr>
CC: Bruno Banelli <bruno.banelli@sartura.hr>
---
 fs/ssdfs/recovery_fast_search.c | 1194 ++++++++++++++++++++++++++++++
 fs/ssdfs/recovery_slow_search.c |  585 +++++++++++++++
 fs/ssdfs/recovery_thread.c      | 1196 +++++++++++++++++++++++++++++++
 3 files changed, 2975 insertions(+)
 create mode 100644 fs/ssdfs/recovery_fast_search.c
 create mode 100644 fs/ssdfs/recovery_slow_search.c
 create mode 100644 fs/ssdfs/recovery_thread.c

diff --git a/fs/ssdfs/recovery_fast_search.c b/fs/ssdfs/recovery_fast_search.c
new file mode 100644
index 000000000000..70c97331fccb
--- /dev/null
+++ b/fs/ssdfs/recovery_fast_search.c
@@ -0,0 +1,1194 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/recovery_fast_search.c - fast superblock search.
+ *
+ * Copyright (c) 2020-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ */
+
+#include <linux/slab.h>
+#include <linux/kthread.h>
+#include <linux/pagevec.h>
+#include <linux/blkdev.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+#include "page_array.h"
+#include "page_vector.h"
+#include "peb.h"
+#include "segment_bitmap.h"
+#include "peb_mapping_table.h"
+#include "recovery.h"
+
+#include <trace/events/ssdfs.h>
+
+static inline
+bool IS_SB_PEB(struct ssdfs_recovery_env *env)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+#endif /* CONFIG_SSDFS_DEBUG */
+	int type;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env);
+	BUG_ON(!env->sbi.vh_buf);
+	BUG_ON(!is_ssdfs_magic_valid(&SSDFS_VH(env->sbi.vh_buf)->magic));
+	BUG_ON(!is_ssdfs_volume_header_csum_valid(env->sbi.vh_buf, hdr_size));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	type = le16_to_cpu(SSDFS_SEG_HDR(env->sbi.vh_buf)->seg_type);
+
+	return type == SSDFS_SB_SEG_TYPE;
+}
+
+static inline
+void STORE_PEB_INFO(struct ssdfs_found_peb *peb,
+		    u64 peb_id, u64 cno,
+		    int type, int state)
+{
+	peb->peb_id = peb_id;
+	peb->cno = cno;
+	peb->is_superblock_peb = (type == SSDFS_SB_SEG_TYPE);
+	peb->state = state;
+}
+
+static inline
+void STORE_SB_PEB_INFO(struct ssdfs_found_peb *peb,
+		       u64 peb_id)
+{
+	STORE_PEB_INFO(peb, peb_id, U64_MAX,
+			SSDFS_UNKNOWN_SEG_TYPE,
+			SSDFS_PEB_NOT_CHECKED);
+}
+
+static inline
+void STORE_MAIN_SB_PEB_INFO(struct ssdfs_recovery_env *env,
+			    struct ssdfs_found_protected_peb *ptr,
+			    int sb_seg_index)
+{
+	struct ssdfs_superblock_pebs_pair *pair;
+	struct ssdfs_found_peb *sb_peb;
+	u64 peb_id;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!ptr);
+	BUG_ON(sb_seg_index < SSDFS_CUR_SB_SEG ||
+		sb_seg_index >= SSDFS_SB_CHAIN_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	pair = &ptr->found.sb_pebs[sb_seg_index];
+	sb_peb = &pair->pair[SSDFS_MAIN_SB_SEG];
+	peb_id = SSDFS_MAIN_SB_PEB(SSDFS_VH(env->sbi.vh_buf), sb_seg_index);
+
+	STORE_SB_PEB_INFO(sb_peb, peb_id);
+}
+
+static inline
+void STORE_COPY_SB_PEB_INFO(struct ssdfs_recovery_env *env,
+			    struct ssdfs_found_protected_peb *ptr,
+			    int sb_seg_index)
+{
+	struct ssdfs_superblock_pebs_pair *pair;
+	struct ssdfs_found_peb *sb_peb;
+	u64 peb_id;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!ptr);
+	BUG_ON(sb_seg_index < SSDFS_CUR_SB_SEG ||
+		sb_seg_index >= SSDFS_SB_CHAIN_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	pair = &ptr->found.sb_pebs[sb_seg_index];
+	sb_peb = &pair->pair[SSDFS_COPY_SB_SEG];
+	peb_id = SSDFS_COPY_SB_PEB(SSDFS_VH(env->sbi.vh_buf), sb_seg_index);
+
+	STORE_SB_PEB_INFO(sb_peb, peb_id);
+}
+
+static inline
+void ssdfs_store_superblock_pebs_info(struct ssdfs_recovery_env *env,
+				      int peb_index)
+{
+	struct ssdfs_found_protected_peb *ptr;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->found);
+	BUG_ON(!env->sbi.vh_buf);
+	BUG_ON(peb_index < SSDFS_LOWER_PEB_INDEX ||
+		peb_index >= SSDFS_PROTECTED_PEB_CHAIN_MAX);
+
+	SSDFS_DBG("env %p, peb_index %d\n",
+		  env, peb_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ptr = &env->found->array[peb_index];
+
+	STORE_MAIN_SB_PEB_INFO(env, ptr, SSDFS_CUR_SB_SEG);
+	STORE_COPY_SB_PEB_INFO(env, ptr, SSDFS_CUR_SB_SEG);
+
+	STORE_MAIN_SB_PEB_INFO(env, ptr, SSDFS_NEXT_SB_SEG);
+	STORE_COPY_SB_PEB_INFO(env, ptr, SSDFS_NEXT_SB_SEG);
+
+	STORE_MAIN_SB_PEB_INFO(env, ptr, SSDFS_RESERVED_SB_SEG);
+	STORE_COPY_SB_PEB_INFO(env, ptr, SSDFS_RESERVED_SB_SEG);
+
+	STORE_MAIN_SB_PEB_INFO(env, ptr, SSDFS_PREV_SB_SEG);
+	STORE_COPY_SB_PEB_INFO(env, ptr, SSDFS_PREV_SB_SEG);
+}
+
+static inline
+void ssdfs_store_protected_peb_info(struct ssdfs_recovery_env *env,
+				    int peb_index,
+				    u64 peb_id)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+#endif /* CONFIG_SSDFS_DEBUG */
+	struct ssdfs_found_protected_peb *ptr;
+	u64 cno;
+	int type;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->found);
+	BUG_ON(!env->sbi.vh_buf);
+	BUG_ON(peb_index < SSDFS_LOWER_PEB_INDEX ||
+		peb_index >= SSDFS_PROTECTED_PEB_CHAIN_MAX);
+	BUG_ON(!is_ssdfs_magic_valid(&SSDFS_VH(env->sbi.vh_buf)->magic));
+	BUG_ON(!is_ssdfs_volume_header_csum_valid(env->sbi.vh_buf, hdr_size));
+
+	SSDFS_DBG("env %p, peb_index %d, peb_id %llu\n",
+		  env, peb_index, peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	cno = le64_to_cpu(SSDFS_SEG_HDR(env->sbi.vh_buf)->cno);
+	type = le16_to_cpu(SSDFS_SEG_HDR(env->sbi.vh_buf)->seg_type);
+
+	ptr = &env->found->array[peb_index];
+	STORE_PEB_INFO(&ptr->peb, peb_id, cno, type, SSDFS_FOUND_PEB_VALID);
+	ssdfs_store_superblock_pebs_info(env, peb_index);
+}
+
+static
+int ssdfs_calculate_recovery_search_bounds(struct ssdfs_recovery_env *env,
+					   u64 dev_size,
+					   u64 *lower_peb, loff_t *lower_off,
+					   u64 *upper_peb, loff_t *upper_off)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->found || !env->fsi);
+	BUG_ON(!lower_peb || !lower_off);
+	BUG_ON(!upper_peb || !upper_off);
+
+	SSDFS_DBG("env %p, start_peb %llu, "
+		  "pebs_count %u, dev_size %llu\n",
+		  env, env->found->start_peb,
+		  env->found->pebs_count, dev_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*lower_peb = env->found->start_peb;
+	if (*lower_peb == 0)
+		*lower_off = SSDFS_RESERVED_VBR_SIZE;
+	else
+		*lower_off = *lower_peb * env->fsi->erasesize;
+
+	if (*lower_off >= dev_size) {
+		SSDFS_ERR("invalid offset: lower_off %llu, "
+			  "dev_size %llu\n",
+			  (unsigned long long)*lower_off,
+			  dev_size);
+		return -ERANGE;
+	}
+
+	*upper_peb = env->found->pebs_count - 1;
+	*upper_peb /= SSDFS_MAPTBL_PROTECTION_STEP;
+	*upper_peb *= SSDFS_MAPTBL_PROTECTION_STEP;
+	*upper_peb += env->found->start_peb;
+	*upper_off = *upper_peb * env->fsi->erasesize;
+
+	if (*upper_off >= dev_size) {
+		*upper_off = min_t(u64, *upper_off,
+				   dev_size - env->fsi->erasesize);
+		*upper_peb = *upper_off / env->fsi->erasesize;
+		*upper_peb -= env->found->start_peb;
+		*upper_peb /= SSDFS_MAPTBL_PROTECTION_STEP;
+		*upper_peb *= SSDFS_MAPTBL_PROTECTION_STEP;
+		*upper_peb += env->found->start_peb;
+		*upper_off = *upper_peb * env->fsi->erasesize;
+	}
+
+	return 0;
+}
+
+static
+int ssdfs_find_valid_protected_pebs(struct ssdfs_recovery_env *env)
+{
+	struct super_block *sb = env->fsi->sb;
+	u64 dev_size = env->fsi->devops->device_size(sb);
+	u64 lower_peb, upper_peb;
+	loff_t lower_off, upper_off;
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+	size_t vh_size = sizeof(struct ssdfs_volume_header);
+	struct ssdfs_volume_header *vh;
+	struct ssdfs_found_protected_peb *found;
+	bool magic_valid = false;
+	u64 cno = U64_MAX, last_cno = U64_MAX;
+	int err;
+
+	if (!env->found) {
+		SSDFS_ERR("recovery search state is absent\n");
+		return -EOPNOTSUPP;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("env %p, start_peb %llu, pebs_count %u\n",
+		  env, env->found->start_peb,
+		  env->found->pebs_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!env->fsi->devops->read) {
+		SSDFS_ERR("unable to read from device\n");
+		return -EOPNOTSUPP;
+	}
+
+	env->found->lower_offset = dev_size;
+	env->found->middle_offset = dev_size;
+	env->found->upper_offset = dev_size;
+
+	err = ssdfs_calculate_recovery_search_bounds(env, dev_size,
+						     &lower_peb, &lower_off,
+						     &upper_peb, &upper_off);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to calculate search bounds: "
+			  "err %d\n", err);
+		return err;
+	}
+
+	env->found->lower_offset = lower_off;
+	env->found->middle_offset = lower_off;
+	env->found->upper_offset = upper_off;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("lower_peb %llu, upper_peb %llu\n",
+		  lower_peb, upper_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	while (lower_peb <= upper_peb) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("lower_peb %llu, lower_off %llu\n",
+			  lower_peb, (u64)lower_off);
+		SSDFS_DBG("upper_peb %llu, upper_off %llu\n",
+			  upper_peb, (u64)upper_off);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = env->fsi->devops->read(sb,
+					     lower_off,
+					     hdr_size,
+					     env->sbi.vh_buf);
+		vh = SSDFS_VH(env->sbi.vh_buf);
+		magic_valid = is_ssdfs_magic_valid(&vh->magic);
+		cno = le64_to_cpu(SSDFS_SEG_HDR(env->sbi.vh_buf)->cno);
+
+		if (!err && magic_valid) {
+			found = &env->found->array[SSDFS_LOWER_PEB_INDEX];
+
+			if (found->peb.peb_id >= U64_MAX) {
+				ssdfs_store_protected_peb_info(env,
+						SSDFS_LOWER_PEB_INDEX,
+						lower_peb);
+
+				env->found->lower_offset = lower_off;
+
+				ssdfs_memcpy(&env->last_vh, 0, vh_size,
+					     env->sbi.vh_buf, 0, vh_size,
+					     vh_size);
+				ssdfs_backup_sb_info2(env);
+
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("FOUND: lower_peb %llu, "
+					  "lower_bound %llu\n",
+					  lower_peb, lower_off);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+				goto define_last_cno_peb;
+			}
+
+			ssdfs_store_protected_peb_info(env,
+						SSDFS_UPPER_PEB_INDEX,
+						lower_peb);
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("FOUND: lower_peb %llu, "
+				  "lower_bound %llu\n",
+				  lower_peb, lower_off);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+define_last_cno_peb:
+			if (last_cno >= U64_MAX) {
+				env->found->middle_offset = lower_off;
+				ssdfs_store_protected_peb_info(env,
+						SSDFS_LAST_CNO_PEB_INDEX,
+						lower_peb);
+				ssdfs_memcpy(&env->last_vh, 0, vh_size,
+					     env->sbi.vh_buf, 0, vh_size,
+					     vh_size);
+				ssdfs_backup_sb_info2(env);
+				last_cno = cno;
+
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("FOUND: lower_peb %llu, "
+					  "middle_offset %llu, "
+					  "cno %llu\n",
+					  lower_peb, lower_off, cno);
+#endif /* CONFIG_SSDFS_DEBUG */
+			} else if (cno > last_cno) {
+				env->found->middle_offset = lower_off;
+				ssdfs_store_protected_peb_info(env,
+						SSDFS_LAST_CNO_PEB_INDEX,
+						lower_peb);
+				ssdfs_memcpy(&env->last_vh, 0, vh_size,
+					     env->sbi.vh_buf, 0, vh_size,
+					     vh_size);
+				ssdfs_backup_sb_info2(env);
+				last_cno = cno;
+
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("FOUND: lower_peb %llu, "
+					  "middle_offset %llu, "
+					  "cno %llu\n",
+					  lower_peb, lower_off, cno);
+#endif /* CONFIG_SSDFS_DEBUG */
+			} else {
+				ssdfs_restore_sb_info2(env);
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("ignore valid PEB: "
+					  "lower_peb %llu, lower_off %llu, "
+					  "cno %llu, last_cno %llu\n",
+					  lower_peb, lower_off,
+					  cno, last_cno);
+#endif /* CONFIG_SSDFS_DEBUG */
+			}
+		} else {
+			ssdfs_restore_sb_info2(env);
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("peb %llu (offset %llu) is corrupted\n",
+				  lower_peb,
+				  (unsigned long long)lower_off);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+
+		lower_peb += SSDFS_MAPTBL_PROTECTION_STEP;
+		lower_off = lower_peb * env->fsi->erasesize;
+
+		if (kthread_should_stop())
+			goto finish_search;
+	}
+
+	found = &env->found->array[SSDFS_UPPER_PEB_INDEX];
+
+	if (found->peb.peb_id >= U64_MAX)
+		goto finish_search;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("env->lower_offset %llu, "
+		  "env->middle_offset %llu, "
+		  "env->upper_offset %llu\n",
+		  env->found->lower_offset,
+		  env->found->middle_offset,
+		  env->found->upper_offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	SSDFS_RECOVERY_SET_FAST_SEARCH_TRY(env);
+
+	return 0;
+
+finish_search:
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("unable to find valid PEB\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	SSDFS_RECOVERY_SET_FAST_SEARCH_TRY(env);
+
+	return -ENODATA;
+}
+
+static inline
+int ssdfs_read_sb_peb_checked(struct ssdfs_recovery_env *env,
+			      u64 peb_id)
+{
+	struct ssdfs_volume_header *vh;
+	size_t vh_size = sizeof(struct ssdfs_volume_header);
+	bool magic_valid = false;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->fsi || !env->fsi->sb);
+	BUG_ON(!env->sbi.vh_buf);
+
+	SSDFS_DBG("peb_id %llu\n", peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_read_checked_sb_info3(env, peb_id, 0);
+	vh = SSDFS_VH(env->sbi.vh_buf);
+	magic_valid = is_ssdfs_magic_valid(&vh->magic);
+
+	if (err || !magic_valid) {
+		err = -ENODATA;
+		ssdfs_restore_sb_info2(env);
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("peb %llu is corrupted\n",
+			  peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+	} else {
+		ssdfs_memcpy(&env->last_vh, 0, vh_size,
+			     env->sbi.vh_buf, 0, vh_size,
+			     vh_size);
+		ssdfs_backup_sb_info2(env);
+	}
+
+	return err;
+}
+
+int ssdfs_find_last_sb_seg_outside_fragment(struct ssdfs_recovery_env *env)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+#endif /* CONFIG_SSDFS_DEBUG */
+	struct super_block *sb;
+	struct ssdfs_volume_header *vh;
+	u64 leb_id;
+	u64 peb_id;
+	bool magic_valid = false;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->fsi || !env->fsi->sb);
+	BUG_ON(!env->sbi.vh_buf);
+	BUG_ON(!is_ssdfs_magic_valid(&SSDFS_VH(env->sbi.vh_buf)->magic));
+	BUG_ON(!is_ssdfs_volume_header_csum_valid(env->sbi.vh_buf, hdr_size));
+
+	SSDFS_DBG("env %p, env->sbi.vh_buf %p\n", env, env->sbi.vh_buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	sb = env->fsi->sb;
+	err = -ENODATA;
+
+	leb_id = SSDFS_MAIN_SB_LEB(SSDFS_VH(env->sbi.vh_buf),
+					SSDFS_CUR_SB_SEG);
+	peb_id = SSDFS_MAIN_SB_PEB(SSDFS_VH(env->sbi.vh_buf),
+					SSDFS_CUR_SB_SEG);
+
+	do {
+		err = ssdfs_read_sb_peb_checked(env, peb_id);
+		vh = SSDFS_VH(env->sbi.vh_buf);
+		magic_valid = is_ssdfs_magic_valid(&vh->magic);
+
+		if (err == -ENODATA)
+			goto finish_search;
+		else if (err) {
+			SSDFS_ERR("fail to read peb %llu\n",
+				  peb_id);
+			goto finish_search;
+		} else {
+			u64 new_leb_id;
+			u64 new_peb_id;
+
+			new_leb_id =
+				SSDFS_MAIN_SB_LEB(SSDFS_VH(env->sbi.vh_buf),
+						  SSDFS_CUR_SB_SEG);
+			new_peb_id =
+				SSDFS_MAIN_SB_PEB(SSDFS_VH(env->sbi.vh_buf),
+						  SSDFS_CUR_SB_SEG);
+
+			if (new_leb_id != leb_id || new_peb_id != peb_id) {
+				err = -ENODATA;
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("SB segment not found: "
+					  "peb %llu\n",
+					  peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+				goto finish_search;
+			}
+
+			env->sbi.last_log.leb_id = leb_id;
+			env->sbi.last_log.peb_id = peb_id;
+			env->sbi.last_log.page_offset = 0;
+			env->sbi.last_log.pages_count =
+				SSDFS_LOG_PAGES(env->sbi.vh_buf);
+
+			if (IS_SB_PEB(env)) {
+				if (is_cur_main_sb_peb_exhausted(env)) {
+					err = -ENOENT;
+#ifdef CONFIG_SSDFS_DEBUG
+					SSDFS_DBG("peb %llu is exhausted\n",
+						  peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+					goto try_next_sb_peb;
+				} else {
+					err = 0;
+					goto finish_search;
+				}
+			} else {
+				err = -ENODATA;
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("SB segment not found: "
+					  "peb %llu\n",
+					  peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+				goto finish_search;
+			}
+		}
+
+try_next_sb_peb:
+		if (kthread_should_stop()) {
+			err = -ENODATA;
+			goto finish_search;
+		}
+
+		leb_id = SSDFS_MAIN_SB_LEB(SSDFS_VH(env->sbi_backup.vh_buf),
+						SSDFS_NEXT_SB_SEG);
+		peb_id = SSDFS_MAIN_SB_PEB(SSDFS_VH(env->sbi_backup.vh_buf),
+						SSDFS_NEXT_SB_SEG);
+	} while (magic_valid);
+
+finish_search:
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("search outside fragment is finished: "
+		  "err %d\n", err);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+static
+int ssdfs_check_cur_main_sb_peb(struct ssdfs_recovery_env *env)
+{
+	struct ssdfs_volume_header *vh;
+	u64 leb_id;
+	u64 peb_id;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env);
+	BUG_ON(!env->sbi.vh_buf);
+
+	SSDFS_DBG("env %p, env->sbi.vh_buf %p\n", env, env->sbi.vh_buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	vh = SSDFS_VH(env->sbi.vh_buf);
+	leb_id = SSDFS_MAIN_SB_LEB(vh, SSDFS_CUR_SB_SEG);
+	peb_id = SSDFS_MAIN_SB_PEB(vh, SSDFS_CUR_SB_SEG);
+
+	ssdfs_backup_sb_info2(env);
+
+	err = ssdfs_read_sb_peb_checked(env, peb_id);
+	if (err == -ENODATA)
+		goto finish_check;
+	else if (err) {
+		SSDFS_ERR("fail to read peb %llu\n",
+			  peb_id);
+		goto finish_check;
+	} else {
+		u64 new_leb_id;
+		u64 new_peb_id;
+
+		vh = SSDFS_VH(env->sbi.vh_buf);
+		new_leb_id = SSDFS_MAIN_SB_LEB(vh, SSDFS_CUR_SB_SEG);
+		new_peb_id = SSDFS_MAIN_SB_PEB(vh, SSDFS_CUR_SB_SEG);
+
+		if (new_leb_id != leb_id || new_peb_id != peb_id) {
+			err = -ENODATA;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("SB segment not found: "
+				  "peb %llu\n",
+				  peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto finish_check;
+		}
+
+		env->sbi.last_log.leb_id = leb_id;
+		env->sbi.last_log.peb_id = peb_id;
+		env->sbi.last_log.page_offset = 0;
+		env->sbi.last_log.pages_count =
+			SSDFS_LOG_PAGES(env->sbi.vh_buf);
+
+		if (IS_SB_PEB(env)) {
+			if (is_cur_main_sb_peb_exhausted(env)) {
+				err = -ENOENT;
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("peb %llu is exhausted\n",
+					  peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+				goto finish_check;
+			} else {
+				err = 0;
+				goto finish_check;
+			}
+		} else {
+			err = -ENODATA;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("SB segment not found: "
+				  "peb %llu\n",
+				  peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto finish_check;
+		}
+	}
+
+finish_check:
+	return err;
+}
+
+static
+int ssdfs_check_cur_copy_sb_peb(struct ssdfs_recovery_env *env)
+{
+	struct ssdfs_volume_header *vh;
+	u64 leb_id;
+	u64 peb_id;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env);
+	BUG_ON(!env->sbi.vh_buf);
+
+	SSDFS_DBG("env %p, env->sbi.vh_buf %p\n", env, env->sbi.vh_buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	vh = SSDFS_VH(env->sbi.vh_buf);
+	leb_id = SSDFS_COPY_SB_LEB(vh, SSDFS_CUR_SB_SEG);
+	peb_id = SSDFS_COPY_SB_PEB(vh, SSDFS_CUR_SB_SEG);
+
+	ssdfs_backup_sb_info2(env);
+
+	err = ssdfs_read_sb_peb_checked(env, peb_id);
+	if (err == -ENODATA)
+		goto finish_check;
+	else if (err) {
+		SSDFS_ERR("fail to read peb %llu\n",
+			  peb_id);
+		goto finish_check;
+	} else {
+		u64 new_leb_id;
+		u64 new_peb_id;
+
+		vh = SSDFS_VH(env->sbi.vh_buf);
+		new_leb_id = SSDFS_COPY_SB_LEB(vh, SSDFS_CUR_SB_SEG);
+		new_peb_id = SSDFS_COPY_SB_PEB(vh, SSDFS_CUR_SB_SEG);
+
+		if (new_leb_id != leb_id || new_peb_id != peb_id) {
+			err = -ENODATA;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("SB segment not found: "
+				  "peb %llu\n",
+				  peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto finish_check;
+		}
+
+		env->sbi.last_log.leb_id = leb_id;
+		env->sbi.last_log.peb_id = peb_id;
+		env->sbi.last_log.page_offset = 0;
+		env->sbi.last_log.pages_count =
+			SSDFS_LOG_PAGES(env->sbi.vh_buf);
+
+		if (IS_SB_PEB(env)) {
+			if (is_cur_copy_sb_peb_exhausted(env)) {
+				err = -ENOENT;
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("peb %llu is exhausted\n",
+					  peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+				goto finish_check;
+			} else {
+				err = 0;
+				goto finish_check;
+			}
+		} else {
+			err = -ENODATA;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("SB segment not found: "
+				  "peb %llu\n",
+				  peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto finish_check;
+		}
+	}
+
+finish_check:
+	return err;
+}
+
+static
+int ssdfs_find_last_sb_seg_inside_fragment(struct ssdfs_recovery_env *env)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+#endif /* CONFIG_SSDFS_DEBUG */
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->fsi || !env->fsi->sb);
+	BUG_ON(!env->sbi.vh_buf);
+	BUG_ON(!is_ssdfs_magic_valid(&SSDFS_VH(env->sbi.vh_buf)->magic));
+	BUG_ON(!is_ssdfs_volume_header_csum_valid(env->sbi.vh_buf, hdr_size));
+
+	SSDFS_DBG("env %p, env->sbi.vh_buf %p\n", env, env->sbi.vh_buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+try_next_peb:
+	if (kthread_should_stop()) {
+		err = -ENODATA;
+		goto finish_search;
+	}
+
+	err = ssdfs_check_cur_main_sb_peb(env);
+	if (err == -ENODATA)
+		goto try_cur_copy_sb_peb;
+	else if (err == -ENOENT)
+		goto check_next_sb_pebs_pair;
+	else
+		goto finish_search;
+
+try_cur_copy_sb_peb:
+	if (kthread_should_stop()) {
+		err = -ENODATA;
+		goto finish_search;
+	}
+
+	err = ssdfs_check_cur_copy_sb_peb(env);
+	if (err == -ENODATA || err == -ENOENT)
+		goto check_next_sb_pebs_pair;
+	else
+		goto finish_search;
+
+check_next_sb_pebs_pair:
+	if (kthread_should_stop()) {
+		err = -ENODATA;
+		goto finish_search;
+	}
+
+	err = ssdfs_check_next_sb_pebs_pair(env);
+	if (err == -E2BIG) {
+		err = ssdfs_find_last_sb_seg_outside_fragment(env);
+		if (err == -ENODATA || err == -ENOENT) {
+			/* unable to find anything */
+			goto check_reserved_sb_pebs_pair;
+		} else if (err) {
+			SSDFS_ERR("search outside fragment has failed: "
+				  "err %d\n", err);
+			goto finish_search;
+		} else
+			goto finish_search;
+	} else if (!err)
+		goto try_next_peb;
+
+check_reserved_sb_pebs_pair:
+	if (kthread_should_stop()) {
+		err = -ENODATA;
+		goto finish_search;
+	}
+
+	err = ssdfs_check_reserved_sb_pebs_pair(env);
+	if (err == -E2BIG) {
+		err = ssdfs_find_last_sb_seg_outside_fragment(env);
+		if (err == -ENODATA || err == -ENOENT) {
+			/* unable to find anything */
+			goto finish_search;
+		} else if (err) {
+			SSDFS_ERR("search outside fragment has failed: "
+				  "err %d\n", err);
+			goto finish_search;
+		} else
+			goto finish_search;
+	} else if (!err)
+		goto try_next_peb;
+
+finish_search:
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("search inside fragment is finished: "
+		  "err %d\n", err);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+static
+int ssdfs_find_last_sb_seg_starting_from_peb(struct ssdfs_recovery_env *env,
+					     struct ssdfs_found_peb *ptr)
+{
+	struct super_block *sb;
+	struct ssdfs_volume_header *vh;
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+	size_t vh_size = sizeof(struct ssdfs_volume_header);
+	u64 offset;
+	u64 threshold_peb;
+	u64 peb_id;
+	u64 cno = U64_MAX;
+	bool magic_valid = false;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->found || !env->fsi || !env->fsi->sb);
+	BUG_ON(!env->sbi.vh_buf);
+	BUG_ON(!env->fsi->devops->read);
+	BUG_ON(!ptr);
+	BUG_ON(ptr->peb_id >= U64_MAX);
+
+	SSDFS_DBG("peb_id %llu, start_peb %llu, pebs_count %u\n",
+		  ptr->peb_id,
+		  env->found->start_peb,
+		  env->found->pebs_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	sb = env->fsi->sb;
+	threshold_peb = env->found->start_peb + env->found->pebs_count;
+	peb_id = ptr->peb_id;
+	offset = peb_id * env->fsi->erasesize;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("peb_id %llu, offset %llu\n",
+		  peb_id, offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = env->fsi->devops->read(sb, offset, hdr_size,
+				     env->sbi.vh_buf);
+	vh = SSDFS_VH(env->sbi.vh_buf);
+	magic_valid = is_ssdfs_magic_valid(&vh->magic);
+
+	if (err || !magic_valid) {
+		ssdfs_restore_sb_info2(env);
+		ptr->state = SSDFS_FOUND_PEB_INVALID;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("peb %llu is corrupted\n",
+			  peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		if (ptr->peb_id >= env->found->start_peb &&
+		    ptr->peb_id < threshold_peb) {
+			/* try again */
+			return -EAGAIN;
+		} else {
+			/* PEB is out of range */
+			return -E2BIG;
+		}
+	} else {
+		ssdfs_memcpy(&env->last_vh, 0, vh_size,
+			     env->sbi.vh_buf, 0, vh_size,
+			     vh_size);
+		ssdfs_backup_sb_info2(env);
+		cno = le64_to_cpu(SSDFS_SEG_HDR(env->sbi.vh_buf)->cno);
+		ptr->cno = cno;
+		ptr->is_superblock_peb = IS_SB_PEB(env);
+		ptr->state = SSDFS_FOUND_PEB_VALID;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("peb_id %llu, cno %llu, is_superblock_peb %#x\n",
+			  peb_id, cno, ptr->is_superblock_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+	if (ptr->peb_id >= env->found->start_peb &&
+	    ptr->peb_id < threshold_peb) {
+		err = ssdfs_find_last_sb_seg_inside_fragment(env);
+		if (err == -ENODATA || err == -ENOENT) {
+			ssdfs_restore_sb_info2(env);
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("nothing has been found inside fragment: "
+				  "peb_id %llu\n",
+				  ptr->peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return -EAGAIN;
+		} else if (err) {
+			SSDFS_ERR("search inside fragment has failed: "
+				  "err %d\n", err);
+			return err;
+		}
+	} else {
+		err = ssdfs_find_last_sb_seg_outside_fragment(env);
+		if (err == -ENODATA || err == -ENOENT) {
+			ssdfs_restore_sb_info2(env);
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("nothing has been found outside fragment: "
+				  "peb_id %llu\n",
+				  ptr->peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return -E2BIG;
+		} else if (err) {
+			SSDFS_ERR("search outside fragment has failed: "
+				  "err %d\n", err);
+			return err;
+		}
+	}
+
+	return 0;
+}
+
+static
+int ssdfs_find_last_sb_seg_for_protected_peb(struct ssdfs_recovery_env *env)
+{
+	struct ssdfs_found_protected_peb *protected_peb;
+	struct ssdfs_found_peb *cur_peb;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->found || !env->fsi || !env->fsi->sb);
+	BUG_ON(!env->sbi.vh_buf);
+	BUG_ON(!env->fsi->devops->read);
+
+	SSDFS_DBG("env %p, env->sbi.vh_buf %p\n", env, env->sbi.vh_buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	protected_peb = &env->found->array[SSDFS_LAST_CNO_PEB_INDEX];
+
+	if (protected_peb->peb.peb_id >= U64_MAX) {
+		SSDFS_ERR("protected PEB hasn't been found\n");
+		return -ERANGE;
+	}
+
+	cur_peb = CUR_MAIN_SB_PEB(&protected_peb->found);
+	if (cur_peb->peb_id >= U64_MAX) {
+		SSDFS_ERR("peb_id is invalid\n");
+		return -ERANGE;
+	}
+
+	err = ssdfs_find_last_sb_seg_starting_from_peb(env, cur_peb);
+	if (err == -EAGAIN || err == -E2BIG) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("nothing was found for peb %llu\n",
+			  cur_peb->peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		/* continue search */
+	} else if (err) {
+		SSDFS_ERR("fail to find last superblock segment: "
+			  "err %d\n", err);
+		goto finish_search;
+	} else
+		goto finish_search;
+
+	cur_peb = CUR_COPY_SB_PEB(&protected_peb->found);
+	if (cur_peb->peb_id >= U64_MAX) {
+		SSDFS_ERR("peb_id is invalid\n");
+		return -ERANGE;
+	}
+
+	err = ssdfs_find_last_sb_seg_starting_from_peb(env, cur_peb);
+	if (err == -EAGAIN || err == -E2BIG) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("nothing was found for peb %llu\n",
+			  cur_peb->peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		/* continue search */
+	} else if (err) {
+		SSDFS_ERR("fail to find last superblock segment: "
+			  "err %d\n", err);
+		goto finish_search;
+	} else
+		goto finish_search;
+
+	cur_peb = NEXT_MAIN_SB_PEB(&protected_peb->found);
+	if (cur_peb->peb_id >= U64_MAX) {
+		SSDFS_ERR("peb_id is invalid\n");
+		return -ERANGE;
+	}
+
+	err = ssdfs_find_last_sb_seg_starting_from_peb(env, cur_peb);
+	if (err == -EAGAIN || err == -E2BIG) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("nothing was found for peb %llu\n",
+			  cur_peb->peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		/* continue search */
+	} else if (err) {
+		SSDFS_ERR("fail to find last superblock segment: "
+			  "err %d\n", err);
+		goto finish_search;
+	} else
+		goto finish_search;
+
+	cur_peb = NEXT_COPY_SB_PEB(&protected_peb->found);
+	if (cur_peb->peb_id >= U64_MAX) {
+		SSDFS_ERR("peb_id is invalid\n");
+		return -ERANGE;
+	}
+
+	err = ssdfs_find_last_sb_seg_starting_from_peb(env, cur_peb);
+	if (err == -EAGAIN || err == -E2BIG) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("nothing was found for peb %llu\n",
+			  cur_peb->peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		/* continue search */
+	} else if (err) {
+		SSDFS_ERR("fail to find last superblock segment: "
+			  "err %d\n", err);
+		goto finish_search;
+	} else
+		goto finish_search;
+
+	cur_peb = RESERVED_MAIN_SB_PEB(&protected_peb->found);
+	if (cur_peb->peb_id >= U64_MAX) {
+		SSDFS_ERR("peb_id is invalid\n");
+		return -ERANGE;
+	}
+
+	err = ssdfs_find_last_sb_seg_starting_from_peb(env, cur_peb);
+	if (err == -EAGAIN || err == -E2BIG) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("nothing was found for peb %llu\n",
+			  cur_peb->peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		/* continue search */
+	} else if (err) {
+		SSDFS_ERR("fail to find last superblock segment: "
+			  "err %d\n", err);
+		goto finish_search;
+	} else
+		goto finish_search;
+
+	cur_peb = RESERVED_COPY_SB_PEB(&protected_peb->found);
+	if (cur_peb->peb_id >= U64_MAX) {
+		SSDFS_ERR("peb_id is invalid\n");
+		return -ERANGE;
+	}
+
+	err = ssdfs_find_last_sb_seg_starting_from_peb(env, cur_peb);
+	if (err == -EAGAIN || err == -E2BIG) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("nothing was found for peb %llu\n",
+			  cur_peb->peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto finish_search;
+	} else if (err) {
+		SSDFS_ERR("fail to find last superblock segment: "
+			  "err %d\n", err);
+		goto finish_search;
+	} else
+		goto finish_search;
+
+finish_search:
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("search is finished: "
+		  "err %d\n", err);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+static
+int ssdfs_recovery_protected_section_fast_search(struct ssdfs_recovery_env *env)
+{
+	u64 threshold_peb;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->fsi);
+	BUG_ON(!env->sbi.vh_buf);
+
+	SSDFS_DBG("env %p, env->sbi.vh_buf %p\n", env, env->sbi.vh_buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	threshold_peb = *SSDFS_RECOVERY_CUR_OFF_PTR(env) / env->fsi->erasesize;
+
+	err = ssdfs_find_any_valid_sb_segment2(env, threshold_peb);
+	if (err)
+		return err;
+
+	if (kthread_should_stop())
+		return -ENOENT;
+
+	err = ssdfs_find_latest_valid_sb_segment2(env);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+int ssdfs_recovery_try_fast_search(struct ssdfs_recovery_env *env)
+{
+	struct ssdfs_found_protected_peb *found;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->found);
+	BUG_ON(!env->sbi.vh_buf);
+
+	SSDFS_DBG("env %p, start_peb %llu, pebs_count %u\n",
+		  env, env->found->start_peb,
+		  env->found->pebs_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_find_valid_protected_pebs(env);
+	if (err == -ENODATA) {
+		found = &env->found->array[SSDFS_LOWER_PEB_INDEX];
+
+		if (found->peb.peb_id >= U64_MAX) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("no valid protected PEBs in fragment: "
+				  "start_peb %llu, pebs_count %u\n",
+				  env->found->start_peb,
+				  env->found->pebs_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto finish_fast_search;
+		} else {
+			/* search only in the last valid section */
+			err = ssdfs_recovery_protected_section_fast_search(env);
+			goto finish_fast_search;
+		}
+	} else if (err) {
+		SSDFS_ERR("fail to find protected PEBs: "
+			  "start_peb %llu, pebs_count %u, err %d\n",
+			  env->found->start_peb,
+			  env->found->pebs_count, err);
+		goto finish_fast_search;
+	}
+
+	err = ssdfs_find_last_sb_seg_for_protected_peb(env);
+	if (err == -EAGAIN) {
+		*SSDFS_RECOVERY_CUR_OFF_PTR(env) = env->found->middle_offset;
+		err = ssdfs_recovery_protected_section_fast_search(env);
+		if (err == -ENODATA || err == -E2BIG) {
+			SSDFS_DBG("SEARCH FINISHED: "
+				  "nothing was found\n");
+			goto finish_fast_search;
+		} else if (err) {
+			SSDFS_ERR("fail to find last SB segment: "
+				  "err %d\n", err);
+			goto finish_fast_search;
+		}
+	} else if (err == -ENODATA || err == -E2BIG) {
+		SSDFS_DBG("SEARCH FINISHED: "
+			  "nothing was found\n");
+		goto finish_fast_search;
+	} else if (err) {
+		SSDFS_ERR("fail to find last SB segment: "
+			  "err %d\n", err);
+		goto finish_fast_search;
+	}
+
+finish_fast_search:
+	return err;
+}
diff --git a/fs/ssdfs/recovery_slow_search.c b/fs/ssdfs/recovery_slow_search.c
new file mode 100644
index 000000000000..ca4d12b24ab3
--- /dev/null
+++ b/fs/ssdfs/recovery_slow_search.c
@@ -0,0 +1,585 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/recovery_slow_search.c - slow superblock search.
+ *
+ * Copyright (c) 2020-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ */
+
+#include <linux/slab.h>
+#include <linux/kthread.h>
+#include <linux/pagevec.h>
+#include <linux/blkdev.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+#include "page_array.h"
+#include "page_vector.h"
+#include "peb.h"
+#include "segment_bitmap.h"
+#include "peb_mapping_table.h"
+#include "recovery.h"
+
+#include <trace/events/ssdfs.h>
+
+int ssdfs_find_latest_valid_sb_segment2(struct ssdfs_recovery_env *env)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+#endif /* CONFIG_SSDFS_DEBUG */
+	struct ssdfs_volume_header *last_vh;
+	u64 cur_main_sb_peb, cur_copy_sb_peb;
+	u64 start_peb, next_peb;
+	u64 start_offset;
+	u64 step;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->fsi);
+	BUG_ON(!env->sbi.vh_buf);
+	BUG_ON(!env->fsi->devops->read);
+	BUG_ON(!is_ssdfs_magic_valid(&SSDFS_VH(env->sbi.vh_buf)->magic));
+	BUG_ON(!is_ssdfs_volume_header_csum_valid(env->sbi.vh_buf, hdr_size));
+
+	SSDFS_DBG("env %p, env->sbi.vh_buf %p\n", env, env->sbi.vh_buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	step = env->fsi->erasesize;
+
+try_next_peb:
+	if (kthread_should_stop()) {
+		err = -ENODATA;
+		goto rollback_valid_vh;
+	}
+
+	last_vh = SSDFS_VH(env->sbi.vh_buf);
+	cur_main_sb_peb = SSDFS_MAIN_SB_PEB(last_vh, SSDFS_CUR_SB_SEG);
+	cur_copy_sb_peb = SSDFS_COPY_SB_PEB(last_vh, SSDFS_CUR_SB_SEG);
+
+	if (cur_main_sb_peb != env->sbi.last_log.peb_id &&
+	    cur_copy_sb_peb != env->sbi.last_log.peb_id) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("volume header is corrupted\n");
+		SSDFS_DBG("cur_main_sb_peb %llu, cur_copy_sb_peb %llu, "
+			  "read PEB %llu\n",
+			  cur_main_sb_peb, cur_copy_sb_peb,
+			  env->sbi.last_log.peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto continue_search;
+	}
+
+	if (cur_main_sb_peb == env->sbi.last_log.peb_id) {
+		if (!is_cur_main_sb_peb_exhausted(env))
+			goto end_search;
+	} else {
+		if (!is_cur_copy_sb_peb_exhausted(env))
+			goto end_search;
+	}
+
+	err = ssdfs_check_next_sb_pebs_pair(env);
+	if (err == -E2BIG)
+		goto continue_search;
+	else if (err == -ENODATA || err == -ENOENT)
+		goto check_reserved_sb_pebs_pair;
+	else if (!err)
+		goto try_next_peb;
+
+check_reserved_sb_pebs_pair:
+	if (kthread_should_stop()) {
+		err = -ENODATA;
+		goto rollback_valid_vh;
+	}
+
+	err = ssdfs_check_reserved_sb_pebs_pair(env);
+	if (err == -E2BIG || err == -ENODATA || err == -ENOENT)
+		goto continue_search;
+	else if (!err)
+		goto try_next_peb;
+
+continue_search:
+	if (kthread_should_stop()) {
+		err = -ENODATA;
+		goto rollback_valid_vh;
+	}
+
+	start_offset = *SSDFS_RECOVERY_CUR_OFF_PTR(env) + env->fsi->erasesize;
+	start_peb = start_offset / env->fsi->erasesize;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("start_peb %llu, start_offset %llu, "
+		  "end_offset %llu\n",
+		  start_peb, start_offset,
+		  SSDFS_RECOVERY_UPPER_OFF(env));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = __ssdfs_find_any_valid_volume_header2(env,
+						    start_offset,
+						    SSDFS_RECOVERY_UPPER_OFF(env),
+						    step);
+	if (err == -E2BIG) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unable to find any valid header: "
+			  "peb_id %llu\n",
+			  start_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto end_search;
+	} else if (err) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unable to find any valid header: "
+			  "peb_id %llu\n",
+			  start_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto rollback_valid_vh;
+	}
+
+	if (kthread_should_stop()) {
+		err = -ENODATA;
+		goto rollback_valid_vh;
+	}
+
+	if (*SSDFS_RECOVERY_CUR_OFF_PTR(env) >= U64_MAX) {
+		err = -ENODATA;
+		goto rollback_valid_vh;
+	}
+
+	next_peb = *SSDFS_RECOVERY_CUR_OFF_PTR(env) / env->fsi->erasesize;
+
+	err = ssdfs_find_any_valid_sb_segment2(env, next_peb);
+	if (err == -E2BIG) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unable to find any valid header: "
+			  "peb_id %llu\n",
+			  start_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto end_search;
+	} else if (err) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unable to find any valid sb seg: "
+			  "peb_id %llu\n",
+			  next_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto rollback_valid_vh;
+	} else
+		goto try_next_peb;
+
+rollback_valid_vh:
+	ssdfs_restore_sb_info2(env);
+
+end_search:
+	return err;
+}
+
+static inline
+bool need_continue_search(struct ssdfs_recovery_env *env)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("cur_off %llu, upper_off %llu\n",
+		  *SSDFS_RECOVERY_CUR_OFF_PTR(env),
+		  SSDFS_RECOVERY_UPPER_OFF(env));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return *SSDFS_RECOVERY_CUR_OFF_PTR(env) < SSDFS_RECOVERY_UPPER_OFF(env);
+}
+
+static
+int ssdfs_recovery_first_phase_slow_search(struct ssdfs_recovery_env *env)
+{
+	u64 threshold_peb;
+	u64 peb_id;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->fsi);
+	BUG_ON(!env->sbi.vh_buf);
+
+	SSDFS_DBG("env %p, env->sbi.vh_buf %p\n", env, env->sbi.vh_buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+try_another_search:
+	if (kthread_should_stop()) {
+		err = -ENOENT;
+		goto finish_first_phase;
+	}
+
+	threshold_peb = *SSDFS_RECOVERY_CUR_OFF_PTR(env) / env->fsi->erasesize;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("cur_off %llu, threshold_peb %llu\n",
+		  *SSDFS_RECOVERY_CUR_OFF_PTR(env),
+		  threshold_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_find_any_valid_sb_segment2(env, threshold_peb);
+	if (err == -E2BIG) {
+		ssdfs_restore_sb_info2(env);
+		err = ssdfs_find_last_sb_seg_outside_fragment(env);
+		if (err == -ENODATA || err == -ENOENT) {
+			if (kthread_should_stop()) {
+				err = -ENOENT;
+				goto finish_first_phase;
+			}
+
+			if (need_continue_search(env)) {
+				ssdfs_restore_sb_info2(env);
+
+				peb_id = *SSDFS_RECOVERY_CUR_OFF_PTR(env) /
+							env->fsi->erasesize;
+
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("cur_off %llu, peb %llu\n",
+					  *SSDFS_RECOVERY_CUR_OFF_PTR(env),
+					  peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+				err = __ssdfs_find_any_valid_volume_header2(env,
+					    *SSDFS_RECOVERY_CUR_OFF_PTR(env),
+					    SSDFS_RECOVERY_UPPER_OFF(env),
+					    env->fsi->erasesize);
+				if (err) {
+					SSDFS_DBG("valid magic is not found\n");
+					goto finish_first_phase;
+				} else
+					goto try_another_search;
+			} else
+				goto finish_first_phase;
+		} else
+			goto finish_first_phase;
+	} else if (err == -ENODATA || err == -ENOENT) {
+		if (kthread_should_stop())
+			err = -ENOENT;
+		else
+			err = -EAGAIN;
+
+		goto finish_first_phase;
+	} else if (err)
+		goto finish_first_phase;
+
+	if (kthread_should_stop()) {
+		err = -ENOENT;
+		goto finish_first_phase;
+	}
+
+	err = ssdfs_find_latest_valid_sb_segment2(env);
+	if (err == -ENODATA || err == -ENOENT) {
+		if (kthread_should_stop()) {
+			err = -ENOENT;
+			goto finish_first_phase;
+		}
+
+		if (need_continue_search(env)) {
+			ssdfs_restore_sb_info2(env);
+
+			peb_id = *SSDFS_RECOVERY_CUR_OFF_PTR(env) /
+						env->fsi->erasesize;
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("cur_off %llu, peb %llu\n",
+				  *SSDFS_RECOVERY_CUR_OFF_PTR(env),
+				  peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			err = __ssdfs_find_any_valid_volume_header2(env,
+					*SSDFS_RECOVERY_CUR_OFF_PTR(env),
+					SSDFS_RECOVERY_UPPER_OFF(env),
+					env->fsi->erasesize);
+			if (err) {
+				SSDFS_DBG("valid magic is not found\n");
+				goto finish_first_phase;
+			} else
+				goto try_another_search;
+		} else
+			goto finish_first_phase;
+	}
+
+finish_first_phase:
+	return err;
+}
+
+static
+int ssdfs_recovery_second_phase_slow_search(struct ssdfs_recovery_env *env)
+{
+	u64 threshold_peb;
+	u64 peb_id;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->fsi);
+	BUG_ON(!env->sbi.vh_buf);
+
+	SSDFS_DBG("env %p, env->sbi.vh_buf %p\n", env, env->sbi.vh_buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!is_second_slow_try_possible(env)) {
+		SSDFS_DBG("there is no room for second slow try\n");
+		return -EAGAIN;
+	}
+
+	SSDFS_RECOVERY_SET_SECOND_SLOW_TRY(env);
+
+try_another_search:
+	if (kthread_should_stop())
+		return -ENOENT;
+
+	peb_id = *SSDFS_RECOVERY_CUR_OFF_PTR(env) /
+				env->fsi->erasesize;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("cur_off %llu, peb %llu\n",
+		  *SSDFS_RECOVERY_CUR_OFF_PTR(env),
+		  peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = __ssdfs_find_any_valid_volume_header2(env,
+					*SSDFS_RECOVERY_CUR_OFF_PTR(env),
+					SSDFS_RECOVERY_UPPER_OFF(env),
+					env->fsi->erasesize);
+	if (err) {
+		SSDFS_DBG("valid magic is not detected\n");
+		return err;
+	}
+
+	if (kthread_should_stop())
+		return -ENOENT;
+
+	threshold_peb = *SSDFS_RECOVERY_CUR_OFF_PTR(env) / env->fsi->erasesize;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("cur_off %llu, threshold_peb %llu\n",
+		  *SSDFS_RECOVERY_CUR_OFF_PTR(env),
+		  threshold_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_find_any_valid_sb_segment2(env, threshold_peb);
+	if (err == -E2BIG) {
+		ssdfs_restore_sb_info2(env);
+		err = ssdfs_find_last_sb_seg_outside_fragment(env);
+		if (err == -ENODATA || err == -ENOENT) {
+			if (kthread_should_stop()) {
+				err = -ENOENT;
+				goto finish_second_phase;
+			}
+
+			if (need_continue_search(env)) {
+				ssdfs_restore_sb_info2(env);
+				goto try_another_search;
+			} else
+				goto finish_second_phase;
+		} else
+			goto finish_second_phase;
+	} else if (err == -ENODATA || err == -ENOENT) {
+		if (kthread_should_stop())
+			err = -ENOENT;
+		else
+			err = -EAGAIN;
+
+		goto finish_second_phase;
+	} else if (err)
+		goto finish_second_phase;
+
+	if (kthread_should_stop()) {
+		err = -ENOENT;
+		goto finish_second_phase;
+	}
+
+	err = ssdfs_find_latest_valid_sb_segment2(env);
+	if (err == -ENODATA || err == -ENOENT) {
+		if (kthread_should_stop()) {
+			err = -ENOENT;
+			goto finish_second_phase;
+		}
+
+		if (need_continue_search(env)) {
+			ssdfs_restore_sb_info2(env);
+			goto try_another_search;
+		} else
+			goto finish_second_phase;
+	}
+
+finish_second_phase:
+	return err;
+}
+
+static
+int ssdfs_recovery_third_phase_slow_search(struct ssdfs_recovery_env *env)
+{
+	u64 threshold_peb;
+	u64 peb_id;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->fsi);
+	BUG_ON(!env->sbi.vh_buf);
+
+	SSDFS_DBG("env %p, env->sbi.vh_buf %p\n", env, env->sbi.vh_buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!is_third_slow_try_possible(env)) {
+		SSDFS_DBG("there is no room for third slow try\n");
+		return -ENODATA;
+	}
+
+	SSDFS_RECOVERY_SET_THIRD_SLOW_TRY(env);
+
+try_another_search:
+	if (kthread_should_stop())
+		return -ENOENT;
+
+	peb_id = *SSDFS_RECOVERY_CUR_OFF_PTR(env) /
+				env->fsi->erasesize;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("cur_off %llu, peb %llu\n",
+		  *SSDFS_RECOVERY_CUR_OFF_PTR(env),
+		  peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = __ssdfs_find_any_valid_volume_header2(env,
+					*SSDFS_RECOVERY_CUR_OFF_PTR(env),
+					SSDFS_RECOVERY_UPPER_OFF(env),
+					env->fsi->erasesize);
+	if (err) {
+		SSDFS_DBG("valid magic is not detected\n");
+		return err;
+	}
+
+	if (kthread_should_stop())
+		return -ENOENT;
+
+	threshold_peb = *SSDFS_RECOVERY_CUR_OFF_PTR(env) / env->fsi->erasesize;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("cur_off %llu, threshold_peb %llu\n",
+		  *SSDFS_RECOVERY_CUR_OFF_PTR(env),
+		  threshold_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_find_any_valid_sb_segment2(env, threshold_peb);
+	if (err == -E2BIG) {
+		ssdfs_restore_sb_info2(env);
+		err = ssdfs_find_last_sb_seg_outside_fragment(env);
+		if (err == -ENODATA || err == -ENOENT) {
+			if (kthread_should_stop()) {
+				err = -ENOENT;
+				goto finish_third_phase;
+			}
+
+			if (need_continue_search(env)) {
+				ssdfs_restore_sb_info2(env);
+				goto try_another_search;
+			} else
+				goto finish_third_phase;
+		} else
+			goto finish_third_phase;
+	} else if (err)
+		goto finish_third_phase;
+
+	if (kthread_should_stop()) {
+		err = -ENOENT;
+		goto finish_third_phase;
+	}
+
+	err = ssdfs_find_latest_valid_sb_segment2(env);
+	if (err == -ENODATA || err == -ENOENT) {
+		if (kthread_should_stop()) {
+			err = -ENOENT;
+			goto finish_third_phase;
+		}
+
+		if (need_continue_search(env)) {
+			ssdfs_restore_sb_info2(env);
+			goto try_another_search;
+		} else
+			goto finish_third_phase;
+	}
+
+finish_third_phase:
+	return err;
+}
+
+/*
+ * ssdfs_recovery_try_slow_search() - execute the slow search through
+ * the found fragment, in up to three phases, trying to detect the
+ * latest valid superblock segment.
+ */
+int ssdfs_recovery_try_slow_search(struct ssdfs_recovery_env *env)
+{
+	struct ssdfs_found_protected_peb *protected_peb;
+	struct ssdfs_volume_header *vh;
+	size_t vh_size = sizeof(struct ssdfs_volume_header);
+	bool magic_valid = false;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->found);
+	BUG_ON(!env->sbi.vh_buf);
+
+	SSDFS_DBG("env %p, start_peb %llu, pebs_count %u\n",
+		  env, env->found->start_peb, env->found->pebs_count);
+	SSDFS_DBG("env->lower_offset %llu, env->upper_offset %llu\n",
+		  env->found->lower_offset, env->found->upper_offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	protected_peb = &env->found->array[SSDFS_LAST_CNO_PEB_INDEX];
+
+	if (protected_peb->peb.peb_id >= U64_MAX) {
+		SSDFS_DBG("fragment is empty\n");
+		return -ENODATA;
+	}
+
+	err = ssdfs_read_checked_sb_info3(env, protected_peb->peb.peb_id, 0);
+	vh = SSDFS_VH(env->sbi.vh_buf);
+	magic_valid = is_ssdfs_magic_valid(&vh->magic);
+
+	if (err || !magic_valid) {
+		err = -ENODATA;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("peb %llu is corrupted\n",
+			  protected_peb->peb.peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto finish_search;
+	} else {
+		ssdfs_memcpy(&env->last_vh, 0, vh_size,
+			     env->sbi.vh_buf, 0, vh_size,
+			     vh_size);
+		ssdfs_backup_sb_info2(env);
+	}
+
+	if (env->found->start_peb == 0) {
+		env->found->lower_offset = SSDFS_RESERVED_VBR_SIZE;
+	} else {
+		env->found->lower_offset =
+			env->found->start_peb * env->fsi->erasesize;
+	}
+
+	env->found->upper_offset = (env->found->start_peb +
+					env->found->pebs_count - 1);
+	env->found->upper_offset *= env->fsi->erasesize;
+
+	SSDFS_RECOVERY_SET_FIRST_SLOW_TRY(env);
+
+	err = ssdfs_recovery_first_phase_slow_search(env);
+	if (err == -EAGAIN || err == -E2BIG ||
+	    err == -ENODATA || err == -ENOENT) {
+		if (kthread_should_stop()) {
+			err = -ENOENT;
+			goto finish_search;
+		}
+
+		err = ssdfs_recovery_second_phase_slow_search(env);
+		if (err == -EAGAIN || err == -E2BIG ||
+		    err == -ENODATA || err == -ENOENT) {
+			if (kthread_should_stop()) {
+				err = -ENOENT;
+				goto finish_search;
+			}
+
+			err = ssdfs_recovery_third_phase_slow_search(env);
+		}
+	}
+
+finish_search:
+	return err;
+}
diff --git a/fs/ssdfs/recovery_thread.c b/fs/ssdfs/recovery_thread.c
new file mode 100644
index 000000000000..cd1424762059
--- /dev/null
+++ b/fs/ssdfs/recovery_thread.c
@@ -0,0 +1,1196 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/recovery_thread.c - recovery thread's logic.
+ *
+ * Copyright (c) 2019-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ */
+
+#include <linux/slab.h>
+#include <linux/kthread.h>
+#include <linux/pagevec.h>
+#include <linux/blkdev.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+#include "page_array.h"
+#include "page_vector.h"
+#include "peb.h"
+#include "segment_bitmap.h"
+#include "peb_mapping_table.h"
+#include "recovery.h"
+
+#include <trace/events/ssdfs.h>
+
+/*
+ * ssdfs_backup_sb_info2() - backup the superblock search state.
+ * @env: recovery environment
+ *
+ * Copy the volume header, log footer, and last log descriptor
+ * into the backup buffers before checking the next candidate PEB.
+ */
+void ssdfs_backup_sb_info2(struct ssdfs_recovery_env *env)
+{
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+	size_t footer_size = sizeof(struct ssdfs_log_footer);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env);
+	BUG_ON(!env->sbi.vh_buf || !env->sbi.vs_buf);
+	BUG_ON(!env->sbi_backup.vh_buf || !env->sbi_backup.vs_buf);
+
+	SSDFS_DBG("last_log: leb_id %llu, peb_id %llu, "
+		  "page_offset %u, pages_count %u, "
+		  "volume state: free_pages %llu, timestamp %#llx, "
+		  "cno %#llx, fs_state %#x\n",
+		  env->sbi.last_log.leb_id,
+		  env->sbi.last_log.peb_id,
+		  env->sbi.last_log.page_offset,
+		  env->sbi.last_log.pages_count,
+		  le64_to_cpu(SSDFS_VS(env->sbi.vs_buf)->free_pages),
+		  le64_to_cpu(SSDFS_VS(env->sbi.vs_buf)->timestamp),
+		  le64_to_cpu(SSDFS_VS(env->sbi.vs_buf)->cno),
+		  le16_to_cpu(SSDFS_VS(env->sbi.vs_buf)->state));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_memcpy(env->sbi_backup.vh_buf, 0, hdr_size,
+		     env->sbi.vh_buf, 0, hdr_size,
+		     hdr_size);
+	ssdfs_memcpy(env->sbi_backup.vs_buf, 0, footer_size,
+		     env->sbi.vs_buf, 0, footer_size,
+		     footer_size);
+	ssdfs_memcpy(&env->sbi_backup.last_log,
+		     0, sizeof(struct ssdfs_peb_extent),
+		     &env->sbi.last_log,
+		     0, sizeof(struct ssdfs_peb_extent),
+		     sizeof(struct ssdfs_peb_extent));
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("last_log: leb_id %llu, peb_id %llu, "
+		  "page_offset %u, pages_count %u, "
+		  "volume state: free_pages %llu, timestamp %#llx, "
+		  "cno %#llx, fs_state %#x\n",
+		  env->sbi.last_log.leb_id,
+		  env->sbi.last_log.peb_id,
+		  env->sbi.last_log.page_offset,
+		  env->sbi.last_log.pages_count,
+		  le64_to_cpu(SSDFS_VS(env->sbi.vs_buf)->free_pages),
+		  le64_to_cpu(SSDFS_VS(env->sbi.vs_buf)->timestamp),
+		  le64_to_cpu(SSDFS_VS(env->sbi.vs_buf)->cno),
+		  le16_to_cpu(SSDFS_VS(env->sbi.vs_buf)->state));
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+/*
+ * ssdfs_restore_sb_info2() - restore the superblock search state.
+ * @env: recovery environment
+ *
+ * Roll back the volume header, log footer, and last log descriptor
+ * from the backup buffers after a failed check.
+ */
+void ssdfs_restore_sb_info2(struct ssdfs_recovery_env *env)
+{
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+	size_t footer_size = sizeof(struct ssdfs_log_footer);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env);
+	BUG_ON(!env->sbi.vh_buf || !env->sbi.vs_buf);
+	BUG_ON(!env->sbi_backup.vh_buf || !env->sbi_backup.vs_buf);
+
+	SSDFS_DBG("last_log: leb_id %llu, peb_id %llu, "
+		  "page_offset %u, pages_count %u, "
+		  "volume state: free_pages %llu, timestamp %#llx, "
+		  "cno %#llx, fs_state %#x\n",
+		  env->sbi.last_log.leb_id,
+		  env->sbi.last_log.peb_id,
+		  env->sbi.last_log.page_offset,
+		  env->sbi.last_log.pages_count,
+		  le64_to_cpu(SSDFS_VS(env->sbi.vs_buf)->free_pages),
+		  le64_to_cpu(SSDFS_VS(env->sbi.vs_buf)->timestamp),
+		  le64_to_cpu(SSDFS_VS(env->sbi.vs_buf)->cno),
+		  le16_to_cpu(SSDFS_VS(env->sbi.vs_buf)->state));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_memcpy(env->sbi.vh_buf, 0, hdr_size,
+		     env->sbi_backup.vh_buf, 0, hdr_size,
+		     hdr_size);
+	ssdfs_memcpy(env->sbi.vs_buf, 0, footer_size,
+		     env->sbi_backup.vs_buf, 0, footer_size,
+		     footer_size);
+	ssdfs_memcpy(&env->sbi.last_log,
+		     0, sizeof(struct ssdfs_peb_extent),
+		     &env->sbi_backup.last_log,
+		     0, sizeof(struct ssdfs_peb_extent),
+		     sizeof(struct ssdfs_peb_extent));
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("last_log: leb_id %llu, peb_id %llu, "
+		  "page_offset %u, pages_count %u, "
+		  "volume state: free_pages %llu, timestamp %#llx, "
+		  "cno %#llx, fs_state %#x\n",
+		  env->sbi.last_log.leb_id,
+		  env->sbi.last_log.peb_id,
+		  env->sbi.last_log.page_offset,
+		  env->sbi.last_log.pages_count,
+		  le64_to_cpu(SSDFS_VS(env->sbi.vs_buf)->free_pages),
+		  le64_to_cpu(SSDFS_VS(env->sbi.vs_buf)->timestamp),
+		  le64_to_cpu(SSDFS_VS(env->sbi.vs_buf)->cno),
+		  le16_to_cpu(SSDFS_VS(env->sbi.vs_buf)->state));
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+/*
+ * ssdfs_read_checked_sb_info3() - read and verify the segment header
+ * and log footer of a PEB's log.
+ * @env: recovery environment
+ * @peb_id: PEB identifier
+ * @pages_off: offset in pages to the log's beginning
+ *
+ * Returns zero on success, or an error if the volume header or
+ * log footer is corrupted.
+ */
+int ssdfs_read_checked_sb_info3(struct ssdfs_recovery_env *env,
+				u64 peb_id, u32 pages_off)
+{
+	u32 lf_off;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->fsi);
+
+	SSDFS_DBG("env %p, peb_id %llu, pages_off %u\n",
+		  env, peb_id, pages_off);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_read_checked_segment_header(env->fsi, peb_id, pages_off,
+						env->sbi.vh_buf, true);
+	if (err) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("volume header is corrupted: "
+			  "peb_id %llu, offset %u, err %d\n",
+			  peb_id, pages_off, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return err;
+	}
+
+	lf_off = SSDFS_LOG_FOOTER_OFF(env->sbi.vh_buf);
+
+	err = ssdfs_read_checked_log_footer(env->fsi,
+					    SSDFS_SEG_HDR(env->sbi.vh_buf),
+					    peb_id, lf_off, env->sbi.vs_buf,
+					    true);
+	if (err) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("log footer is corrupted: "
+			  "peb_id %llu, offset %u, err %d\n",
+			  peb_id, lf_off, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return err;
+	}
+
+	return 0;
+}
+
+static inline
+int ssdfs_read_and_check_volume_header(struct ssdfs_recovery_env *env,
+					u64 offset)
+{
+	struct super_block *sb;
+	struct ssdfs_volume_header *vh;
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+	u64 dev_size;
+	bool magic_valid, crc_valid, hdr_consistent;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->fsi);
+	BUG_ON(!env->fsi->devops->read);
+
+	SSDFS_DBG("env %p, offset %llu\n",
+		  env, offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	sb = env->fsi->sb;
+	dev_size = env->fsi->devops->device_size(sb);
+
+	err = env->fsi->devops->read(sb, offset, hdr_size,
+				     env->sbi.vh_buf);
+	if (err)
+		goto found_corrupted_peb;
+
+	err = -ENODATA;
+
+	vh = SSDFS_VH(env->sbi.vh_buf);
+	magic_valid = is_ssdfs_magic_valid(&vh->magic);
+	if (magic_valid) {
+		crc_valid = is_ssdfs_volume_header_csum_valid(env->sbi.vh_buf,
+								hdr_size);
+		hdr_consistent = is_ssdfs_volume_header_consistent(env->fsi, vh,
+								   dev_size);
+
+		if (crc_valid && hdr_consistent) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("found offset %llu\n",
+				  offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return 0;
+		}
+	}
+
+found_corrupted_peb:
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("peb %llu (offset %llu) is corrupted\n",
+		  offset / env->fsi->erasesize, offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+/*
+ * __ssdfs_find_any_valid_volume_header2() - scan the range
+ * [@start_offset, @end_offset) in @step increments, looking for any
+ * valid volume header.
+ *
+ * Returns zero if a valid header is found (the current offset in
+ * @env points at it), -E2BIG if the range is exhausted, or -ENOENT
+ * if the thread has been requested to stop.
+ */
+int __ssdfs_find_any_valid_volume_header2(struct ssdfs_recovery_env *env,
+					  u64 start_offset,
+					  u64 end_offset,
+					  u64 step)
+{
+	u64 dev_size;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->fsi);
+	BUG_ON(!env->fsi->devops->read);
+
+	SSDFS_DBG("env %p, start_offset %llu, "
+		  "end_offset %llu, step %llu\n",
+		  env, start_offset, end_offset, step);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	dev_size = env->fsi->devops->device_size(env->fsi->sb);
+	end_offset = min_t(u64, dev_size, end_offset);
+
+	*SSDFS_RECOVERY_CUR_OFF_PTR(env) = start_offset;
+
+	if (start_offset >= end_offset) {
+		err = -E2BIG;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("start_offset %llu, end_offset %llu, err %d\n",
+			  start_offset, end_offset, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return err;
+	}
+
+	while (*SSDFS_RECOVERY_CUR_OFF_PTR(env) < end_offset) {
+		if (kthread_should_stop())
+			return -ENOENT;
+
+		err = ssdfs_read_and_check_volume_header(env,
+					*SSDFS_RECOVERY_CUR_OFF_PTR(env));
+		if (!err) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("found offset %llu\n",
+				  *SSDFS_RECOVERY_CUR_OFF_PTR(env));
+#endif /* CONFIG_SSDFS_DEBUG */
+			return 0;
+		}
+
+		*SSDFS_RECOVERY_CUR_OFF_PTR(env) += step;
+	}
+
+	return -E2BIG;
+}
+
+/*
+ * ssdfs_find_any_valid_sb_segment2() - try to find a valid superblock
+ * segment, starting from @threshold_peb, by following the superblock
+ * segments chain described in the volume header.
+ */
+int ssdfs_find_any_valid_sb_segment2(struct ssdfs_recovery_env *env,
+				     u64 threshold_peb)
+{
+	size_t vh_size = sizeof(struct ssdfs_volume_header);
+	struct ssdfs_volume_header *vh;
+	struct ssdfs_segment_header *seg_hdr;
+	u64 dev_size;
+	u64 start_peb;
+	loff_t start_offset, next_offset;
+	u64 last_cno, cno;
+	__le64 peb1, peb2;
+	__le64 leb1, leb2;
+	u64 checked_pebs[SSDFS_SB_CHAIN_MAX][SSDFS_SB_SEG_COPY_MAX];
+	u64 step;
+	int i, j;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->found || !env->fsi);
+	BUG_ON(!env->fsi->devops->read);
+
+	SSDFS_DBG("env %p, start_peb %llu, "
+		  "pebs_count %u, threshold_peb %llu\n",
+		  env, env->found->start_peb,
+		  env->found->pebs_count, threshold_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	dev_size = env->fsi->devops->device_size(env->fsi->sb);
+	step = env->fsi->erasesize;
+
+	start_peb = max_t(u64,
+			*SSDFS_RECOVERY_CUR_OFF_PTR(env) / env->fsi->erasesize,
+			threshold_peb);
+	start_offset = start_peb * env->fsi->erasesize;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("start_peb %llu, start_offset %llu, "
+		  "end_offset %llu\n",
+		  start_peb, start_offset,
+		  SSDFS_RECOVERY_UPPER_OFF(env));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*SSDFS_RECOVERY_CUR_OFF_PTR(env) = start_offset;
+
+	if (start_offset >= SSDFS_RECOVERY_UPPER_OFF(env)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("start_offset %llu >= end_offset %llu\n",
+			  start_offset, SSDFS_RECOVERY_UPPER_OFF(env));
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -E2BIG;
+	}
+
+	i = SSDFS_SB_CHAIN_MAX;
+	memset(checked_pebs, 0xFF, sizeof(checked_pebs));
+
+try_next_volume_portion:
+	ssdfs_memcpy(&env->last_vh, 0, vh_size,
+		     env->sbi.vh_buf, 0, vh_size,
+		     vh_size);
+	last_cno = le64_to_cpu(SSDFS_SEG_HDR(env->sbi.vh_buf)->cno);
+
+try_again:
+	if (kthread_should_stop())
+		return -ENODATA;
+
+	switch (i) {
+	case SSDFS_SB_CHAIN_MAX:
+		i = SSDFS_CUR_SB_SEG;
+		break;
+
+	case SSDFS_CUR_SB_SEG:
+		i = SSDFS_NEXT_SB_SEG;
+		break;
+
+	case SSDFS_NEXT_SB_SEG:
+		i = SSDFS_RESERVED_SB_SEG;
+		break;
+
+	default:
+		start_offset = (threshold_peb * env->fsi->erasesize) + step;
+		start_offset = max_t(u64, start_offset,
+				     *SSDFS_RECOVERY_CUR_OFF_PTR(env) + step);
+		*SSDFS_RECOVERY_CUR_OFF_PTR(env) = start_offset;
+		err = __ssdfs_find_any_valid_volume_header2(env, start_offset,
+					SSDFS_RECOVERY_UPPER_OFF(env), step);
+		if (!err) {
+			i = SSDFS_SB_CHAIN_MAX;
+			threshold_peb = *SSDFS_RECOVERY_CUR_OFF_PTR(env);
+			threshold_peb /= env->fsi->erasesize;
+			goto try_next_volume_portion;
+		}
+
+		/* the fragment is checked completely */
+		return err;
+	}
+
+	err = -ENODATA;
+
+	for (j = SSDFS_MAIN_SB_SEG; j < SSDFS_SB_SEG_COPY_MAX; j++) {
+		u64 leb_id = le64_to_cpu(env->last_vh.sb_pebs[i][j].leb_id);
+		u64 peb_id = le64_to_cpu(env->last_vh.sb_pebs[i][j].peb_id);
+		u16 seg_type;
+		u32 erasesize = env->fsi->erasesize;
+
+		if (kthread_should_stop())
+			return -ENODATA;
+
+		if (peb_id == U64_MAX || leb_id == U64_MAX) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("invalid leb_id %llu, peb_id %llu, "
+				  "sb_chain %d, sb_copy %d\n",
+				  leb_id, peb_id, i, j);
+#endif /* CONFIG_SSDFS_DEBUG */
+			continue;
+		}
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("leb_id %llu, peb_id %llu, "
+			  "checked_peb %llu, threshold_peb %llu\n",
+			  leb_id, peb_id,
+			  checked_pebs[i][j],
+			  threshold_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (checked_pebs[i][j] == peb_id)
+			continue;
+		else
+			checked_pebs[i][j] = peb_id;
+
+		next_offset = peb_id * erasesize;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("peb_id %llu, next_offset %llu, "
+			  "cur_offset %llu, end_offset %llu\n",
+			  peb_id, next_offset,
+			  *SSDFS_RECOVERY_CUR_OFF_PTR(env),
+			  SSDFS_RECOVERY_UPPER_OFF(env));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (next_offset >= SSDFS_RECOVERY_UPPER_OFF(env)) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to find valid SB segment: "
+				  "next_offset %llu >= end_offset %llu\n",
+				  next_offset,
+				  SSDFS_RECOVERY_UPPER_OFF(env));
+#endif /* CONFIG_SSDFS_DEBUG */
+			continue;
+		}
+
+		if ((env->found->start_peb * erasesize) > next_offset) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to find valid SB segment: "
+				  "next_offset %llu < start_offset %llu\n",
+				  next_offset,
+				  env->found->start_peb * erasesize);
+#endif /* CONFIG_SSDFS_DEBUG */
+			continue;
+		}
+
+		err = ssdfs_read_checked_sb_info3(env, peb_id, 0);
+		if (err) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("peb_id %llu is corrupted: err %d\n",
+				  peb_id, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+			continue;
+		}
+
+		env->sbi.last_log.leb_id = leb_id;
+		env->sbi.last_log.peb_id = peb_id;
+		env->sbi.last_log.page_offset = 0;
+		env->sbi.last_log.pages_count =
+			SSDFS_LOG_PAGES(env->sbi.vh_buf);
+
+		seg_hdr = SSDFS_SEG_HDR(env->sbi.vh_buf);
+		seg_type = SSDFS_SEG_TYPE(seg_hdr);
+
+		if (seg_type == SSDFS_SB_SEG_TYPE) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("PEB %llu has been found\n",
+				  peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return 0;
+		} else {
+			err = -EIO;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("PEB %llu is not sb segment\n",
+				  peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+
+		if (!err)
+			goto compare_vh_info;
+	}
+
+	if (err) {
+		ssdfs_memcpy(env->sbi.vh_buf, 0, vh_size,
+			     &env->last_vh, 0, vh_size,
+			     vh_size);
+		goto try_again;
+	}
+
+compare_vh_info:
+	vh = SSDFS_VH(env->sbi.vh_buf);
+	seg_hdr = SSDFS_SEG_HDR(env->sbi.vh_buf);
+	leb1 = env->last_vh.sb_pebs[SSDFS_CUR_SB_SEG][SSDFS_MAIN_SB_SEG].leb_id;
+	leb2 = vh->sb_pebs[SSDFS_CUR_SB_SEG][SSDFS_MAIN_SB_SEG].leb_id;
+	peb1 = env->last_vh.sb_pebs[SSDFS_CUR_SB_SEG][SSDFS_MAIN_SB_SEG].peb_id;
+	peb2 = vh->sb_pebs[SSDFS_CUR_SB_SEG][SSDFS_MAIN_SB_SEG].peb_id;
+	cno = le64_to_cpu(seg_hdr->cno);
+
+	if (cno > last_cno && (leb1 != leb2 || peb1 != peb2)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("cno %llu, last_cno %llu, "
+			  "leb1 %llu, leb2 %llu, "
+			  "peb1 %llu, peb2 %llu\n",
+			  cno, last_cno,
+			  le64_to_cpu(leb1), le64_to_cpu(leb2),
+			  le64_to_cpu(peb1), le64_to_cpu(peb2));
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto try_again;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("unable to find any valid segment with superblocks chain\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+	return err;
+}
+
+static inline
+bool is_sb_peb_exhausted(struct ssdfs_recovery_env *env,
+			 u64 leb_id, u64 peb_id)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+#endif /* CONFIG_SSDFS_DEBUG */
+	struct ssdfs_peb_extent checking_page;
+	u64 pages_per_peb;
+	u16 sb_seg_log_pages;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->fsi);
+	BUG_ON(!env->sbi.vh_buf);
+	BUG_ON(!env->fsi->devops->read);
+	BUG_ON(!is_ssdfs_magic_valid(&SSDFS_VH(env->sbi.vh_buf)->magic));
+	BUG_ON(!is_ssdfs_volume_header_csum_valid(env->sbi.vh_buf, hdr_size));
+
+	SSDFS_DBG("env %p, env->sbi.vh_buf %p, "
+		  "leb_id %llu, peb_id %llu\n",
+		  env, env->sbi.vh_buf,
+		  leb_id, peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	sb_seg_log_pages =
+		le16_to_cpu(SSDFS_VH(env->sbi.vh_buf)->sb_seg_log_pages);
+
+	if (!env->fsi->devops->can_write_page) {
+		SSDFS_CRIT("fail to find latest valid sb info: "
+			   "can_write_page is not supported\n");
+		return true;
+	}
+
+	if (leb_id >= U64_MAX || peb_id >= U64_MAX) {
+		SSDFS_ERR("invalid leb_id %llu or peb_id %llu\n",
+			  leb_id, peb_id);
+		return true;
+	}
+
+	if (env->fsi->is_zns_device) {
+		pages_per_peb = div64_u64(env->fsi->zone_capacity,
+					  env->fsi->pagesize);
+	} else
+		pages_per_peb = env->fsi->pages_per_peb;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(pages_per_peb >= U32_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	checking_page.leb_id = leb_id;
+	checking_page.peb_id = peb_id;
+	checking_page.page_offset = (u32)pages_per_peb - sb_seg_log_pages;
+	checking_page.pages_count = 1;
+
+	err = ssdfs_can_write_sb_log(env->fsi->sb, &checking_page);
+	if (!err)
+		return false;
+
+	return true;
+}
+
+bool is_cur_main_sb_peb_exhausted(struct ssdfs_recovery_env *env)
+{
+	u64 leb_id;
+	u64 peb_id;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->fsi);
+	BUG_ON(!env->sbi.vh_buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	leb_id = SSDFS_MAIN_SB_LEB(SSDFS_VH(env->sbi.vh_buf),
+				   SSDFS_CUR_SB_SEG);
+	peb_id = SSDFS_MAIN_SB_PEB(SSDFS_VH(env->sbi.vh_buf),
+				   SSDFS_CUR_SB_SEG);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("env %p, env->sbi.vh_buf %p, "
+		  "leb_id %llu, peb_id %llu\n",
+		  env, env->sbi.vh_buf,
+		  leb_id, peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return is_sb_peb_exhausted(env, leb_id, peb_id);
+}
+
+bool is_cur_copy_sb_peb_exhausted(struct ssdfs_recovery_env *env)
+{
+	u64 leb_id;
+	u64 peb_id;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->fsi);
+	BUG_ON(!env->sbi.vh_buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	leb_id = SSDFS_COPY_SB_LEB(SSDFS_VH(env->sbi.vh_buf),
+				   SSDFS_CUR_SB_SEG);
+	peb_id = SSDFS_COPY_SB_PEB(SSDFS_VH(env->sbi.vh_buf),
+				   SSDFS_CUR_SB_SEG);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("env %p, env->sbi.vh_buf %p, "
+		  "leb_id %llu, peb_id %llu\n",
+		  env, env->sbi.vh_buf,
+		  leb_id, peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return is_sb_peb_exhausted(env, leb_id, peb_id);
+}
+
+static
+int ssdfs_check_sb_segs_sequence(struct ssdfs_recovery_env *env)
+{
+	u16 seg_type;
+	u64 cno1, cno2;
+	u64 cur_peb, next_peb, prev_peb;
+	u64 cur_leb, next_leb, prev_leb;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->fsi);
+	BUG_ON(!env->sbi.vh_buf);
+
+	SSDFS_DBG("env %p, env->sbi.vh_buf %p\n", env, env->sbi.vh_buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	seg_type = SSDFS_SEG_TYPE(SSDFS_SEG_HDR(env->sbi.vh_buf));
+	if (seg_type != SSDFS_SB_SEG_TYPE) {
+		SSDFS_DBG("invalid segment type\n");
+		return -ENODATA;
+	}
+
+	cno1 = SSDFS_SEG_CNO(env->sbi_backup.vh_buf);
+	cno2 = SSDFS_SEG_CNO(env->sbi.vh_buf);
+	if (cno1 >= cno2) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("last cno %llu is not less than read cno %llu\n",
+			  cno1, cno2);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -ENODATA;
+	}
+
+	next_peb = SSDFS_MAIN_SB_PEB(SSDFS_VH(env->sbi_backup.vh_buf),
+					SSDFS_NEXT_SB_SEG);
+	cur_peb = SSDFS_MAIN_SB_PEB(SSDFS_VH(env->sbi.vh_buf),
+					SSDFS_CUR_SB_SEG);
+	if (next_peb != cur_peb) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("next_peb %llu doesn't match cur_peb %llu\n",
+			  next_peb, cur_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -ENODATA;
+	}
+
+	prev_peb = SSDFS_MAIN_SB_PEB(SSDFS_VH(env->sbi.vh_buf),
+					SSDFS_PREV_SB_SEG);
+	cur_peb = SSDFS_MAIN_SB_PEB(SSDFS_VH(env->sbi_backup.vh_buf),
+					SSDFS_CUR_SB_SEG);
+	if (prev_peb != cur_peb) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("prev_peb %llu doesn't match cur_peb %llu\n",
+			  prev_peb, cur_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -ENODATA;
+	}
+
+	next_leb = SSDFS_MAIN_SB_LEB(SSDFS_VH(env->sbi_backup.vh_buf),
+					SSDFS_NEXT_SB_SEG);
+	cur_leb = SSDFS_MAIN_SB_LEB(SSDFS_VH(env->sbi.vh_buf),
+					SSDFS_CUR_SB_SEG);
+	if (next_leb != cur_leb) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("next_leb %llu doesn't match cur_leb %llu\n",
+			  next_leb, cur_leb);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -ENODATA;
+	}
+
+	prev_leb = SSDFS_MAIN_SB_LEB(SSDFS_VH(env->sbi.vh_buf),
+					SSDFS_PREV_SB_SEG);
+	cur_leb = SSDFS_MAIN_SB_LEB(SSDFS_VH(env->sbi_backup.vh_buf),
+					SSDFS_CUR_SB_SEG);
+	if (prev_leb != cur_leb) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("prev_leb %llu doesn't match cur_leb %llu\n",
+			  prev_leb, cur_leb);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -ENODATA;
+	}
+
+	next_peb = SSDFS_COPY_SB_PEB(SSDFS_VH(env->sbi_backup.vh_buf),
+					SSDFS_NEXT_SB_SEG);
+	cur_peb = SSDFS_COPY_SB_PEB(SSDFS_VH(env->sbi.vh_buf),
+					SSDFS_CUR_SB_SEG);
+	if (next_peb != cur_peb) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("next_peb %llu doesn't match cur_peb %llu\n",
+			  next_peb, cur_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -ENODATA;
+	}
+
+	prev_peb = SSDFS_COPY_SB_PEB(SSDFS_VH(env->sbi.vh_buf),
+					SSDFS_PREV_SB_SEG);
+	cur_peb = SSDFS_COPY_SB_PEB(SSDFS_VH(env->sbi_backup.vh_buf),
+					SSDFS_CUR_SB_SEG);
+	if (prev_peb != cur_peb) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("prev_peb %llu doesn't match cur_peb %llu\n",
+			  prev_peb, cur_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -ENODATA;
+	}
+
+	next_leb = SSDFS_COPY_SB_LEB(SSDFS_VH(env->sbi_backup.vh_buf),
+					SSDFS_NEXT_SB_SEG);
+	cur_leb = SSDFS_COPY_SB_LEB(SSDFS_VH(env->sbi.vh_buf),
+					SSDFS_CUR_SB_SEG);
+	if (next_leb != cur_leb) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("next_leb %llu doesn't match cur_leb %llu\n",
+			  next_leb, cur_leb);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -ENODATA;
+	}
+
+	prev_leb = SSDFS_COPY_SB_LEB(SSDFS_VH(env->sbi.vh_buf),
+					SSDFS_PREV_SB_SEG);
+	cur_leb = SSDFS_COPY_SB_LEB(SSDFS_VH(env->sbi_backup.vh_buf),
+					SSDFS_CUR_SB_SEG);
+	if (prev_leb != cur_leb) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("prev_leb %llu doesn't match cur_leb %llu\n",
+			  prev_leb, cur_leb);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -ENODATA;
+	}
+
+	return 0;
+}
+
+int ssdfs_check_next_sb_pebs_pair(struct ssdfs_recovery_env *env)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+#endif /* CONFIG_SSDFS_DEBUG */
+	u64 next_leb;
+	u64 next_peb;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->found || !env->fsi);
+	BUG_ON(!env->sbi.vh_buf);
+	BUG_ON(!is_ssdfs_magic_valid(&SSDFS_VH(env->sbi.vh_buf)->magic));
+	BUG_ON(!is_ssdfs_volume_header_csum_valid(env->sbi.vh_buf, hdr_size));
+
+	SSDFS_DBG("env %p, env->sbi.vh_buf %p, "
+		  "env->start_peb %llu, env->pebs_count %u\n",
+		  env, env->sbi.vh_buf,
+		  env->found->start_peb, env->found->pebs_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	next_leb = SSDFS_MAIN_SB_LEB(SSDFS_VH(env->sbi.vh_buf),
+					SSDFS_NEXT_SB_SEG);
+	next_peb = SSDFS_MAIN_SB_PEB(SSDFS_VH(env->sbi.vh_buf),
+					SSDFS_NEXT_SB_SEG);
+	if (next_leb == U64_MAX || next_peb == U64_MAX) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid next_leb %llu, next_peb %llu\n",
+			  next_leb, next_peb);
+		goto end_next_peb_check;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("MAIN: next_leb %llu, next_peb %llu\n",
+		  next_leb, next_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (next_peb < env->found->start_peb ||
+	    next_peb >= (env->found->start_peb + env->found->pebs_count)) {
+		err = -E2BIG;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("next_peb %llu, start_peb %llu, pebs_count %u\n",
+			  next_peb,
+			  env->found->start_peb,
+			  env->found->pebs_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto end_next_peb_check;
+	}
+
+	ssdfs_backup_sb_info2(env);
+
+	err = ssdfs_read_checked_sb_info3(env, next_peb, 0);
+	if (!err) {
+		env->sbi.last_log.leb_id = next_leb;
+		env->sbi.last_log.peb_id = next_peb;
+		env->sbi.last_log.page_offset = 0;
+		env->sbi.last_log.pages_count =
+				SSDFS_LOG_PAGES(env->sbi.vh_buf);
+
+		err = ssdfs_check_sb_segs_sequence(env);
+		if (!err)
+			goto end_next_peb_check;
+	}
+
+	ssdfs_restore_sb_info2(env);
+	err = 0; /* try to read the backup copy */
+
+	next_leb = SSDFS_COPY_SB_LEB(SSDFS_VH(env->sbi.vh_buf),
+					SSDFS_NEXT_SB_SEG);
+	next_peb = SSDFS_COPY_SB_PEB(SSDFS_VH(env->sbi.vh_buf),
+					SSDFS_NEXT_SB_SEG);
+	if (next_leb >= U64_MAX || next_peb >= U64_MAX) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid next_leb %llu, next_peb %llu\n",
+			  next_leb, next_peb);
+		goto end_next_peb_check;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("COPY: next_leb %llu, next_peb %llu\n",
+		  next_leb, next_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (next_peb < env->found->start_peb ||
+	    next_peb >= (env->found->start_peb + env->found->pebs_count)) {
+		err = -E2BIG;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("next_peb %llu, start_peb %llu, pebs_count %u\n",
+			  next_peb,
+			  env->found->start_peb,
+			  env->found->pebs_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto end_next_peb_check;
+	}
+
+	err = ssdfs_read_checked_sb_info3(env, next_peb, 0);
+	if (!err) {
+		env->sbi.last_log.leb_id = next_leb;
+		env->sbi.last_log.peb_id = next_peb;
+		env->sbi.last_log.page_offset = 0;
+		env->sbi.last_log.pages_count =
+				SSDFS_LOG_PAGES(env->sbi.vh_buf);
+
+		err = ssdfs_check_sb_segs_sequence(env);
+		if (!err)
+			goto end_next_peb_check;
+	}
+
+	ssdfs_restore_sb_info2(env);
+
+end_next_peb_check:
+	return err;
+}
+
+int ssdfs_check_reserved_sb_pebs_pair(struct ssdfs_recovery_env *env)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+#endif /* CONFIG_SSDFS_DEBUG */
+	u64 reserved_leb;
+	u64 reserved_peb;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->found || !env->fsi);
+	BUG_ON(!env->sbi.vh_buf);
+	BUG_ON(!is_ssdfs_magic_valid(&SSDFS_VH(env->sbi.vh_buf)->magic));
+	BUG_ON(!is_ssdfs_volume_header_csum_valid(env->sbi.vh_buf, hdr_size));
+
+	SSDFS_DBG("env %p, env->sbi.vh_buf %p, "
+		  "start_peb %llu, pebs_count %u\n",
+		  env, env->sbi.vh_buf,
+		  env->found->start_peb,
+		  env->found->pebs_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	reserved_leb = SSDFS_MAIN_SB_LEB(SSDFS_VH(env->sbi.vh_buf),
+					SSDFS_RESERVED_SB_SEG);
+	reserved_peb = SSDFS_MAIN_SB_PEB(SSDFS_VH(env->sbi.vh_buf),
+					SSDFS_RESERVED_SB_SEG);
+	if (reserved_leb >= U64_MAX || reserved_peb >= U64_MAX) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid reserved_leb %llu, reserved_peb %llu\n",
+			  reserved_leb, reserved_peb);
+		goto end_reserved_peb_check;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("MAIN: reserved_leb %llu, reserved_peb %llu\n",
+		  reserved_leb, reserved_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (reserved_peb < env->found->start_peb ||
+	    reserved_peb >= (env->found->start_peb + env->found->pebs_count)) {
+		err = -E2BIG;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("reserved_peb %llu, start_peb %llu, pebs_count %u\n",
+			  reserved_peb,
+			  env->found->start_peb,
+			  env->found->pebs_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto end_reserved_peb_check;
+	}
+
+	ssdfs_backup_sb_info2(env);
+
+	err = ssdfs_read_checked_sb_info3(env, reserved_peb, 0);
+	if (!err) {
+		env->sbi.last_log.leb_id = reserved_leb;
+		env->sbi.last_log.peb_id = reserved_peb;
+		env->sbi.last_log.page_offset = 0;
+		env->sbi.last_log.pages_count =
+				SSDFS_LOG_PAGES(env->sbi.vh_buf);
+		goto end_reserved_peb_check;
+	}
+
+	ssdfs_restore_sb_info2(env);
+	err = 0; /* try to read the backup copy */
+
+	reserved_leb = SSDFS_COPY_SB_LEB(SSDFS_VH(env->sbi.vh_buf),
+					SSDFS_RESERVED_SB_SEG);
+	reserved_peb = SSDFS_COPY_SB_PEB(SSDFS_VH(env->sbi.vh_buf),
+					SSDFS_RESERVED_SB_SEG);
+	if (reserved_leb >= U64_MAX || reserved_peb >= U64_MAX) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid reserved_leb %llu, reserved_peb %llu\n",
+			  reserved_leb, reserved_peb);
+		goto end_reserved_peb_check;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("COPY: reserved_leb %llu, reserved_peb %llu\n",
+		  reserved_leb, reserved_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (reserved_peb < env->found->start_peb ||
+	    reserved_peb >= (env->found->start_peb + env->found->pebs_count)) {
+		err = -E2BIG;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("reserved_peb %llu, start_peb %llu, pebs_count %u\n",
+			  reserved_peb,
+			  env->found->start_peb,
+			  env->found->pebs_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto end_reserved_peb_check;
+	}
+
+	err = ssdfs_read_checked_sb_info3(env, reserved_peb, 0);
+	if (!err) {
+		env->sbi.last_log.leb_id = reserved_leb;
+		env->sbi.last_log.peb_id = reserved_peb;
+		env->sbi.last_log.page_offset = 0;
+		env->sbi.last_log.pages_count =
+				SSDFS_LOG_PAGES(env->sbi.vh_buf);
+		goto end_reserved_peb_check;
+	}
+
+	ssdfs_restore_sb_info2(env);
+
+end_reserved_peb_check:
+	return err;
+}
+
+static inline
+bool has_recovery_job(struct ssdfs_recovery_env *env)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return atomic_read(&env->state) == SSDFS_START_RECOVERY;
+}
+
+int ssdfs_recovery_thread_func(void *data);
+
+static
+struct ssdfs_thread_descriptor recovery_thread = {
+	.threadfn = ssdfs_recovery_thread_func,
+	.fmt = "ssdfs-recovery-%u",
+};
+
+#define RECOVERY_THREAD_WAKE_CONDITION(env) \
+	(kthread_should_stop() || has_recovery_job(env))
+
+/*
+ * ssdfs_recovery_thread_func() - main function of recovery thread
+ * @data: pointer to recovery environment
+ *
+ * This function implements the main loop of the recovery thread.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input.
+ */
+int ssdfs_recovery_thread_func(void *data)
+{
+	struct ssdfs_recovery_env *env = data;
+	wait_queue_head_t *wait_queue;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	if (!env) {
+		SSDFS_ERR("pointer on environment is NULL\n");
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("recovery thread: env %p\n", env);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	wait_queue = &env->request_wait_queue;
+
+repeat:
+	if (kthread_should_stop()) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("stop recovery thread: env %p\n", env);
+#endif /* CONFIG_SSDFS_DEBUG */
+		complete_all(&env->thread.full_stop);
+		return 0;
+	}
+
+	if (atomic_read(&env->state) != SSDFS_START_RECOVERY)
+		goto sleep_recovery_thread;
+
+	if (env->found->start_peb >= U64_MAX ||
+	    env->found->pebs_count >= U32_MAX) {
+		err = -EINVAL;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("invalid input: "
+			  "start_peb %llu, pebs_count %u\n",
+			  env->found->start_peb,
+			  env->found->pebs_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto finish_recovery;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("start_peb %llu, pebs_count %u\n",
+		  env->found->start_peb,
+		  env->found->pebs_count);
+	SSDFS_DBG("search_phase %#x\n",
+		  env->found->search_phase);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (env->found->search_phase) {
+	case SSDFS_RECOVERY_FAST_SEARCH:
+		err = ssdfs_recovery_try_fast_search(env);
+		if (err) {
+			if (kthread_should_stop()) {
+				err = -ENOENT;
+				goto finish_recovery;
+			}
+		}
+		break;
+
+	case SSDFS_RECOVERY_SLOW_SEARCH:
+		err = ssdfs_recovery_try_slow_search(env);
+		if (err) {
+			if (kthread_should_stop()) {
+				err = -ENOENT;
+				goto finish_recovery;
+			}
+		}
+		break;
+
+	default:
+		err = -ERANGE;
+		SSDFS_ERR("search has not been requested: "
+			  "search_phase %#x\n",
+			  env->found->search_phase);
+		goto finish_recovery;
+	}
+
+finish_recovery:
+	env->err = err;
+
+	if (env->err)
+		atomic_set(&env->state, SSDFS_RECOVERY_FAILED);
+	else
+		atomic_set(&env->state, SSDFS_RECOVERY_FINISHED);
+
+	wake_up_all(&env->result_wait_queue);
+
+sleep_recovery_thread:
+	wait_event_interruptible(*wait_queue,
+				 RECOVERY_THREAD_WAKE_CONDITION(env));
+	goto repeat;
+}
+
+/*
+ * ssdfs_recovery_start_thread() - start recovery's thread
+ * @env: recovery environment
+ * @id: thread's ID
+ */
+int ssdfs_recovery_start_thread(struct ssdfs_recovery_env *env,
+				u32 id)
+{
+	ssdfs_threadfn threadfn;
+	const char *fmt;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env);
+
+	SSDFS_DBG("env %p, id %u\n", env, id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	threadfn = recovery_thread.threadfn;
+	fmt = recovery_thread.fmt;
+
+	env->thread.task = kthread_create(threadfn, env, fmt, id);
+	if (IS_ERR_OR_NULL(env->thread.task)) {
+		err = (env->thread.task == NULL ? -ENOMEM :
+						PTR_ERR(env->thread.task));
+		if (err == -EINTR) {
+			/*
+			 * Ignore this error.
+			 */
+		} else {
+			if (err == 0)
+				err = -ERANGE;
+			SSDFS_ERR("fail to start recovery thread: "
+				  "id %u, err %d\n", id, err);
+		}
+
+		return err;
+	}
+
+	init_waitqueue_head(&env->request_wait_queue);
+	init_waitqueue_entry(&env->thread.wait, env->thread.task);
+	add_wait_queue(&env->request_wait_queue, &env->thread.wait);
+	init_waitqueue_head(&env->result_wait_queue);
+	init_completion(&env->thread.full_stop);
+
+	wake_up_process(env->thread.task);
+
+	return 0;
+}
+
+/*
+ * ssdfs_recovery_stop_thread() - stop recovery thread
+ * @env: recovery environment
+ */
+int ssdfs_recovery_stop_thread(struct ssdfs_recovery_env *env)
+{
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env);
+
+	SSDFS_DBG("env %p\n", env);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!env->thread.task)
+		return 0;
+
+	err = kthread_stop(env->thread.task);
+	if (err == -EINTR) {
+		/*
+		 * Ignore this error.
+		 * The wake_up_process() was never called.
+		 */
+		return 0;
+	} else if (unlikely(err)) {
+		SSDFS_WARN("thread function had some issue: err %d\n",
+			    err);
+		return err;
+	}
+
+	finish_wait(&env->request_wait_queue, &env->thread.wait);
+	env->thread.task = NULL;
+
+	err = SSDFS_WAIT_COMPLETION(&env->thread.full_stop);
+	if (unlikely(err)) {
+		SSDFS_ERR("stop thread fails: err %d\n", err);
+		return err;
+	}
+
+	return 0;
+}
-- 
2.34.1



* [RFC PATCH 09/76] ssdfs: internal array/sequence primitives
  2023-02-25  1:08 [RFC PATCH 00/76] SSDFS: flash-friendly LFS file system for ZNS SSD Viacheslav Dubeyko
                   ` (7 preceding siblings ...)
  2023-02-25  1:08 ` [RFC PATCH 08/76] ssdfs: search last actual superblock Viacheslav Dubeyko
@ 2023-02-25  1:08 ` Viacheslav Dubeyko
  2023-02-25  1:08 ` [RFC PATCH 10/76] ssdfs: introduce PEB's block bitmap Viacheslav Dubeyko
                   ` (67 subsequent siblings)
  76 siblings, 0 replies; 82+ messages in thread
From: Viacheslav Dubeyko @ 2023-02-25  1:08 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: viacheslav.dubeyko, luka.perkov, bruno.banelli, Viacheslav Dubeyko

Page vector implements a simple concept of a dynamically growing
page set. For example, the block bitmap requires 32 memory pages
to represent a 2GB erase block. Page vector has a simple interface:
(1) create - allocate page vector's metadata limited by capacity
(2) destroy - deallocate page vector's metadata
(3) init/reinit - clean metadata and set count to zero
(4) allocate - allocate a memory page and add it to the tail of the sequence
(5) add - add a memory page to the tail of the sequence
(6) remove - remove the memory page at the requested index
(7) release - free all pages and remove them from the page vector
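Not part of the patch: a minimal userspace sketch of the page vector idea (all names hypothetical; SSDFS's in-kernel implementation below differs in locking, reference counting and allocation details):

```c
#include <stdlib.h>

#define PAGE_SIZE 4096

/* Userspace model of the page vector concept:
 * a capacity-bounded, dynamically growing set of page buffers. */
struct page_vector {
	void **pages;       /* allocated page buffers */
	unsigned capacity;  /* maximum number of pages */
	unsigned count;     /* pages currently allocated */
};

/* create: allocate the metadata, limited by @capacity */
static int page_vector_create(struct page_vector *pvec, unsigned capacity)
{
	pvec->pages = calloc(capacity, sizeof(void *));
	if (!pvec->pages)
		return -1;
	pvec->capacity = capacity;
	pvec->count = 0;
	return 0;
}

/* allocate: add a zeroed page to the tail of the sequence */
static void *page_vector_allocate(struct page_vector *pvec)
{
	void *page;

	if (pvec->count >= pvec->capacity)
		return NULL;
	page = calloc(1, PAGE_SIZE);
	if (page)
		pvec->pages[pvec->count++] = page;
	return page;
}

/* release: free all pages and reset the count to zero */
static void page_vector_release(struct page_vector *pvec)
{
	unsigned i;

	for (i = 0; i < pvec->count; i++)
		free(pvec->pages[i]);
	pvec->count = 0;
}

/* destroy: deallocate the metadata */
static void page_vector_destroy(struct page_vector *pvec)
{
	free(pvec->pages);
	pvec->pages = NULL;
	pvec->capacity = 0;
}
```

The 2GB erase block example above maps to a capacity of 32: pages are allocated one by one as the block bitmap grows, not all up front.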

Dynamic array implements the concept of a dynamically growing
sequence of fixed-sized items on top of the page vector primitive.
Dynamic array has the following API:
(1) create - create a dynamic array with the requested capacity and item size
(2) destroy - destroy the dynamic array
(3) get_locked - get the item at an index in locked state
(4) release - release and unlock the item at an index
(5) set - set the item at an index
(6) copy_content - copy the content of the dynamic array into a buffer
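Not part of the patch: a userspace sketch of the index-to-page math the dynamic array uses (names hypothetical; the in-kernel version below adds page locking, kmap and leak accounting):

```c
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE      4096
#define MAX_MEM_PAGES  16

/* Userspace model of the dynamic array concept:
 * fixed-size items stored across lazily allocated page buffers. */
struct dyn_array {
	void *pages[MAX_MEM_PAGES]; /* lazily allocated page buffers */
	size_t item_size;
	unsigned items_per_page;
	unsigned capacity;
};

static int dyn_array_create(struct dyn_array *a, unsigned capacity,
			    size_t item_size)
{
	unsigned pages_needed;

	if (!capacity || !item_size || item_size > PAGE_SIZE)
		return -1;
	memset(a, 0, sizeof(*a));
	a->item_size = item_size;
	a->items_per_page = PAGE_SIZE / item_size;
	pages_needed = (capacity + a->items_per_page - 1) / a->items_per_page;
	if (pages_needed > MAX_MEM_PAGES)
		return -1;
	a->capacity = capacity;
	return 0;
}

/* set: copy @item into the slot for @index, allocating the backing
 * page on demand (mirrors the grow-on-write behavior of the patch) */
static int dyn_array_set(struct dyn_array *a, unsigned index, const void *item)
{
	unsigned page_index, page_off;

	if (index >= a->capacity)
		return -1;
	page_index = index / a->items_per_page;
	page_off = (index % a->items_per_page) * a->item_size;
	if (!a->pages[page_index]) {
		a->pages[page_index] = calloc(1, PAGE_SIZE);
		if (!a->pages[page_index])
			return -1;
	}
	memcpy((char *)a->pages[page_index] + page_off, item, a->item_size);
	return 0;
}

/* get: return a pointer to the slot, or NULL if never written */
static void *dyn_array_get(struct dyn_array *a, unsigned index)
{
	unsigned page_index = index / a->items_per_page;
	unsigned page_off = (index % a->items_per_page) * a->item_size;

	if (index >= a->capacity || !a->pages[page_index])
		return NULL;
	return (char *)a->pages[page_index] + page_off;
}
```

The key design point is that `page_index = index / items_per_page` and the byte offset within the page are derived from the item index, so only the pages actually touched are ever allocated.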

Sequence array is a specialized structure whose goal is to provide
access to items via pointers on the basis of ID numbers. Every item
has a dedicated ID, but the sequence array could contain only some
portion of the existing items. The initialization phase adds a
limited number of existing items into the sequence array. The ID
number can wrap around from some maximum number (threshold) to zero.
Sequence array has the following API:
(1) create - create a sequence array
(2) destroy - destroy the sequence array
(3) init_item - init the item for the requested ID
(4) add_item - add an item to the tail of the sequence
(5) get_item - get a pointer to the item for the requested ID
(6) apply_for_all - apply an action/function to all items
(7) change_state - change the item state for the requested ID
(8) change_all_state - change the state of all items in the sequence
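Not part of the patch: a userspace sketch of the sequence array's ID-based lookup and threshold wrap-around (names hypothetical; the in-kernel version uses a radix tree and per-item state under locks):

```c
#include <string.h>

#define SEQ_ARRAY_CAPACITY 8

/* Userspace model of the sequence array concept:
 * items reachable by ID, where only a portion of the
 * ID space is present at any time. */
struct seq_item {
	unsigned long id;
	int state;
	void *ptr;
};

struct seq_array {
	struct seq_item items[SEQ_ARRAY_CAPACITY];
	unsigned count;
	unsigned long last_id;    /* next ID to hand out */
	unsigned long threshold;  /* IDs wrap to 0 at this value */
};

static void seq_array_create(struct seq_array *sa, unsigned long threshold)
{
	memset(sa, 0, sizeof(*sa));
	sa->threshold = threshold;
}

/* add_item: append to the tail; the assigned ID wraps at the threshold */
static long seq_array_add_item(struct seq_array *sa, void *ptr, int state)
{
	unsigned long id;

	if (sa->count >= SEQ_ARRAY_CAPACITY)
		return -1;
	id = sa->last_id;
	sa->items[sa->count].id = id;
	sa->items[sa->count].state = state;
	sa->items[sa->count].ptr = ptr;
	sa->count++;
	sa->last_id = (id + 1) % sa->threshold;
	return (long)id;
}

/* get_item: resolve an ID to the stored pointer, or NULL if absent */
static void *seq_array_get_item(struct seq_array *sa, unsigned long id)
{
	unsigned i;

	for (i = 0; i < sa->count; i++) {
		if (sa->items[i].id == id)
			return sa->items[i].ptr;
	}
	return NULL;
}

/* change_all_state: apply one state to every present item */
static void seq_array_change_all_state(struct seq_array *sa, int state)
{
	unsigned i;

	for (i = 0; i < sa->count; i++)
		sa->items[i].state = state;
}
```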

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
CC: Viacheslav Dubeyko <viacheslav.dubeyko@bytedance.com>
CC: Luka Perkov <luka.perkov@sartura.hr>
CC: Bruno Banelli <bruno.banelli@sartura.hr>
---
 fs/ssdfs/dynamic_array.c  | 781 ++++++++++++++++++++++++++++++++++++++
 fs/ssdfs/dynamic_array.h  |  96 +++++
 fs/ssdfs/page_vector.c    | 437 +++++++++++++++++++++
 fs/ssdfs/page_vector.h    |  64 ++++
 fs/ssdfs/sequence_array.c | 639 +++++++++++++++++++++++++++++++
 fs/ssdfs/sequence_array.h | 119 ++++++
 6 files changed, 2136 insertions(+)
 create mode 100644 fs/ssdfs/dynamic_array.c
 create mode 100644 fs/ssdfs/dynamic_array.h
 create mode 100644 fs/ssdfs/page_vector.c
 create mode 100644 fs/ssdfs/page_vector.h
 create mode 100644 fs/ssdfs/sequence_array.c
 create mode 100644 fs/ssdfs/sequence_array.h

diff --git a/fs/ssdfs/dynamic_array.c b/fs/ssdfs/dynamic_array.c
new file mode 100644
index 000000000000..ae7e121f61d0
--- /dev/null
+++ b/fs/ssdfs/dynamic_array.c
@@ -0,0 +1,781 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/dynamic_array.c - dynamic array implementation.
+ *
+ * Copyright (c) 2022-2023 Bytedance Ltd. and/or its affiliates.
+ *              https://www.bytedance.com/
+ * Copyright (c) 2022-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cong Wang
+ */
+
+#include <linux/pagemap.h>
+#include <linux/slab.h>
+#include <linux/pagevec.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+#include "dynamic_array.h"
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+atomic64_t ssdfs_dynamic_array_page_leaks;
+atomic64_t ssdfs_dynamic_array_memory_leaks;
+atomic64_t ssdfs_dynamic_array_cache_leaks;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+/*
+ * void ssdfs_dynamic_array_cache_leaks_increment(void *kaddr)
+ * void ssdfs_dynamic_array_cache_leaks_decrement(void *kaddr)
+ * void *ssdfs_dynamic_array_kmalloc(size_t size, gfp_t flags)
+ * void *ssdfs_dynamic_array_kzalloc(size_t size, gfp_t flags)
+ * void *ssdfs_dynamic_array_kcalloc(size_t n, size_t size, gfp_t flags)
+ * void ssdfs_dynamic_array_kfree(void *kaddr)
+ * struct page *ssdfs_dynamic_array_alloc_page(gfp_t gfp_mask)
+ * struct page *ssdfs_dynamic_array_add_pagevec_page(struct pagevec *pvec)
+ * void ssdfs_dynamic_array_free_page(struct page *page)
+ * void ssdfs_dynamic_array_pagevec_release(struct pagevec *pvec)
+ */
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	SSDFS_MEMORY_LEAKS_CHECKER_FNS(dynamic_array)
+#else
+	SSDFS_MEMORY_ALLOCATOR_FNS(dynamic_array)
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+void ssdfs_dynamic_array_memory_leaks_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_dynamic_array_page_leaks, 0);
+	atomic64_set(&ssdfs_dynamic_array_memory_leaks, 0);
+	atomic64_set(&ssdfs_dynamic_array_cache_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+void ssdfs_dynamic_array_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&ssdfs_dynamic_array_page_leaks) != 0) {
+		SSDFS_ERR("DYNAMIC ARRAY: "
+			  "memory leaks include %lld pages\n",
+			  atomic64_read(&ssdfs_dynamic_array_page_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_dynamic_array_memory_leaks) != 0) {
+		SSDFS_ERR("DYNAMIC ARRAY: "
+			  "memory allocator suffers from %lld leaks\n",
+			  atomic64_read(&ssdfs_dynamic_array_memory_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_dynamic_array_cache_leaks) != 0) {
+		SSDFS_ERR("DYNAMIC ARRAY: "
+			  "caches suffer from %lld leaks\n",
+			  atomic64_read(&ssdfs_dynamic_array_cache_leaks));
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+/*
+ * ssdfs_dynamic_array_create() - create dynamic array
+ * @array: pointer on dynamic array object
+ * @capacity: maximum number of items in array
+ * @item_size: item size in bytes
+ * @alloc_pattern: pattern to init memory pages
+ */
+int ssdfs_dynamic_array_create(struct ssdfs_dynamic_array *array,
+				u32 capacity, size_t item_size,
+				u8 alloc_pattern)
+{
+	struct page *page;
+	u64 max_threshold = (u64)ssdfs_page_vector_max_threshold() * PAGE_SIZE;
+	u32 pages_count;
+	u64 bytes_count;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!array);
+
+	SSDFS_DBG("array %p, capacity %u, item_size %zu\n",
+		  array, capacity, item_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	array->state = SSDFS_DYNAMIC_ARRAY_STORAGE_ABSENT;
+	array->alloc_pattern = alloc_pattern;
+
+	if (capacity == 0) {
+		SSDFS_ERR("invalid capacity %u\n",
+			  capacity);
+		return -EINVAL;
+	}
+
+	if (item_size == 0 || item_size > PAGE_SIZE) {
+		SSDFS_ERR("invalid item_size %zu\n",
+			  item_size);
+		return -EINVAL;
+	}
+
+	array->capacity = capacity;
+	array->item_size = item_size;
+	array->items_per_mem_page = PAGE_SIZE / item_size;
+
+	pages_count = capacity + array->items_per_mem_page - 1;
+	pages_count /= array->items_per_mem_page;
+
+	if (pages_count == 0)
+		pages_count = 1;
+
+	bytes_count = (u64)capacity * item_size;
+
+	if (bytes_count > max_threshold) {
+		SSDFS_ERR("invalid request: "
+			  "bytes_count %llu > max_threshold %llu, "
+			  "capacity %u, item_size %zu\n",
+			  bytes_count, max_threshold,
+			  capacity, item_size);
+		return -EINVAL;
+	}
+
+	if (bytes_count > PAGE_SIZE) {
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(pages_count >= ssdfs_page_vector_max_threshold());
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = ssdfs_page_vector_create(&array->pvec, pages_count);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to create page vector: "
+				  "bytes_count %llu, pages_count %u, "
+				  "err %d\n",
+				  bytes_count, pages_count, err);
+			return err;
+		}
+
+		err = ssdfs_page_vector_init(&array->pvec);
+		if (unlikely(err)) {
+			ssdfs_page_vector_destroy(&array->pvec);
+			SSDFS_ERR("fail to init page vector: "
+				  "bytes_count %llu, pages_count %u, "
+				  "err %d\n",
+				  bytes_count, pages_count, err);
+			return err;
+		}
+
+		page = ssdfs_page_vector_allocate(&array->pvec);
+		if (IS_ERR_OR_NULL(page)) {
+			err = (page == NULL ? -ENOMEM : PTR_ERR(page));
+			SSDFS_ERR("unable to allocate page\n");
+			return err;
+		}
+
+		ssdfs_lock_page(page);
+		ssdfs_memset_page(page, 0, PAGE_SIZE,
+				  array->alloc_pattern, PAGE_SIZE);
+		ssdfs_unlock_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("page %p, count %d\n",
+			  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		array->bytes_count = PAGE_SIZE;
+		array->state = SSDFS_DYNAMIC_ARRAY_STORAGE_PAGE_VEC;
+	} else {
+		array->buf = ssdfs_dynamic_array_kzalloc(bytes_count,
+							 GFP_KERNEL);
+		if (!array->buf) {
+			SSDFS_ERR("fail to allocate memory: "
+				  "bytes_count %llu\n",
+				  bytes_count);
+			return -ENOMEM;
+		}
+
+		memset(array->buf, array->alloc_pattern, bytes_count);
+
+		array->bytes_count = bytes_count;
+		array->state = SSDFS_DYNAMIC_ARRAY_STORAGE_BUFFER;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_dynamic_array_destroy() - destroy dynamic array
+ * @array: pointer on dynamic array object
+ */
+void ssdfs_dynamic_array_destroy(struct ssdfs_dynamic_array *array)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!array);
+
+	SSDFS_DBG("array %p, capacity %u, "
+		  "item_size %zu, bytes_count %u\n",
+		  array, array->capacity,
+		  array->item_size, array->bytes_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (array->state) {
+	case SSDFS_DYNAMIC_ARRAY_STORAGE_PAGE_VEC:
+		ssdfs_page_vector_release(&array->pvec);
+		ssdfs_page_vector_destroy(&array->pvec);
+		break;
+
+	case SSDFS_DYNAMIC_ARRAY_STORAGE_BUFFER:
+		if (array->buf)
+			ssdfs_dynamic_array_kfree(array->buf);
+		break;
+
+	default:
+		SSDFS_WARN("unexpected state %#x\n", array->state);
+		break;
+	}
+
+	array->capacity = 0;
+	array->item_size = 0;
+	array->items_per_mem_page = 0;
+	array->bytes_count = 0;
+	array->state = SSDFS_DYNAMIC_ARRAY_STORAGE_ABSENT;
+}
+
+/*
+ * ssdfs_dynamic_array_get_locked() - get locked item
+ * @array: pointer on dynamic array object
+ * @index: item index
+ *
+ * This method tries to get a pointer to the item. If a short buffer
+ * (< 4K) represents the dynamic array, then the logic is pretty
+ * straightforward. Otherwise, the memory page is locked and the release
+ * method should be called to unlock it.
+ *
+ * RETURN:
+ * [success] - pointer on requested item.
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input.
+ * %-E2BIG      - request is out of array capacity.
+ * %-ERANGE     - internal error.
+ */
+void *ssdfs_dynamic_array_get_locked(struct ssdfs_dynamic_array *array,
+				     u32 index)
+{
+	struct page *page;
+	void *ptr = NULL;
+	u64 max_threshold = (u64)ssdfs_page_vector_max_threshold() * PAGE_SIZE;
+	u64 item_offset = 0;
+	u64 page_index;
+	u32 page_off;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!array);
+
+	SSDFS_DBG("array %p, index %u, capacity %u, "
+		  "item_size %zu, bytes_count %u\n",
+		  array, index, array->capacity,
+		  array->item_size, array->bytes_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (array->state) {
+	case SSDFS_DYNAMIC_ARRAY_STORAGE_PAGE_VEC:
+	case SSDFS_DYNAMIC_ARRAY_STORAGE_BUFFER:
+		/* continue logic */
+		break;
+
+	default:
+		SSDFS_WARN("unexpected state %#x\n", array->state);
+		return ERR_PTR(-ERANGE);
+	}
+
+	if (array->item_size == 0 || array->item_size > PAGE_SIZE) {
+		SSDFS_ERR("invalid item_size %zu\n",
+			  array->item_size);
+		return ERR_PTR(-ERANGE);
+	}
+
+	if (array->capacity == 0) {
+		SSDFS_ERR("invalid capacity %u\n",
+			  array->capacity);
+		return ERR_PTR(-ERANGE);
+	}
+
+	if (array->bytes_count == 0) {
+		SSDFS_ERR("invalid bytes_count %u\n",
+			  array->bytes_count);
+		return ERR_PTR(-ERANGE);
+	}
+
+	if (index >= array->capacity) {
+		SSDFS_WARN("invalid index: index %u, capacity %u\n",
+			   index, array->capacity);
+		return ERR_PTR(-ERANGE);
+	}
+
+	item_offset = (u64)array->item_size * index;
+
+	if (item_offset >= max_threshold) {
+		SSDFS_ERR("invalid item_offset: "
+			  "index %u, item_size %zu, "
+			  "item_offset %llu, bytes_count %u, "
+			  "max_threshold %llu\n",
+			  index, array->item_size,
+			  item_offset, array->bytes_count,
+			  max_threshold);
+		return ERR_PTR(-E2BIG);
+	}
+
+	switch (array->state) {
+	case SSDFS_DYNAMIC_ARRAY_STORAGE_PAGE_VEC:
+		page_index = index / array->items_per_mem_page;
+		page_off = index % array->items_per_mem_page;
+		page_off *= array->item_size;
+
+		if (page_index >= ssdfs_page_vector_capacity(&array->pvec)) {
+			SSDFS_ERR("invalid page index: "
+				  "page_index %llu, item_offset %llu\n",
+				  page_index, item_offset);
+			return ERR_PTR(-E2BIG);
+		}
+
+		while (page_index >= ssdfs_page_vector_count(&array->pvec)) {
+			page = ssdfs_page_vector_allocate(&array->pvec);
+			if (IS_ERR_OR_NULL(page)) {
+				err = (page == NULL ? -ENOMEM : PTR_ERR(page));
+				SSDFS_ERR("unable to allocate page\n");
+				return ERR_PTR(err);
+			}
+
+			ssdfs_lock_page(page);
+			ssdfs_memset_page(page, 0, PAGE_SIZE,
+					  array->alloc_pattern, PAGE_SIZE);
+			ssdfs_unlock_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("page %p, count %d\n",
+				  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			array->bytes_count += PAGE_SIZE;
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("array %p, index %u, capacity %u, "
+					  "item_size %zu, bytes_count %u, "
+					  "index %u, item_offset %llu, "
+					  "page_index %llu, page_count %u\n",
+					  array, index, array->capacity,
+					  array->item_size, array->bytes_count,
+					  index, item_offset, page_index,
+					  ssdfs_page_vector_count(&array->pvec));
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+
+		page = array->pvec.pages[page_index];
+
+		ssdfs_lock_page(page);
+		ptr = kmap_local_page(page);
+		ptr = (u8 *)ptr + page_off;
+		break;
+
+	case SSDFS_DYNAMIC_ARRAY_STORAGE_BUFFER:
+		ptr = (u8 *)array->buf + item_offset;
+		break;
+
+	default:
+		SSDFS_WARN("unexpected state %#x\n", array->state);
+		return ERR_PTR(-ERANGE);
+	}
+
+	return ptr;
+}
+
+/*
+ * ssdfs_dynamic_array_release() - release item
+ * @array: pointer on dynamic array object
+ * @index: item index
+ * @ptr: pointer on item
+ *
+ * This method tries to release the item pointer (unlocking its
+ * memory page when the page vector backs the array).
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input.
+ * %-E2BIG      - request is out of array capacity.
+ * %-ERANGE     - internal error.
+ */
+int ssdfs_dynamic_array_release(struct ssdfs_dynamic_array *array,
+				u32 index, void *ptr)
+{
+	struct page *page;
+	u64 item_offset = 0;
+	u64 page_index;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!array || !ptr);
+
+	SSDFS_DBG("array %p, index %u, capacity %u, "
+		  "item_size %zu, bytes_count %u\n",
+		  array, index, array->capacity,
+		  array->item_size, array->bytes_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (array->state) {
+	case SSDFS_DYNAMIC_ARRAY_STORAGE_PAGE_VEC:
+		/* continue logic */
+		break;
+
+	case SSDFS_DYNAMIC_ARRAY_STORAGE_BUFFER:
+		/* do nothing */
+		return 0;
+
+	default:
+		SSDFS_WARN("unexpected state %#x\n", array->state);
+		return -ERANGE;
+	}
+
+	if (array->item_size == 0 || array->item_size > PAGE_SIZE) {
+		SSDFS_ERR("invalid item_size %zu\n",
+			  array->item_size);
+		return -ERANGE;
+	}
+
+	if (array->capacity == 0) {
+		SSDFS_ERR("invalid capacity %u\n",
+			  array->capacity);
+		return -ERANGE;
+	}
+
+	if (array->bytes_count == 0) {
+		SSDFS_ERR("invalid bytes_count %u\n",
+			  array->bytes_count);
+		return -ERANGE;
+	}
+
+	if (index >= array->capacity) {
+		SSDFS_ERR("invalid index: index %u, capacity %u\n",
+			  index, array->capacity);
+		return -ERANGE;
+	}
+
+	item_offset = (u64)array->item_size * index;
+
+	if (item_offset >= array->bytes_count) {
+		SSDFS_ERR("invalid item_offset: "
+			  "index %u, item_size %zu, "
+			  "item_offset %llu, bytes_count %u\n",
+			  index, array->item_size,
+			  item_offset, array->bytes_count);
+		return -E2BIG;
+	}
+
+	page_index = index / array->items_per_mem_page;
+
+	if (page_index >= ssdfs_page_vector_count(&array->pvec)) {
+		SSDFS_ERR("invalid page index: "
+			  "page_index %llu, item_offset %llu\n",
+			  page_index, item_offset);
+		return -E2BIG;
+	}
+
+	page = array->pvec.pages[page_index];
+
+	kunmap_local(ptr);
+	ssdfs_unlock_page(page);
+
+	return 0;
+}
+
+/*
+ * ssdfs_dynamic_array_set() - store item into dynamic array
+ * @array: pointer on dynamic array object
+ * @index: item index
+ * @item: pointer on item
+ *
+ * This method tries to store the item into the dynamic array.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input.
+ * %-E2BIG      - request is out of array capacity.
+ * %-ERANGE     - internal error.
+ */
+int ssdfs_dynamic_array_set(struct ssdfs_dynamic_array *array,
+			    u32 index, void *item)
+{
+	struct page *page;
+	void *kaddr = NULL;
+	u64 max_threshold = (u64)ssdfs_page_vector_max_threshold() * PAGE_SIZE;
+	u64 item_offset = 0;
+	u64 page_index;
+	u32 page_off;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!array || !item);
+
+	SSDFS_DBG("array %p, index %u, capacity %u, "
+		  "item_size %zu, bytes_count %u\n",
+		  array, index, array->capacity,
+		  array->item_size, array->bytes_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (array->state) {
+	case SSDFS_DYNAMIC_ARRAY_STORAGE_PAGE_VEC:
+	case SSDFS_DYNAMIC_ARRAY_STORAGE_BUFFER:
+		/* continue logic */
+		break;
+
+	default:
+		SSDFS_WARN("unexpected state %#x\n", array->state);
+		return -ERANGE;
+	}
+
+	if (array->item_size == 0 || array->item_size > PAGE_SIZE) {
+		SSDFS_ERR("invalid item_size %zu\n",
+			  array->item_size);
+		return -ERANGE;
+	}
+
+	if (array->capacity == 0) {
+		SSDFS_ERR("invalid capacity %u\n",
+			  array->capacity);
+		return -ERANGE;
+	}
+
+	if (array->bytes_count == 0) {
+		SSDFS_ERR("invalid bytes_count %u\n",
+			  array->bytes_count);
+		return -ERANGE;
+	}
+
+	if (index >= array->capacity) {
+		SSDFS_ERR("invalid index: index %u, capacity %u\n",
+			  index, array->capacity);
+		return -ERANGE;
+	}
+
+	item_offset = (u64)array->item_size * index;
+
+	if (item_offset >= max_threshold) {
+		SSDFS_ERR("invalid item_offset: "
+			  "index %u, item_size %zu, "
+			  "item_offset %llu, bytes_count %u, "
+			  "max_threshold %llu\n",
+			  index, array->item_size,
+			  item_offset, array->bytes_count,
+			  max_threshold);
+		return -E2BIG;
+	}
+
+	switch (array->state) {
+	case SSDFS_DYNAMIC_ARRAY_STORAGE_PAGE_VEC:
+		page_index = index / array->items_per_mem_page;
+		page_off = index % array->items_per_mem_page;
+		page_off *= array->item_size;
+
+		if (page_index >= ssdfs_page_vector_capacity(&array->pvec)) {
+			SSDFS_ERR("invalid page index: "
+				  "page_index %llu, item_offset %llu\n",
+				  page_index, item_offset);
+			return -E2BIG;
+		}
+
+		while (page_index >= ssdfs_page_vector_count(&array->pvec)) {
+			page = ssdfs_page_vector_allocate(&array->pvec);
+			if (IS_ERR_OR_NULL(page)) {
+				err = (page == NULL ? -ENOMEM : PTR_ERR(page));
+				SSDFS_ERR("unable to allocate page\n");
+				return err;
+			}
+
+			ssdfs_lock_page(page);
+			ssdfs_memset_page(page, 0, PAGE_SIZE,
+					  array->alloc_pattern, PAGE_SIZE);
+			ssdfs_unlock_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("page %p, count %d\n",
+				  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			array->bytes_count += PAGE_SIZE;
+		}
+
+		page = array->pvec.pages[page_index];
+
+		ssdfs_lock_page(page);
+		kaddr = kmap_local_page(page);
+		err = ssdfs_memcpy(kaddr, page_off, PAGE_SIZE,
+				   item, 0, array->item_size,
+				   array->item_size);
+		kunmap_local(kaddr);
+		ssdfs_unlock_page(page);
+		break;
+
+	case SSDFS_DYNAMIC_ARRAY_STORAGE_BUFFER:
+		err = ssdfs_memcpy(array->buf, item_offset, array->bytes_count,
+				   item, 0, array->item_size,
+				   array->item_size);
+		break;
+
+	default:
+		SSDFS_WARN("unexpected state %#x\n", array->state);
+		return -ERANGE;
+	}
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to set item: index %u, err %d\n",
+			  index, err);
+	}
+
+	return err;
+}
+
+/*
+ * ssdfs_dynamic_array_copy_content() - copy the whole dynamic array
+ * @array: pointer on dynamic array object
+ * @copy_buf: pointer on copy buffer
+ * @buf_size: size of the buffer in bytes
+ *
+ * This method tries to copy the whole content of dynamic array.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input.
+ * %-ERANGE     - internal error.
+ */
+int ssdfs_dynamic_array_copy_content(struct ssdfs_dynamic_array *array,
+				     void *copy_buf, size_t buf_size)
+{
+	struct page *page;
+	u32 copied_bytes = 0;
+	u32 pages_count;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!array || !copy_buf);
+
+	SSDFS_DBG("array %p, capacity %u, "
+		  "item_size %zu, bytes_count %u, "
+		  "copy_buf %p, buf_size %zu\n",
+		  array, array->capacity,
+		  array->item_size, array->bytes_count,
+		  copy_buf, buf_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (array->state) {
+	case SSDFS_DYNAMIC_ARRAY_STORAGE_PAGE_VEC:
+	case SSDFS_DYNAMIC_ARRAY_STORAGE_BUFFER:
+		/* continue logic */
+		break;
+
+	default:
+		SSDFS_WARN("unexpected state %#x\n", array->state);
+		return -ERANGE;
+	}
+
+	if (array->bytes_count == 0) {
+		SSDFS_ERR("invalid bytes_count %u\n",
+			  array->bytes_count);
+		return -ERANGE;
+	}
+
+	switch (array->state) {
+	case SSDFS_DYNAMIC_ARRAY_STORAGE_PAGE_VEC:
+		pages_count = ssdfs_page_vector_count(&array->pvec);
+
+		for (i = 0; i < pages_count; i++) {
+			size_t bytes_count;
+
+			if (copied_bytes >= buf_size) {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("stop copy: "
+					  "copied_bytes %u, "
+					  "buf_size %zu, "
+					  "array->bytes_count %u, "
+					  "pages_count %u\n",
+					  copied_bytes,
+					  buf_size,
+					  array->bytes_count,
+					  pages_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+				break;
+			}
+
+			page = array->pvec.pages[i];
+
+			if (!page) {
+				err = -ERANGE;
+				SSDFS_ERR("fail to copy content: "
+					  "copied_bytes %u, "
+					  "array->bytes_count %u, "
+					  "page_index %d, "
+					  "pages_count %u\n",
+					  copied_bytes,
+					  array->bytes_count,
+					  i, pages_count);
+				goto finish_copy_content;
+			}
+
+			bytes_count =
+				array->item_size * array->items_per_mem_page;
+			bytes_count = min_t(size_t, bytes_count,
+						buf_size - copied_bytes);
+
+			err = ssdfs_memcpy_from_page(copy_buf,
+						     copied_bytes,
+						     buf_size,
+						     page,
+						     0,
+						     PAGE_SIZE,
+						     bytes_count);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to copy content: "
+					  "copied_bytes %u, "
+					  "array->bytes_count %u, "
+					  "page_index %d, "
+					  "pages_count %u, "
+					  "err %d\n",
+					  copied_bytes,
+					  array->bytes_count,
+					  i, pages_count,
+					  err);
+				goto finish_copy_content;
+			}
+
+			copied_bytes += bytes_count;
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("array %p, capacity %u, "
+				  "item_size %zu, bytes_count %u, "
+				  "page_index %d, pages_count %u, "
+				  "bytes_count %zu, copied_bytes %u\n",
+				  array, array->capacity,
+				  array->item_size, array->bytes_count,
+				  i, pages_count, bytes_count, copied_bytes);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		break;
+
+	case SSDFS_DYNAMIC_ARRAY_STORAGE_BUFFER:
+		err = ssdfs_memcpy(copy_buf, 0, buf_size,
+				   array->buf, 0, array->bytes_count,
+				   array->bytes_count);
+		break;
+
+	default:
+		BUG();
+		break;
+	}
+
+finish_copy_content:
+	return err;
+}
diff --git a/fs/ssdfs/dynamic_array.h b/fs/ssdfs/dynamic_array.h
new file mode 100644
index 000000000000..3bb73510f389
--- /dev/null
+++ b/fs/ssdfs/dynamic_array.h
@@ -0,0 +1,96 @@
+/* SPDX-License-Identifier: BSD-3-Clause-Clear */
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/dynamic_array.h - dynamic array's declarations.
+ *
+ * Copyright (c) 2022-2023 Bytedance Ltd. and/or its affiliates.
+ *              https://www.bytedance.com/
+ * Copyright (c) 2022-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cong Wang
+ */
+
+#ifndef _SSDFS_DYNAMIC_ARRAY_H
+#define _SSDFS_DYNAMIC_ARRAY_H
+
+#include "page_vector.h"
+
+/*
+ * struct ssdfs_dynamic_array - dynamic array
+ * @state: array state
+ * @item_size: size of item in bytes
+ * @items_per_mem_page: number of items per memory page
+ * @capacity: maximum available items count
+ * @bytes_count: currently allocated bytes count
+ * @alloc_pattern: pattern to init memory pages
+ * @pvec: vector of pages
+ * @buf: pointer to memory buffer
+ */
+struct ssdfs_dynamic_array {
+	int state;
+	size_t item_size;
+	u32 items_per_mem_page;
+	u32 capacity;
+	u32 bytes_count;
+	u8 alloc_pattern;
+	struct ssdfs_page_vector pvec;
+	void *buf;
+};
+
+/* Dynamic array's states */
+enum {
+	SSDFS_DYNAMIC_ARRAY_STORAGE_ABSENT,
+	SSDFS_DYNAMIC_ARRAY_STORAGE_PAGE_VEC,
+	SSDFS_DYNAMIC_ARRAY_STORAGE_BUFFER,
+	SSDFS_DYNAMIC_ARRAY_STORAGE_STATE_MAX
+};
+
+/*
+ * Inline functions
+ */
+
+static inline
+u32 ssdfs_dynamic_array_allocated_bytes(struct ssdfs_dynamic_array *array)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!array);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return array->bytes_count;
+}
+
+static inline
+u32 ssdfs_dynamic_array_items_count(struct ssdfs_dynamic_array *array)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!array);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (array->bytes_count == 0 || array->item_size == 0)
+		return 0;
+
+	return array->bytes_count / array->item_size;
+}
+
+/*
+ * Dynamic array's API
+ */
+int ssdfs_dynamic_array_create(struct ssdfs_dynamic_array *array,
+				u32 capacity, size_t item_size,
+				u8 alloc_pattern);
+void ssdfs_dynamic_array_destroy(struct ssdfs_dynamic_array *array);
+void *ssdfs_dynamic_array_get_locked(struct ssdfs_dynamic_array *array,
+				     u32 index);
+int ssdfs_dynamic_array_release(struct ssdfs_dynamic_array *array,
+				u32 index, void *ptr);
+int ssdfs_dynamic_array_set(struct ssdfs_dynamic_array *array,
+			    u32 index, void *ptr);
+int ssdfs_dynamic_array_copy_content(struct ssdfs_dynamic_array *array,
+				     void *copy_buf, size_t buf_size);
+
+#endif /* _SSDFS_DYNAMIC_ARRAY_H */
diff --git a/fs/ssdfs/page_vector.c b/fs/ssdfs/page_vector.c
new file mode 100644
index 000000000000..b130d99df31b
--- /dev/null
+++ b/fs/ssdfs/page_vector.c
@@ -0,0 +1,437 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/page_vector.c - page vector implementation.
+ *
+ * Copyright (c) 2022-2023 Bytedance Ltd. and/or its affiliates.
+ *              https://www.bytedance.com/
+ * Copyright (c) 2022-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cong Wang
+ */
+
+#include <linux/pagemap.h>
+#include <linux/slab.h>
+#include <linux/pagevec.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+#include "page_vector.h"
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+atomic64_t ssdfs_page_vector_page_leaks;
+atomic64_t ssdfs_page_vector_memory_leaks;
+atomic64_t ssdfs_page_vector_cache_leaks;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+/*
+ * void ssdfs_page_vector_cache_leaks_increment(void *kaddr)
+ * void ssdfs_page_vector_cache_leaks_decrement(void *kaddr)
+ * void *ssdfs_page_vector_kmalloc(size_t size, gfp_t flags)
+ * void *ssdfs_page_vector_kzalloc(size_t size, gfp_t flags)
+ * void *ssdfs_page_vector_kcalloc(size_t n, size_t size, gfp_t flags)
+ * void ssdfs_page_vector_kfree(void *kaddr)
+ * struct page *ssdfs_page_vector_alloc_page(gfp_t gfp_mask)
+ * struct page *ssdfs_page_vector_add_pagevec_page(struct pagevec *pvec)
+ * void ssdfs_page_vector_free_page(struct page *page)
+ * void ssdfs_page_vector_pagevec_release(struct pagevec *pvec)
+ */
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	SSDFS_MEMORY_LEAKS_CHECKER_FNS(page_vector)
+#else
+	SSDFS_MEMORY_ALLOCATOR_FNS(page_vector)
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+void ssdfs_page_vector_memory_leaks_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_page_vector_page_leaks, 0);
+	atomic64_set(&ssdfs_page_vector_memory_leaks, 0);
+	atomic64_set(&ssdfs_page_vector_cache_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+void ssdfs_page_vector_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&ssdfs_page_vector_page_leaks) != 0) {
+		SSDFS_ERR("PAGE VECTOR: "
+			  "memory leaks include %lld pages\n",
+			  atomic64_read(&ssdfs_page_vector_page_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_page_vector_memory_leaks) != 0) {
+		SSDFS_ERR("PAGE VECTOR: "
+			  "memory allocator suffers from %lld leaks\n",
+			  atomic64_read(&ssdfs_page_vector_memory_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_page_vector_cache_leaks) != 0) {
+		SSDFS_ERR("PAGE VECTOR: "
+			  "caches suffer from %lld leaks\n",
+			  atomic64_read(&ssdfs_page_vector_cache_leaks));
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+/*
+ * ssdfs_page_vector_create() - create page vector
+ * @array: pointer to page vector
+ * @capacity: max number of memory pages in vector
+ */
+int ssdfs_page_vector_create(struct ssdfs_page_vector *array,
+			     u32 capacity)
+{
+	size_t size = sizeof(struct page *);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!array);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	array->count = 0;
+	array->capacity = 0;
+
+	size *= capacity;
+	array->pages = ssdfs_page_vector_kzalloc(size, GFP_KERNEL);
+	if (!array->pages) {
+		SSDFS_ERR("fail to allocate memory: size %zu\n",
+			  size);
+		return -ENOMEM;
+	}
+
+	array->capacity = capacity;
+
+	return 0;
+}
+
+/*
+ * ssdfs_page_vector_destroy() - destroy page vector
+ * @array: pointer to page vector
+ */
+void ssdfs_page_vector_destroy(struct ssdfs_page_vector *array)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	int i;
+
+	BUG_ON(!array);
+
+	if (array->count > 0) {
+		SSDFS_ERR("invalid state: count %u\n",
+			  array->count);
+	}
+
+	for (i = 0; i < array->capacity; i++) {
+		struct page *page = array->pages[i];
+
+		if (page)
+			SSDFS_ERR("page %d is not released\n", i);
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	array->count = 0;
+
+	if (array->pages) {
+#ifdef CONFIG_SSDFS_DEBUG
+		if (array->capacity == 0) {
+			SSDFS_ERR("invalid state: capacity %u\n",
+				  array->capacity);
+		}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		array->capacity = 0;
+		ssdfs_page_vector_kfree(array->pages);
+		array->pages = NULL;
+	}
+}
+
+/*
+ * ssdfs_page_vector_init() - init page vector
+ * @array: pointer to page vector
+ */
+int ssdfs_page_vector_init(struct ssdfs_page_vector *array)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!array);
+
+	if (!array->pages) {
+		SSDFS_ERR("fail to init\n");
+		return -ERANGE;
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	array->count = 0;
+
+	if (array->capacity == 0) {
+		SSDFS_ERR("invalid state: capacity %u\n",
+			  array->capacity);
+		return -ERANGE;
+	}
+
+	memset(array->pages, 0,
+	       sizeof(struct page *) * array->capacity);
+
+	return 0;
+}
+
+/*
+ * ssdfs_page_vector_reinit() - reinit page vector
+ * @array: pointer to page vector
+ */
+int ssdfs_page_vector_reinit(struct ssdfs_page_vector *array)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	int i;
+
+	BUG_ON(!array);
+
+	if (!array->pages) {
+		SSDFS_ERR("fail to reinit\n");
+		return -ERANGE;
+	}
+
+	for (i = 0; i < array->capacity; i++) {
+		struct page *page = array->pages[i];
+
+		if (page)
+			SSDFS_WARN("page %d is not released\n", i);
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	array->count = 0;
+
+	if (array->capacity == 0) {
+		SSDFS_ERR("invalid state: capacity %u\n",
+			  array->capacity);
+		return -ERANGE;
+	}
+
+	memset(array->pages, 0,
+	       sizeof(struct page *) * array->capacity);
+
+	return 0;
+}
+
+/*
+ * ssdfs_page_vector_count() - count of pages in page vector
+ * @array: pointer to page vector
+ */
+u32 ssdfs_page_vector_count(struct ssdfs_page_vector *array)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!array);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return array->count;
+}
+
+/*
+ * ssdfs_page_vector_space() - free space in page vector
+ * @array: pointer to page vector
+ */
+u32 ssdfs_page_vector_space(struct ssdfs_page_vector *array)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!array);
+
+	if (array->count > array->capacity) {
+		SSDFS_ERR("count %u is bigger than max %u\n",
+			  array->count, array->capacity);
+		return 0;
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return array->capacity - array->count;
+}
+
+/*
+ * ssdfs_page_vector_capacity() - capacity of page vector
+ * @array: pointer to page vector
+ */
+u32 ssdfs_page_vector_capacity(struct ssdfs_page_vector *array)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!array);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return array->capacity;
+}
+
+/*
+ * ssdfs_page_vector_add() - add page in page vector
+ * @array: pointer to page vector
+ * @page: memory page
+ */
+int ssdfs_page_vector_add(struct ssdfs_page_vector *array,
+			  struct page *page)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!array || !page);
+
+	if (array->count >= array->capacity) {
+		SSDFS_ERR("array is full: count %u\n",
+			  array->count);
+		return -ENOSPC;
+	}
+
+	if (!array->pages) {
+		SSDFS_ERR("fail to add page: "
+			  "count %u, capacity %u\n",
+			  array->count, array->capacity);
+		return -ERANGE;
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	array->pages[array->count] = page;
+	array->count++;
+
+	ssdfs_page_vector_account_page(page);
+
+	return 0;
+}
+
+/*
+ * ssdfs_page_vector_allocate() - allocate + add page
+ * @array: pointer to page vector
+ */
+struct page *ssdfs_page_vector_allocate(struct ssdfs_page_vector *array)
+{
+	struct page *page;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!array);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (ssdfs_page_vector_space(array) == 0) {
+		SSDFS_ERR("page vector has no free space\n");
+		return ERR_PTR(-E2BIG);
+	}
+
+	page = ssdfs_page_vector_alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (IS_ERR_OR_NULL(page)) {
+		err = (page == NULL ? -ENOMEM : PTR_ERR(page));
+		SSDFS_ERR("unable to allocate memory page\n");
+		return ERR_PTR(err);
+	}
+
+	/*
+	 * ssdfs_page_vector_add() accounts page
+	 */
+	ssdfs_page_vector_forget_page(page);
+
+	err = ssdfs_page_vector_add(array, page);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to add page: err %d\n",
+			  err);
+		ssdfs_free_page(page);
+		return ERR_PTR(err);
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("array %p, page vector count %u\n",
+		  array->pages, ssdfs_page_vector_count(array));
+	SSDFS_DBG("page %p, count %d\n",
+		  page, page_ref_count(page));
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	SSDFS_DBG("page %p, allocated_pages %lld\n",
+		  page, atomic64_read(&ssdfs_page_vector_page_leaks));
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return page;
+}
+
+/*
+ * ssdfs_page_vector_remove() - remove page
+ * @array: pointer to page vector
+ * @page_index: index of the page
+ */
+struct page *ssdfs_page_vector_remove(struct ssdfs_page_vector *array,
+				      u32 page_index)
+{
+	struct page *page;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!array);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (ssdfs_page_vector_count(array) == 0) {
+		SSDFS_ERR("page vector is empty\n");
+		return ERR_PTR(-ENODATA);
+	}
+
+	if (array->count > array->capacity) {
+		SSDFS_ERR("page vector is corrupted: "
+			  "array->count %u, array->capacity %u\n",
+			  array->count, array->capacity);
+		return ERR_PTR(-ERANGE);
+	}
+
+	if (page_index >= array->count) {
+		SSDFS_ERR("page index is out of range: "
+			  "page_index %u, array->count %u\n",
+			  page_index, array->count);
+		return ERR_PTR(-ENOENT);
+	}
+
+	page = array->pages[page_index];
+
+	if (!page) {
+		SSDFS_ERR("page is absent: "
+			  "page_index %u, array->count %u\n",
+			  page_index, array->count);
+		return ERR_PTR(-ENOENT);
+	}
+
+	ssdfs_page_vector_forget_page(page);
+	array->pages[page_index] = NULL;
+
+	return page;
+}
+
+/*
+ * ssdfs_page_vector_release() - release pages from page vector
+ * @array: pointer to page vector
+ */
+void ssdfs_page_vector_release(struct ssdfs_page_vector *array)
+{
+	struct page *page;
+	int i;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!array);
+
+	if (!array->pages) {
+		SSDFS_ERR("fail to release: "
+			  "count %u, capacity %u\n",
+			  array->count, array->capacity);
+		return;
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	for (i = 0; i < ssdfs_page_vector_count(array); i++) {
+		page = array->pages[i];
+
+		if (!page)
+			continue;
+
+		ssdfs_page_vector_free_page(page);
+		array->pages[i] = NULL;
+
+#ifdef CONFIG_SSDFS_DEBUG
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+		SSDFS_DBG("page %p, allocated_pages %lld\n",
+			  page,
+			  atomic64_read(&ssdfs_page_vector_page_leaks));
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+	ssdfs_page_vector_reinit(array);
+}
diff --git a/fs/ssdfs/page_vector.h b/fs/ssdfs/page_vector.h
new file mode 100644
index 000000000000..4a4a6bcaed32
--- /dev/null
+++ b/fs/ssdfs/page_vector.h
@@ -0,0 +1,64 @@
+/* SPDX-License-Identifier: BSD-3-Clause-Clear */
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/page_vector.h - page vector's declarations.
+ *
+ * Copyright (c) 2022-2023 Bytedance Ltd. and/or its affiliates.
+ *              https://www.bytedance.com/
+ * Copyright (c) 2022-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cong Wang
+ */
+
+#ifndef _SSDFS_PAGE_VECTOR_H
+#define _SSDFS_PAGE_VECTOR_H
+
+/*
+ * struct ssdfs_page_vector - vector of memory pages
+ * @count: current number of pages in vector
+ * @capacity: max number of pages in vector
+ * @pages: array of pointers to pages
+ */
+struct ssdfs_page_vector {
+	u32 count;
+	u32 capacity;
+	struct page **pages;
+};
+
+/*
+ * Inline functions
+ */
+
+/*
+ * ssdfs_page_vector_max_threshold() - maximum possible capacity
+ */
+static inline
+u32 ssdfs_page_vector_max_threshold(void)
+{
+	return S32_MAX;
+}
+
+/*
+ * Page vector's API
+ */
+int ssdfs_page_vector_create(struct ssdfs_page_vector *array,
+			     u32 capacity);
+void ssdfs_page_vector_destroy(struct ssdfs_page_vector *array);
+int ssdfs_page_vector_init(struct ssdfs_page_vector *array);
+int ssdfs_page_vector_reinit(struct ssdfs_page_vector *array);
+u32 ssdfs_page_vector_count(struct ssdfs_page_vector *array);
+u32 ssdfs_page_vector_space(struct ssdfs_page_vector *array);
+u32 ssdfs_page_vector_capacity(struct ssdfs_page_vector *array);
+struct page *ssdfs_page_vector_allocate(struct ssdfs_page_vector *array);
+int ssdfs_page_vector_add(struct ssdfs_page_vector *array,
+			  struct page *page);
+struct page *ssdfs_page_vector_remove(struct ssdfs_page_vector *array,
+				      u32 page_index);
+void ssdfs_page_vector_release(struct ssdfs_page_vector *array);
+
+#endif /* _SSDFS_PAGE_VECTOR_H */
diff --git a/fs/ssdfs/sequence_array.c b/fs/ssdfs/sequence_array.c
new file mode 100644
index 000000000000..696fb88ab208
--- /dev/null
+++ b/fs/ssdfs/sequence_array.c
@@ -0,0 +1,639 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/sequence_array.c - sequence array implementation.
+ *
+ * Copyright (c) 2019-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/pagevec.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+#include "sequence_array.h"
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+atomic64_t ssdfs_seq_arr_page_leaks;
+atomic64_t ssdfs_seq_arr_memory_leaks;
+atomic64_t ssdfs_seq_arr_cache_leaks;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+/*
+ * void ssdfs_seq_arr_cache_leaks_increment(void *kaddr)
+ * void ssdfs_seq_arr_cache_leaks_decrement(void *kaddr)
+ * void *ssdfs_seq_arr_kmalloc(size_t size, gfp_t flags)
+ * void *ssdfs_seq_arr_kzalloc(size_t size, gfp_t flags)
+ * void *ssdfs_seq_arr_kcalloc(size_t n, size_t size, gfp_t flags)
+ * void ssdfs_seq_arr_kfree(void *kaddr)
+ * struct page *ssdfs_seq_arr_alloc_page(gfp_t gfp_mask)
+ * struct page *ssdfs_seq_arr_add_pagevec_page(struct pagevec *pvec)
+ * void ssdfs_seq_arr_free_page(struct page *page)
+ * void ssdfs_seq_arr_pagevec_release(struct pagevec *pvec)
+ */
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	SSDFS_MEMORY_LEAKS_CHECKER_FNS(seq_arr)
+#else
+	SSDFS_MEMORY_ALLOCATOR_FNS(seq_arr)
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+void ssdfs_seq_arr_memory_leaks_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_seq_arr_page_leaks, 0);
+	atomic64_set(&ssdfs_seq_arr_memory_leaks, 0);
+	atomic64_set(&ssdfs_seq_arr_cache_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+void ssdfs_seq_arr_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&ssdfs_seq_arr_page_leaks) != 0) {
+		SSDFS_ERR("SEQUENCE ARRAY: "
+			  "memory leaks include %lld pages\n",
+			  atomic64_read(&ssdfs_seq_arr_page_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_seq_arr_memory_leaks) != 0) {
+		SSDFS_ERR("SEQUENCE ARRAY: "
+			  "memory allocator suffers from %lld leaks\n",
+			  atomic64_read(&ssdfs_seq_arr_memory_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_seq_arr_cache_leaks) != 0) {
+		SSDFS_ERR("SEQUENCE ARRAY: "
+			  "caches suffer from %lld leaks\n",
+			  atomic64_read(&ssdfs_seq_arr_cache_leaks));
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+/*
+ * ssdfs_create_sequence_array() - create sequence array
+ * @revert_threshold: threshold for reverting the ID sequence back to zero
+ *
+ * This method tries to allocate memory and to create
+ * the sequence array.
+ *
+ * RETURN:
+ * [success] - pointer to created sequence array
+ * [failure] - error code:
+ *
+ * %-EINVAL  - invalid input.
+ * %-ENOMEM  - fail to allocate memory.
+ */
+struct ssdfs_sequence_array *
+ssdfs_create_sequence_array(unsigned long revert_threshold)
+{
+	struct ssdfs_sequence_array *ptr;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("revert_threshold %lu\n", revert_threshold);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (revert_threshold == 0) {
+		SSDFS_ERR("invalid revert_threshold %lu\n",
+			  revert_threshold);
+		return ERR_PTR(-EINVAL);
+	}
+
+	ptr = ssdfs_seq_arr_kmalloc(sizeof(struct ssdfs_sequence_array),
+				    GFP_KERNEL);
+	if (!ptr) {
+		SSDFS_ERR("fail to allocate memory\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	ptr->revert_threshold = revert_threshold;
+	spin_lock_init(&ptr->lock);
+	ptr->last_allocated_id = SSDFS_SEQUENCE_ARRAY_INVALID_ID;
+	INIT_RADIX_TREE(&ptr->map, GFP_ATOMIC);
+
+	return ptr;
+}
+
+/*
+ * ssdfs_destroy_sequence_array() - destroy sequence array
+ * @array: pointer to sequence array object
+ * @free_item: pointer to function that can free an item
+ *
+ * This method tries to delete all items from the radix tree,
+ * to free memory of every item and to free the memory of
+ * sequence array itself.
+ */
+void ssdfs_destroy_sequence_array(struct ssdfs_sequence_array *array,
+				  ssdfs_free_item free_item)
+{
+	struct radix_tree_iter iter;
+	void __rcu **slot;
+	void *item_ptr;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!array || !free_item);
+
+	SSDFS_DBG("array %p\n", array);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	rcu_read_lock();
+	spin_lock(&array->lock);
+	radix_tree_for_each_slot(slot, &array->map, &iter, 0) {
+		item_ptr = rcu_dereference_raw(*slot);
+
+		spin_unlock(&array->lock);
+		rcu_read_unlock();
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("index %llu, ptr %p\n",
+			  (u64)iter.index, item_ptr);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (!item_ptr) {
+			SSDFS_WARN("empty node pointer: "
+				   "index %llu\n",
+				   (u64)iter.index);
+		} else {
+			free_item(item_ptr);
+		}
+
+		rcu_read_lock();
+		spin_lock(&array->lock);
+
+		radix_tree_iter_delete(&array->map, &iter, slot);
+	}
+	array->last_allocated_id = SSDFS_SEQUENCE_ARRAY_INVALID_ID;
+	spin_unlock(&array->lock);
+	rcu_read_unlock();
+
+	ssdfs_seq_arr_kfree(array);
+}
+
+/*
+ * ssdfs_sequence_array_init_item() - initialize the array with an item
+ * @array: pointer to sequence array object
+ * @id: ID of the item being inserted
+ * @item: pointer to the item being inserted
+ *
+ * This method tries to initialize the array with the item.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL  - invalid input.
+ */
+int ssdfs_sequence_array_init_item(struct ssdfs_sequence_array *array,
+				   unsigned long id, void *item)
+{
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!array || !item);
+
+	SSDFS_DBG("array %p, id %lu, item %p\n",
+		  array, id, item);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (id > array->revert_threshold) {
+		SSDFS_ERR("invalid input: "
+			  "id %lu, revert_threshold %lu\n",
+			  id, array->revert_threshold);
+		return -EINVAL;
+	}
+
+	err = radix_tree_preload(GFP_NOFS);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to preload radix tree: err %d\n",
+			  err);
+		return err;
+	}
+
+	spin_lock(&array->lock);
+	err = radix_tree_insert(&array->map, id, item);
+	spin_unlock(&array->lock);
+
+	radix_tree_preload_end();
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to add item into radix tree: "
+			  "id %llu, item %p, err %d\n",
+			  (u64)id, item, err);
+		return err;
+	}
+
+	spin_lock(&array->lock);
+	if (array->last_allocated_id == SSDFS_SEQUENCE_ARRAY_INVALID_ID)
+		array->last_allocated_id = id;
+	spin_unlock(&array->lock);
+
+	return 0;
+}
+
+/*
+ * ssdfs_sequence_array_add_item() - add new item into array
+ * @array: pointer to sequence array object
+ * @item: pointer to the item being added
+ * @id: pointer to ID value [out]
+ *
+ * This method tries to add a new item into the array.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE  - internal error.
+ */
+int ssdfs_sequence_array_add_item(struct ssdfs_sequence_array *array,
+				  void *item, unsigned long *id)
+{
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!array || !item || !id);
+
+	SSDFS_DBG("array %p, item %p, id %p\n",
+		  array, item, id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*id = SSDFS_SEQUENCE_ARRAY_INVALID_ID;
+
+	err = radix_tree_preload(GFP_NOFS);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to preload radix tree: err %d\n",
+			  err);
+		return err;
+	}
+
+	spin_lock(&array->lock);
+
+	if (array->last_allocated_id == SSDFS_SEQUENCE_ARRAY_INVALID_ID) {
+		err = -ERANGE;
+		goto finish_add_item;
+	} else {
+		if ((array->last_allocated_id + 1) > array->revert_threshold) {
+			*id = 0;
+			array->last_allocated_id = 0;
+		} else {
+			array->last_allocated_id++;
+			*id = array->last_allocated_id;
+		}
+	}
+
+	if (*id > array->revert_threshold) {
+		err = -ERANGE;
+		goto finish_add_item;
+	}
+
+	err = radix_tree_insert(&array->map, *id, item);
+
+finish_add_item:
+	spin_unlock(&array->lock);
+
+	radix_tree_preload_end();
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("id %lu\n", *id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to add item into radix tree: "
+			  "id %llu, last_allocated_id %lu, "
+			  "item %p, err %d\n",
+			  (u64)*id, array->last_allocated_id,
+			  item, err);
+		return err;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_sequence_array_get_item() - retrieve item from array
+ * @array: pointer to sequence array object
+ * @id: ID value
+ *
+ * This method tries to retrieve the pointer to an item
+ * with @id value.
+ *
+ * RETURN:
+ * [success] - pointer to existing item.
+ * [failure] - error code:
+ *
+ * %-ENOENT  - item is absent.
+ */
+void *ssdfs_sequence_array_get_item(struct ssdfs_sequence_array *array,
+				    unsigned long id)
+{
+	void *item_ptr;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!array);
+
+	SSDFS_DBG("array %p, id %lu\n",
+		  array, id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	spin_lock(&array->lock);
+	item_ptr = radix_tree_lookup(&array->map, id);
+	spin_unlock(&array->lock);
+
+	if (!item_ptr) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unable to find the item: id %llu\n",
+			  (u64)id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return ERR_PTR(-ENOENT);
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("item_ptr %p\n", item_ptr);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return item_ptr;
+}
+
+/*
+ * ssdfs_sequence_array_apply_for_all() - apply action for all items
+ * @array: pointer to sequence array object
+ * @apply_action: pointer to method that needs to be applied
+ *
+ * This method tries to apply an action to all items.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE  - internal error.
+ */
+int ssdfs_sequence_array_apply_for_all(struct ssdfs_sequence_array *array,
+					ssdfs_apply_action apply_action)
+{
+	struct radix_tree_iter iter;
+	void **slot;
+	void *item_ptr;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!array || !apply_action);
+
+	SSDFS_DBG("array %p\n", array);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	rcu_read_lock();
+
+	spin_lock(&array->lock);
+	radix_tree_for_each_slot(slot, &array->map, &iter, 0) {
+		item_ptr = radix_tree_deref_slot(slot);
+		if (unlikely(!item_ptr)) {
+			SSDFS_WARN("empty item ptr: id %llu\n",
+				   (u64)iter.index);
+			continue;
+		}
+		spin_unlock(&array->lock);
+
+		rcu_read_unlock();
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("id %llu, item_ptr %p\n",
+			  (u64)iter.index, item_ptr);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = apply_action(item_ptr);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to apply action: "
+				  "id %llu, err %d\n",
+				  (u64)iter.index, err);
+			goto finish_apply_to_all;
+		}
+
+		rcu_read_lock();
+
+		spin_lock(&array->lock);
+	}
+	spin_unlock(&array->lock);
+
+	rcu_read_unlock();
+
+finish_apply_to_all:
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to apply action for all items: "
+			  "err %d\n", err);
+		return err;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_sequence_array_change_state() - change item's state
+ * @array: pointer to sequence array object
+ * @id: ID value
+ * @old_tag: old tag value
+ * @new_tag: new tag value
+ * @change_state: pointer to method of changing item's state
+ * @old_state: old item's state value
+ * @new_state: new item's state value
+ *
+ * This method tries to change an item's state.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE  - internal error.
+ * %-ENOENT  - item is absent.
+ */
+int ssdfs_sequence_array_change_state(struct ssdfs_sequence_array *array,
+					unsigned long id,
+					int old_tag, int new_tag,
+					ssdfs_change_item_state change_state,
+					int old_state, int new_state)
+{
+	void *item_ptr = NULL;
+	int res;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!array || !change_state);
+
+	SSDFS_DBG("array %p, id %lu, "
+		  "old_tag %#x, new_tag %#x, "
+		  "old_state %#x, new_state %#x\n",
+		  array, id, old_tag, new_tag,
+		  old_state, new_state);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	rcu_read_lock();
+
+	spin_lock(&array->lock);
+	item_ptr = radix_tree_lookup(&array->map, id);
+	if (item_ptr) {
+		if (old_tag != SSDFS_SEQUENCE_ITEM_NO_TAG) {
+			res = radix_tree_tag_get(&array->map, id, old_tag);
+			if (res != 1)
+				err = -ERANGE;
+		}
+	} else {
+		err = -ENOENT;
+	}
+	spin_unlock(&array->lock);
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to find item id %llu with tag %#x\n",
+			  (u64)id, old_tag);
+		goto finish_change_state;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("id %llu, item_ptr %p\n",
+		  (u64)id, item_ptr);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = change_state(item_ptr, old_state, new_state);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to change state: "
+			  "id %llu, old_state %#x, "
+			  "new_state %#x, err %d\n",
+			  (u64)id, old_state, new_state, err);
+		goto finish_change_state;
+	}
+
+	spin_lock(&array->lock);
+	item_ptr = radix_tree_tag_set(&array->map, id, new_tag);
+	if (old_tag != SSDFS_SEQUENCE_ITEM_NO_TAG)
+		radix_tree_tag_clear(&array->map, id, old_tag);
+	spin_unlock(&array->lock);
+
+finish_change_state:
+	rcu_read_unlock();
+
+	return err;
+}
+
+/*
+ * ssdfs_sequence_array_change_all_states() - change state of all tagged items
+ * @ptr: pointer on sequence array object
+ * @old_tag: old tag value
+ * @new_tag: new tag value
+ * @change_state: pointer on method of changing item's state
+ * @old_state: old item's state value
+ * @new_state: new item's state value
+ * @found_items: pointer on count of found items [out]
+ *
+ * This method tries to change the state of all tagged items.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE  - internal error.
+ */
+int ssdfs_sequence_array_change_all_states(struct ssdfs_sequence_array *ptr,
+					   int old_tag, int new_tag,
+					   ssdfs_change_item_state change_state,
+					   int old_state, int new_state,
+					   unsigned long *found_items)
+{
+	struct radix_tree_iter iter;
+	void **slot;
+	void *item_ptr;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!ptr || !change_state || !found_items);
+
+	SSDFS_DBG("array %p, "
+		  "old_tag %#x, new_tag %#x, "
+		  "old_state %#x, new_state %#x\n",
+		  ptr, old_tag, new_tag,
+		  old_state, new_state);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*found_items = 0;
+
+	rcu_read_lock();
+
+	spin_lock(&ptr->lock);
+	radix_tree_for_each_tagged(slot, &ptr->map, &iter, 0, old_tag) {
+		item_ptr = radix_tree_deref_slot(slot);
+		if (unlikely(!item_ptr)) {
+			SSDFS_WARN("empty item ptr: id %llu\n",
+				   (u64)iter.index);
+			radix_tree_tag_clear(&ptr->map, iter.index, old_tag);
+			continue;
+		}
+		spin_unlock(&ptr->lock);
+
+		rcu_read_unlock();
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("id %llu, item_ptr %p\n",
+			  (u64)iter.index, item_ptr);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = change_state(item_ptr, old_state, new_state);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to change state: "
+				  "id %llu, old_state %#x, "
+				  "new_state %#x, err %d\n",
+				  (u64)iter.index, old_state,
+				  new_state, err);
+			goto finish_change_all_states;
+		}
+
+		(*found_items)++;
+
+		rcu_read_lock();
+
+		spin_lock(&ptr->lock);
+		radix_tree_tag_set(&ptr->map, iter.index, new_tag);
+		radix_tree_tag_clear(&ptr->map, iter.index, old_tag);
+	}
+	spin_unlock(&ptr->lock);
+
+	rcu_read_unlock();
+
+finish_change_all_states:
+	if (*found_items == 0 || err) {
+		SSDFS_ERR("fail to change all items' state\n");
+		return err;
+	}
+
+	return 0;
+}
+
+/*
+ * has_ssdfs_sequence_array_state() - check whether any item is tagged
+ * @array: pointer on sequence array object
+ * @tag: checking tag
+ *
+ * This method checks whether any item is tagged with @tag.
+ */
+bool has_ssdfs_sequence_array_state(struct ssdfs_sequence_array *array,
+				    int tag)
+{
+	bool res;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!array);
+
+	SSDFS_DBG("array %p, tag %#x\n", array, tag);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	spin_lock(&array->lock);
+	res = radix_tree_tagged(&array->map, tag);
+	spin_unlock(&array->lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("res %#x\n", res);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return res;
+}
diff --git a/fs/ssdfs/sequence_array.h b/fs/ssdfs/sequence_array.h
new file mode 100644
index 000000000000..9a9c21e30cbe
--- /dev/null
+++ b/fs/ssdfs/sequence_array.h
@@ -0,0 +1,119 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/sequence_array.h - sequence array's declarations.
+ *
+ * Copyright (c) 2019-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ */
+
+#ifndef _SSDFS_SEQUENCE_ARRAY_H
+#define _SSDFS_SEQUENCE_ARRAY_H
+
+#define SSDFS_SEQUENCE_ARRAY_INVALID_ID		ULONG_MAX
+
+#define SSDFS_SEQUENCE_ITEM_NO_TAG		0
+#define SSDFS_SEQUENCE_ITEM_DIRTY_TAG		1
+#define SSDFS_SEQUENCE_ITEM_UNDER_COMMIT_TAG	2
+#define SSDFS_SEQUENCE_ITEM_COMMITED_TAG	3
+
+/*
+ * struct ssdfs_sequence_array - sequence of pointers on items
+ * @revert_threshold: threshold of reverting the ID numbers' sequence
+ * @lock: exclusive lock
+ * @last_allocated_id: the latest allocated ID
+ * @map: pointers' radix tree
+ *
+ * The sequence array is a specialized structure whose goal is
+ * to provide access to items via pointers on the basis of
+ * ID numbers. Every item has a dedicated ID, but the sequence
+ * array could contain only some portion of the existing items.
+ * The initialization phase adds some limited number of existing
+ * items into the sequence array. The ID number can be reverted
+ * from some maximum value (threshold) back to zero.
+ */
+struct ssdfs_sequence_array {
+	unsigned long revert_threshold;
+
+	spinlock_t lock;
+	unsigned long last_allocated_id;
+	struct radix_tree_root map;
+};
+
+/* function prototype */
+typedef void (*ssdfs_free_item)(void *item);
+typedef int (*ssdfs_apply_action)(void *item);
+typedef int (*ssdfs_change_item_state)(void *item,
+					int old_state,
+					int new_state);
+
+/*
+ * Inline functions
+ */
+static inline
+unsigned long ssdfs_sequence_array_last_id(struct ssdfs_sequence_array *array)
+{
+	unsigned long last_id = ULONG_MAX;
+
+	spin_lock(&array->lock);
+	last_id = array->last_allocated_id;
+	spin_unlock(&array->lock);
+
+	return last_id;
+}
+
+static inline
+void ssdfs_sequence_array_set_last_id(struct ssdfs_sequence_array *array,
+				      unsigned long id)
+{
+	spin_lock(&array->lock);
+	array->last_allocated_id = id;
+	spin_unlock(&array->lock);
+}
+
+static inline
+bool is_ssdfs_sequence_array_last_id_invalid(struct ssdfs_sequence_array *ptr)
+{
+	bool is_invalid = false;
+
+	spin_lock(&ptr->lock);
+	is_invalid = ptr->last_allocated_id == SSDFS_SEQUENCE_ARRAY_INVALID_ID;
+	spin_unlock(&ptr->lock);
+
+	return is_invalid;
+}
+
+/*
+ * Sequence array API
+ */
+struct ssdfs_sequence_array *
+ssdfs_create_sequence_array(unsigned long revert_threshold);
+void ssdfs_destroy_sequence_array(struct ssdfs_sequence_array *array,
+				  ssdfs_free_item free_item);
+int ssdfs_sequence_array_init_item(struct ssdfs_sequence_array *array,
+				   unsigned long id, void *item);
+int ssdfs_sequence_array_add_item(struct ssdfs_sequence_array *array,
+				  void *item, unsigned long *id);
+void *ssdfs_sequence_array_get_item(struct ssdfs_sequence_array *array,
+				    unsigned long id);
+int ssdfs_sequence_array_apply_for_all(struct ssdfs_sequence_array *array,
+					ssdfs_apply_action apply_action);
+int ssdfs_sequence_array_change_state(struct ssdfs_sequence_array *array,
+					unsigned long id,
+					int old_tag, int new_tag,
+					ssdfs_change_item_state change_state,
+					int old_state, int new_state);
+int ssdfs_sequence_array_change_all_states(struct ssdfs_sequence_array *ptr,
+					   int old_tag, int new_tag,
+					   ssdfs_change_item_state change_state,
+					   int old_state, int new_state,
+					   unsigned long *found_items);
+bool has_ssdfs_sequence_array_state(struct ssdfs_sequence_array *array,
+				    int tag);
+
+#endif /* _SSDFS_SEQUENCE_ARRAY_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [RFC PATCH 10/76] ssdfs: introduce PEB's block bitmap
  2023-02-25  1:08 [RFC PATCH 00/76] SSDFS: flash-friendly LFS file system for ZNS SSD Viacheslav Dubeyko
                   ` (8 preceding siblings ...)
  2023-02-25  1:08 ` [RFC PATCH 09/76] ssdfs: internal array/sequence primitives Viacheslav Dubeyko
@ 2023-02-25  1:08 ` Viacheslav Dubeyko
  2023-02-25  1:08 ` [RFC PATCH 11/76] ssdfs: block bitmap search operations implementation Viacheslav Dubeyko
                   ` (66 subsequent siblings)
  76 siblings, 0 replies; 82+ messages in thread
From: Viacheslav Dubeyko @ 2023-02-25  1:08 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: viacheslav.dubeyko, luka.perkov, bruno.banelli, Viacheslav Dubeyko

SSDFS splits a partition/volume into a sequence of fixed-sized
segments. Every segment can include one or several Logical
Erase Blocks (LEBs). A LEB can be mapped into a "Physical" Erase
Block (PEB). Generally speaking, a PEB is a fixed-sized container
that includes some number of logical blocks (or NAND flash
pages). A PEB has a block bitmap whose goal is to track the state
(free, pre-allocated, allocated, invalid) of logical blocks
and to account for the physical space used for storing a log's
metadata (segment header, partial log header, footer).

Block bitmap implements API:
(1) create - create empty block bitmap
(2) destroy - destroy block bitmap object
(3) init - initialize block bitmap by metadata from PEB's log
(4) snapshot - take block bitmap snapshot for flush operation
(5) forget_snapshot - free block bitmap's snapshot resources
(6) lock/unlock - lock/unlock block bitmap
(7) test_block/test_range - check state of block or range of blocks
(8) get_free_pages - get number of free pages
(9) get_used_pages - get number of used pages
(10) get_invalid_pages - get number of invalid pages
(11) pre_allocate - pre-allocate logical block or range of blocks
(12) allocate - allocate logical block or range of blocks
(13) invalidate - invalidate logical block or range of blocks
(14) collect_garbage - get contiguous range of blocks in a particular state
(15) clean - convert the whole block bitmap into clean state

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
CC: Viacheslav Dubeyko <viacheslav.dubeyko@bytedance.com>
CC: Luka Perkov <luka.perkov@sartura.hr>
CC: Bruno Banelli <bruno.banelli@sartura.hr>
---
 fs/ssdfs/block_bitmap.c        | 1209 ++++++++++++++++++++++++++++++++
 fs/ssdfs/block_bitmap.h        |  370 ++++++++++
 fs/ssdfs/block_bitmap_tables.c |  310 ++++++++
 3 files changed, 1889 insertions(+)
 create mode 100644 fs/ssdfs/block_bitmap.c
 create mode 100644 fs/ssdfs/block_bitmap.h
 create mode 100644 fs/ssdfs/block_bitmap_tables.c

diff --git a/fs/ssdfs/block_bitmap.c b/fs/ssdfs/block_bitmap.c
new file mode 100644
index 000000000000..fd7e84258cf0
--- /dev/null
+++ b/fs/ssdfs/block_bitmap.c
@@ -0,0 +1,1209 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/block_bitmap.c - PEB's block bitmap implementation.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * Copyright (c) 2022-2023 Bytedance Ltd. and/or its affiliates.
+ *              https://www.bytedance.com/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ *                  Cong Wang
+ */
+
+#include <linux/pagemap.h>
+#include <linux/slab.h>
+#include <linux/pagevec.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+#include "page_vector.h"
+#include "block_bitmap.h"
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+atomic64_t ssdfs_block_bmap_page_leaks;
+atomic64_t ssdfs_block_bmap_memory_leaks;
+atomic64_t ssdfs_block_bmap_cache_leaks;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+/*
+ * void ssdfs_block_bmap_cache_leaks_increment(void *kaddr)
+ * void ssdfs_block_bmap_cache_leaks_decrement(void *kaddr)
+ * void *ssdfs_block_bmap_kmalloc(size_t size, gfp_t flags)
+ * void *ssdfs_block_bmap_kzalloc(size_t size, gfp_t flags)
+ * void *ssdfs_block_bmap_kcalloc(size_t n, size_t size, gfp_t flags)
+ * void ssdfs_block_bmap_kfree(void *kaddr)
+ * struct page *ssdfs_block_bmap_alloc_page(gfp_t gfp_mask)
+ * struct page *ssdfs_block_bmap_add_pagevec_page(struct pagevec *pvec)
+ * void ssdfs_block_bmap_free_page(struct page *page)
+ * void ssdfs_block_bmap_pagevec_release(struct pagevec *pvec)
+ */
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	SSDFS_MEMORY_LEAKS_CHECKER_FNS(block_bmap)
+#else
+	SSDFS_MEMORY_ALLOCATOR_FNS(block_bmap)
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+void ssdfs_block_bmap_memory_leaks_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_block_bmap_page_leaks, 0);
+	atomic64_set(&ssdfs_block_bmap_memory_leaks, 0);
+	atomic64_set(&ssdfs_block_bmap_cache_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+void ssdfs_block_bmap_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&ssdfs_block_bmap_page_leaks) != 0) {
+		SSDFS_ERR("BLOCK BMAP: "
+			  "memory leaks include %lld pages\n",
+			  atomic64_read(&ssdfs_block_bmap_page_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_block_bmap_memory_leaks) != 0) {
+		SSDFS_ERR("BLOCK BMAP: "
+			  "memory allocator suffers from %lld leaks\n",
+			  atomic64_read(&ssdfs_block_bmap_memory_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_block_bmap_cache_leaks) != 0) {
+		SSDFS_ERR("BLOCK BMAP: "
+			  "caches suffer from %lld leaks\n",
+			  atomic64_read(&ssdfs_block_bmap_cache_leaks));
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+extern const bool detect_free_blk[U8_MAX + 1];
+extern const bool detect_pre_allocated_blk[U8_MAX + 1];
+extern const bool detect_valid_blk[U8_MAX + 1];
+extern const bool detect_invalid_blk[U8_MAX + 1];
+
+#define ALIGNED_START_BLK(blk) ({ \
+	u32 aligned_blk; \
+	aligned_blk = (blk >> SSDFS_BLK_STATE_BITS) << SSDFS_BLK_STATE_BITS; \
+	aligned_blk; \
+})
+
+#define ALIGNED_END_BLK(blk) ({ \
+	u32 aligned_blk; \
+	aligned_blk = blk + SSDFS_ITEMS_PER_BYTE(SSDFS_BLK_STATE_BITS) - 1; \
+	aligned_blk >>= SSDFS_BLK_STATE_BITS; \
+	aligned_blk <<= SSDFS_BLK_STATE_BITS; \
+	aligned_blk; \
+})
+
+#define SSDFS_BLK_BMAP_STATE_FLAGS_FNS(state, name)			\
+static inline								\
+bool is_block_bmap_##name(struct ssdfs_block_bmap *blk_bmap)		\
+{									\
+	return atomic_read(&blk_bmap->flags) & SSDFS_BLK_BMAP_##state;	\
+}									\
+static inline								\
+void set_block_bmap_##name(struct ssdfs_block_bmap *blk_bmap)		\
+{									\
+	atomic_or(SSDFS_BLK_BMAP_##state, &blk_bmap->flags);		\
+}									\
+static inline								\
+void clear_block_bmap_##name(struct ssdfs_block_bmap *blk_bmap)		\
+{									\
+	atomic_and(~SSDFS_BLK_BMAP_##state, &blk_bmap->flags);		\
+}									\
+
+/*
+ * is_block_bmap_initialized()
+ * set_block_bmap_initialized()
+ * clear_block_bmap_initialized()
+ */
+SSDFS_BLK_BMAP_STATE_FLAGS_FNS(INITIALIZED, initialized)
+
+/*
+ * is_block_bmap_dirty()
+ * set_block_bmap_dirty()
+ * clear_block_bmap_dirty()
+ */
+SSDFS_BLK_BMAP_STATE_FLAGS_FNS(DIRTY, dirty)
+
+static
+int ssdfs_cache_block_state(struct ssdfs_block_bmap *blk_bmap,
+			    u32 blk, int blk_state);
+
+bool ssdfs_block_bmap_dirtied(struct ssdfs_block_bmap *blk_bmap)
+{
+	return is_block_bmap_dirty(blk_bmap);
+}
+
+bool ssdfs_block_bmap_initialized(struct ssdfs_block_bmap *blk_bmap)
+{
+	return is_block_bmap_initialized(blk_bmap);
+}
+
+void ssdfs_set_block_bmap_initialized(struct ssdfs_block_bmap *blk_bmap)
+{
+	set_block_bmap_initialized(blk_bmap);
+}
+
+void ssdfs_block_bmap_clear_dirty_state(struct ssdfs_block_bmap *blk_bmap)
+{
+	SSDFS_DBG("clear dirty state\n");
+	clear_block_bmap_dirty(blk_bmap);
+}
+
+static inline
+bool is_cache_invalid(struct ssdfs_block_bmap *blk_bmap, int blk_state);
+static
+int ssdfs_set_range_in_storage(struct ssdfs_block_bmap *blk_bmap,
+				struct ssdfs_block_bmap_range *range,
+				int blk_state);
+static
+int ssdfs_block_bmap_find_block_in_cache(struct ssdfs_block_bmap *blk_bmap,
+					 u32 start, u32 max_blk,
+					 int blk_state, u32 *found_blk);
+static
+int ssdfs_block_bmap_find_block(struct ssdfs_block_bmap *blk_bmap,
+				u32 start, u32 max_blk, int blk_state,
+				u32 *found_blk);
+
+#ifdef CONFIG_SSDFS_DEBUG
+static
+void ssdfs_debug_block_bitmap(struct ssdfs_block_bmap *bmap);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+/*
+ * ssdfs_block_bmap_storage_destroy() - destroy block bitmap's storage
+ * @storage: pointer on block bitmap's storage
+ */
+static
+void ssdfs_block_bmap_storage_destroy(struct ssdfs_block_bmap_storage *storage)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!storage);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (storage->state) {
+	case SSDFS_BLOCK_BMAP_STORAGE_PAGE_VEC:
+		ssdfs_page_vector_release(&storage->array);
+		ssdfs_page_vector_destroy(&storage->array);
+		break;
+
+	case SSDFS_BLOCK_BMAP_STORAGE_BUFFER:
+		if (storage->buf)
+			ssdfs_block_bmap_kfree(storage->buf);
+		break;
+
+	default:
+		SSDFS_WARN("unexpected state %#x\n", storage->state);
+		break;
+	}
+
+	storage->state = SSDFS_BLOCK_BMAP_STORAGE_ABSENT;
+}
+
+/*
+ * ssdfs_block_bmap_destroy() - destroy PEB's block bitmap
+ * @blk_bmap: pointer on block bitmap
+ *
+ * This function releases the memory pages of the pagevec and
+ * frees the memory of the ssdfs_block_bmap structure.
+ */
+void ssdfs_block_bmap_destroy(struct ssdfs_block_bmap *blk_bmap)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap);
+
+	SSDFS_DBG("blk_bmap %p, items count %zu, "
+		  "bmap bytes %zu\n",
+		  blk_bmap, blk_bmap->items_count,
+		  blk_bmap->bytes_count);
+
+	if (mutex_is_locked(&blk_bmap->lock))
+		SSDFS_WARN("block bitmap's mutex is locked\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!is_block_bmap_initialized(blk_bmap))
+		SSDFS_WARN("block bitmap hasn't been initialized\n");
+
+	if (is_block_bmap_dirty(blk_bmap))
+		SSDFS_WARN("block bitmap is dirty\n");
+
+	ssdfs_block_bmap_storage_destroy(&blk_bmap->storage);
+}
+
+/*
+ * ssdfs_block_bmap_create_empty_storage() - create block bitmap's storage
+ * @ptr: pointer on block bitmap's storage
+ * @bmap_bytes: number of bytes in block bitmap
+ */
+static
+int ssdfs_block_bmap_create_empty_storage(struct ssdfs_block_bmap_storage *ptr,
+					  size_t bmap_bytes)
+{
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!ptr);
+
+	SSDFS_DBG("storage %p, bmap_bytes %zu\n",
+		  ptr, bmap_bytes);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ptr->state = SSDFS_BLOCK_BMAP_STORAGE_ABSENT;
+
+	if (bmap_bytes > PAGE_SIZE) {
+		size_t capacity = (bmap_bytes + PAGE_SIZE - 1) / PAGE_SIZE;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(capacity >= U8_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = ssdfs_page_vector_create(&ptr->array, (u8)capacity);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to create page vector: "
+				  "bmap_bytes %zu, capacity %zu, err %d\n",
+				  bmap_bytes, capacity, err);
+			return err;
+		}
+
+		err = ssdfs_page_vector_init(&ptr->array);
+		if (unlikely(err)) {
+			ssdfs_page_vector_destroy(&ptr->array);
+			SSDFS_ERR("fail to init page vector: "
+				  "bmap_bytes %zu, capacity %zu, err %d\n",
+				  bmap_bytes, capacity, err);
+			return err;
+		}
+
+		ptr->state = SSDFS_BLOCK_BMAP_STORAGE_PAGE_VEC;
+	} else {
+		ptr->buf = ssdfs_block_bmap_kmalloc(bmap_bytes, GFP_KERNEL);
+		if (!ptr->buf) {
+			SSDFS_ERR("fail to allocate memory: "
+				  "bmap_bytes %zu\n",
+				  bmap_bytes);
+			return -ENOMEM;
+		}
+
+		ptr->state = SSDFS_BLOCK_BMAP_STORAGE_BUFFER;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_block_bmap_init_clean_storage() - init clean block bitmap
+ * @ptr: pointer on block bitmap object
+ * @bmap_pages: memory pages count in block bitmap
+ *
+ * This function initializes storage space of the clean
+ * block bitmap.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ENOMEM     - unable to allocate memory.
+ * %-ERANGE     - internal error.
+ */
+static
+int ssdfs_block_bmap_init_clean_storage(struct ssdfs_block_bmap *ptr,
+					size_t bmap_pages)
+{
+	struct ssdfs_page_vector *array;
+	struct page *page;
+	int i;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!ptr);
+
+	SSDFS_DBG("bmap %p, storage_state %#x, "
+		  "bmap_bytes %zu, bmap_pages %zu\n",
+		  ptr, ptr->storage.state,
+		  ptr->bytes_count, bmap_pages);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (ptr->storage.state) {
+	case SSDFS_BLOCK_BMAP_STORAGE_PAGE_VEC:
+		array = &ptr->storage.array;
+
+		if (ssdfs_page_vector_space(array) < bmap_pages) {
+			SSDFS_ERR("page vector capacity is not enough: "
+				  "capacity %u, free_space %u, "
+				  "bmap_pages %zu\n",
+				  ssdfs_page_vector_capacity(array),
+				  ssdfs_page_vector_space(array),
+				  bmap_pages);
+			return -ENOMEM;
+		}
+
+		for (i = 0; i < bmap_pages; i++) {
+			page = ssdfs_page_vector_allocate(array);
+			if (IS_ERR_OR_NULL(page)) {
+				err = (page == NULL ? -ENOMEM :
+							PTR_ERR(page));
+				SSDFS_ERR("unable to allocate #%d page\n",
+					  i);
+				return err;
+			}
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("page %p, count %d\n",
+				  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		break;
+
+	case SSDFS_BLOCK_BMAP_STORAGE_BUFFER:
+		memset(ptr->storage.buf, 0, ptr->bytes_count);
+		break;
+
+	default:
+		SSDFS_ERR("unexpected state %#x\n", ptr->storage.state);
+		return -ERANGE;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_block_bmap_create() - construct PEB's block bitmap
+ * @fsi: file system info object
+ * @ptr: pointer on block bitmap object
+ * @items_count: count of described items
+ * @flag: defines the necessity to allocate memory
+ * @init_state: block state used during initialization
+ *
+ * This function prepares the page vector and
+ * initializes the ssdfs_block_bmap structure.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EOPNOTSUPP - pagevec is too small for block bitmap
+ *                representation.
+ * %-ENOMEM     - unable to allocate memory.
+ */
+int ssdfs_block_bmap_create(struct ssdfs_fs_info *fsi,
+			    struct ssdfs_block_bmap *ptr,
+			    u32 items_count,
+			    int flag, int init_state)
+{
+	int max_capacity = SSDFS_BLK_BMAP_FRAGMENTS_CHAIN_MAX;
+	size_t bmap_bytes = 0;
+	size_t bmap_pages = 0;
+	int i;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !ptr);
+
+	if (init_state > SSDFS_BLK_STATE_MAX) {
+		SSDFS_ERR("invalid block state %#x\n", init_state);
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("fsi %p, pagesize %u, segsize %u, pages_per_seg %u, "
+		  "items_count %u, flag %#x, init_state %#x\n",
+		  fsi, fsi->pagesize, fsi->segsize, fsi->pages_per_seg,
+		  items_count, flag, init_state);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	bmap_bytes = BLK_BMAP_BYTES(items_count);
+	bmap_pages = (bmap_bytes + PAGE_SIZE - 1) / PAGE_SIZE;
+
+	if (bmap_pages > max_capacity) {
+		SSDFS_WARN("unable to allocate bmap with %zu pages\n",
+			    bmap_pages);
+		return -EOPNOTSUPP;
+	}
+
+	mutex_init(&ptr->lock);
+	atomic_set(&ptr->flags, 0);
+	ptr->bytes_count = bmap_bytes;
+	ptr->items_count = items_count;
+	ptr->metadata_items = 0;
+	ptr->used_blks = 0;
+	ptr->invalid_blks = 0;
+
+	err = ssdfs_block_bmap_create_empty_storage(&ptr->storage, bmap_bytes);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to create empty bmap's storage: "
+			  "bmap_bytes %zu, err %d\n",
+			  bmap_bytes, err);
+		return err;
+	}
+
+	for (i = 0; i < SSDFS_SEARCH_TYPE_MAX; i++) {
+		ptr->last_search[i].page_index = max_capacity;
+		ptr->last_search[i].offset = U16_MAX;
+	}
+
+	if (flag == SSDFS_BLK_BMAP_INIT)
+		goto alloc_end;
+
+	err = ssdfs_block_bmap_init_clean_storage(ptr, bmap_pages);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to init clean bmap's storage: "
+			  "bmap_bytes %zu, bmap_pages %zu, err %d\n",
+			  bmap_bytes, bmap_pages, err);
+		goto destroy_pagevec;
+	}
+
+	if (init_state != SSDFS_BLK_FREE) {
+		struct ssdfs_block_bmap_range range = {0, ptr->items_count};
+
+		err = ssdfs_set_range_in_storage(ptr, &range, init_state);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to initialize block bmap: "
+				  "range (start %u, len %u), "
+				  "init_state %#x, err %d\n",
+				  range.start, range.len, init_state, err);
+			goto destroy_pagevec;
+		}
+	}
+
+	err = ssdfs_cache_block_state(ptr, 0, SSDFS_BLK_FREE);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to cache last free page: err %d\n",
+			  err);
+		goto destroy_pagevec;
+	}
+
+	set_block_bmap_initialized(ptr);
+
+alloc_end:
+	return 0;
+
+destroy_pagevec:
+	ssdfs_block_bmap_destroy(ptr);
+	return err;
+}
+
+/*
+ * ssdfs_block_bmap_init_storage() - initialize block bitmap storage
+ * @blk_bmap: pointer on block bitmap
+ * @source: prepared pagevec after reading from volume
+ *
+ * This function initializes the block bitmap's storage on
+ * the basis of the @source pages read from the volume.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input.
+ * %-ERANGE     - internal error.
+ * %-ENOMEM     - fail to allocate memory.
+ */
+static
+int ssdfs_block_bmap_init_storage(struct ssdfs_block_bmap *blk_bmap,
+				  struct ssdfs_page_vector *source)
+{
+	struct ssdfs_page_vector *array;
+	struct page *page;
+#ifdef CONFIG_SSDFS_DEBUG
+	void *kaddr;
+#endif /* CONFIG_SSDFS_DEBUG */
+	int i;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap || !source);
+
+	if (!mutex_is_locked(&blk_bmap->lock)) {
+		SSDFS_WARN("block bitmap mutex should be locked\n");
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("bmap %p, bmap_bytes %zu\n",
+		  blk_bmap, blk_bmap->bytes_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	array = &blk_bmap->storage.array;
+
+	if (blk_bmap->storage.state != SSDFS_BLOCK_BMAP_STORAGE_ABSENT) {
+		switch (blk_bmap->storage.state) {
+		case SSDFS_BLOCK_BMAP_STORAGE_PAGE_VEC:
+			ssdfs_page_vector_release(array);
+			break;
+
+		case SSDFS_BLOCK_BMAP_STORAGE_BUFFER:
+			/* Do nothing. We have buffer already */
+			break;
+
+		default:
+			BUG();
+		}
+	} else {
+		err = ssdfs_block_bmap_create_empty_storage(&blk_bmap->storage,
+							blk_bmap->bytes_count);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to create empty bmap's storage: "
+				  "err %d\n", err);
+			return err;
+		}
+	}
+
+	switch (blk_bmap->storage.state) {
+	case SSDFS_BLOCK_BMAP_STORAGE_PAGE_VEC:
+		for (i = 0; i < ssdfs_page_vector_count(source); i++) {
+			page = ssdfs_page_vector_remove(source, i);
+			if (IS_ERR_OR_NULL(page)) {
+				SSDFS_WARN("page %d is NULL\n", i);
+				return -ERANGE;
+			}
+
+			ssdfs_lock_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+			kaddr = kmap_local_page(page);
+			SSDFS_DBG("BMAP INIT\n");
+			print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+					     kaddr, 32);
+			kunmap_local(kaddr);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			err = ssdfs_page_vector_add(array, page);
+			ssdfs_unlock_page(page);
+
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to add page: "
+					  "page_index %d, err %d\n",
+					  i, err);
+				return err;
+			}
+		}
+
+		err = ssdfs_page_vector_reinit(source);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to reinit page vector: "
+				  "err %d\n", err);
+			return err;
+		}
+		break;
+
+	case SSDFS_BLOCK_BMAP_STORAGE_BUFFER:
+		if (ssdfs_page_vector_count(source) > 1) {
+			SSDFS_ERR("invalid source pvec size %u\n",
+				  ssdfs_page_vector_count(source));
+			return -ERANGE;
+		}
+
+		page = ssdfs_page_vector_remove(source, 0);
+
+		if (!page) {
+			SSDFS_WARN("page %d is NULL\n", 0);
+			return -ERANGE;
+		}
+
+		ssdfs_lock_page(page);
+
+		ssdfs_memcpy_from_page(blk_bmap->storage.buf,
+				       0, blk_bmap->bytes_count,
+				       page, 0, PAGE_SIZE,
+				       blk_bmap->bytes_count);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		kaddr = kmap_local_page(page);
+		SSDFS_DBG("BMAP INIT\n");
+		print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+				     kaddr, 32);
+		kunmap_local(kaddr);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		ssdfs_unlock_page(page);
+
+		ssdfs_block_bmap_account_page(page);
+		ssdfs_block_bmap_free_page(page);
+
+		ssdfs_page_vector_release(source);
+		break;
+
+	default:
+		SSDFS_ERR("unexpected state %#x\n",
+			  blk_bmap->storage.state);
+		return -ERANGE;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("pvec %p, pagevec count %u\n",
+		  source, ssdfs_page_vector_count(source));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+static
+int ssdfs_block_bmap_find_range(struct ssdfs_block_bmap *blk_bmap,
+				u32 start, u32 len, u32 max_blk,
+				int blk_state,
+				struct ssdfs_block_bmap_range *range);
+
+/*
+ * ssdfs_block_bmap_init() - initialize block bitmap pagevec
+ * @blk_bmap: pointer on block bitmap
+ * @source: prepared pagevec after reading from volume
+ * @last_free_blk: last free page saved on the volume
+ * @metadata_blks: reserved metadata blocks count saved on the volume
+ * @invalid_blks: count of invalid blocks saved on the volume
+ *
+ * This function initializes the block bitmap's pagevec on
+ * the basis of the @source pages read from the volume.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input.
+ */
+int ssdfs_block_bmap_init(struct ssdfs_block_bmap *blk_bmap,
+			  struct ssdfs_page_vector *source,
+			  u32 last_free_blk,
+			  u32 metadata_blks,
+			  u32 invalid_blks)
+{
+	struct ssdfs_block_bmap_range found;
+	int max_capacity = SSDFS_BLK_BMAP_FRAGMENTS_CHAIN_MAX;
+	u32 start_item;
+	int blk_state;
+	int free_pages;
+	int i;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap || !source);
+
+	if (!mutex_is_locked(&blk_bmap->lock)) {
+		SSDFS_WARN("block bitmap mutex should be locked\n");
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("blk_bmap %p, source %p, "
+		  "last_free_blk %u, metadata_blks %u, invalid_blks %u\n",
+		  blk_bmap, source,
+		  last_free_blk, metadata_blks, invalid_blks);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (is_block_bmap_initialized(blk_bmap)) {
+		if (is_block_bmap_dirty(blk_bmap)) {
+			SSDFS_WARN("unable to re-init dirty block bitmap\n");
+			return -ERANGE;
+		}
+
+		free_pages = ssdfs_block_bmap_get_free_pages(blk_bmap);
+		if (unlikely(free_pages < 0)) {
+			err = free_pages;
+			SSDFS_ERR("fail to define free pages: err %d\n",
+				  err);
+			return err;
+		}
+
+		if (free_pages != blk_bmap->items_count) {
+			SSDFS_WARN("block bitmap has been initialized\n");
+			return -ERANGE;
+		}
+
+		for (i = 0; i < SSDFS_SEARCH_TYPE_MAX; i++) {
+			blk_bmap->last_search[i].page_index = max_capacity;
+			blk_bmap->last_search[i].offset = U16_MAX;
+		}
+
+		ssdfs_block_bmap_storage_destroy(&blk_bmap->storage);
+		clear_block_bmap_initialized(blk_bmap);
+	}
+
+	if (ssdfs_page_vector_count(source) == 0) {
+		SSDFS_ERR("fail to init because of empty pagevec\n");
+		return -EINVAL;
+	}
+
+	if (last_free_blk > blk_bmap->items_count) {
+		SSDFS_ERR("invalid values: "
+			  "last_free_blk %u, items_count %zu\n",
+			  last_free_blk, blk_bmap->items_count);
+		return -EINVAL;
+	}
+
+	if (metadata_blks > blk_bmap->items_count) {
+		SSDFS_ERR("invalid values: "
+			  "metadata_blks %u, items_count %zu\n",
+			  metadata_blks, blk_bmap->items_count);
+		return -EINVAL;
+	}
+
+	blk_bmap->metadata_items = metadata_blks;
+
+	if (invalid_blks > blk_bmap->items_count) {
+		SSDFS_ERR("invalid values: "
+			  "invalid_blks %u, last_free_blk %u, "
+			  "items_count %zu\n",
+			  invalid_blks, last_free_blk,
+			  blk_bmap->items_count);
+		return -EINVAL;
+	}
+
+	blk_bmap->invalid_blks = invalid_blks;
+
+	err = ssdfs_block_bmap_init_storage(blk_bmap, source);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to init bmap's storage: err %d\n",
+			  err);
+		return err;
+	}
+
+	err = ssdfs_cache_block_state(blk_bmap, last_free_blk, SSDFS_BLK_FREE);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to cache last free page %u, err %d\n",
+			  last_free_blk, err);
+		return err;
+	}
+
+	blk_bmap->used_blks = 0;
+
+	start_item = 0;
+	blk_state = SSDFS_BLK_VALID;
+
+	do {
+		err = ssdfs_block_bmap_find_range(blk_bmap,
+					start_item,
+					blk_bmap->items_count - start_item,
+					blk_bmap->items_count,
+					blk_state, &found);
+		if (err == -ENODATA) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to find more valid blocks: "
+				  "start_item %u\n",
+				  start_item);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto check_pre_allocated_blocks;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to find range: err %d\n", err);
+			return err;
+		}
+
+		blk_bmap->used_blks += found.len;
+		start_item = found.start + found.len;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("VALID_BLK: range (start %u, len %u)\n",
+			  found.start, found.len);
+#endif /* CONFIG_SSDFS_DEBUG */
+	} while (start_item < blk_bmap->items_count);
+
+check_pre_allocated_blocks:
+	start_item = 0;
+	blk_state = SSDFS_BLK_PRE_ALLOCATED;
+
+	do {
+		err = ssdfs_block_bmap_find_range(blk_bmap,
+					start_item,
+					blk_bmap->items_count - start_item,
+					blk_bmap->items_count,
+					blk_state, &found);
+		if (err == -ENODATA) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to find more pre-allocated blocks: "
+				  "start_item %u\n",
+				  start_item);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto finish_block_bmap_init;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to find range: err %d\n", err);
+			return err;
+		}
+
+		blk_bmap->used_blks += found.len;
+		start_item = found.start + found.len;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("PRE_ALLOCATED_BLK: range (start %u, len %u)\n",
+			  found.start, found.len);
+#endif /* CONFIG_SSDFS_DEBUG */
+	} while (start_item < blk_bmap->items_count);
+
+finish_block_bmap_init:
+	set_block_bmap_initialized(blk_bmap);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	ssdfs_debug_block_bitmap(blk_bmap);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+/*
+ * ssdfs_define_last_free_page() - define last free page
+ * @blk_bmap: pointer on block bitmap
+ * @found_blk: found last free page [out]
+ *
+ * RETURN:
+ * [success] - @found_blk contains last free block number.
+ * [failure] - error code.
+ */
+static
+int ssdfs_define_last_free_page(struct ssdfs_block_bmap *blk_bmap,
+				u32 *found_blk)
+{
+	int cache_type;
+	struct ssdfs_last_bmap_search *last_search;
+	u32 first_cached_blk;
+	u32 max_blk;
+	u32 items_per_long = SSDFS_ITEMS_PER_LONG(SSDFS_BLK_STATE_BITS);
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("blk_bmap %p, found_blk %p\n",
+		  blk_bmap, found_blk);
+
+	BUG_ON(!blk_bmap || !found_blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	cache_type = SSDFS_GET_CACHE_TYPE(SSDFS_BLK_FREE);
+	max_blk = blk_bmap->items_count;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(cache_type >= SSDFS_SEARCH_TYPE_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (is_cache_invalid(blk_bmap, SSDFS_BLK_FREE)) {
+		err = ssdfs_block_bmap_find_block(blk_bmap,
+						  0, max_blk,
+						  SSDFS_BLK_FREE,
+						  found_blk);
+		if (err == -ENODATA) {
+			*found_blk = blk_bmap->items_count;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to find last free block: "
+				  "found_blk %u\n",
+				  *found_blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto finish_define_last_free_page;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to find last free block: err %d\n",
+				  err);
+			return err;
+		}
+	} else {
+		last_search = &blk_bmap->last_search[cache_type];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("last_search.cache %lx\n", last_search->cache);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		first_cached_blk = SSDFS_FIRST_CACHED_BLOCK(last_search);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("first_cached_blk %u\n",
+			  first_cached_blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = ssdfs_block_bmap_find_block_in_cache(blk_bmap,
+							   first_cached_blk,
+							   max_blk,
+							   SSDFS_BLK_FREE,
+							   found_blk);
+		if (err == -ENODATA) {
+			first_cached_blk += items_per_long;
+			err = ssdfs_block_bmap_find_block(blk_bmap,
+							  first_cached_blk,
+							  max_blk,
+							  SSDFS_BLK_FREE,
+							  found_blk);
+			if (err == -ENODATA) {
+				*found_blk = blk_bmap->items_count;
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("unable to find last free block: "
+					  "found_blk %u\n",
+					  *found_blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+				goto finish_define_last_free_page;
+			} else if (unlikely(err)) {
+				SSDFS_ERR("fail to find last free block: err %d\n",
+					  err);
+				return err;
+			}
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to find last free block: err %d\n",
+				  err);
+			return err;
+		}
+	}
+
+finish_define_last_free_page:
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("last free block: %u\n", *found_blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+/*
+ * ssdfs_block_bmap_snapshot_storage() - make snapshot of bmap's storage
+ * @blk_bmap: pointer on block bitmap
+ * @snapshot: pagevec with snapshot of block bitmap state [out]
+ *
+ * This function copies pages of block bitmap's storage into
+ * @snapshot pagevec.
+ *
+ * RETURN:
+ * [success] - @snapshot contains copy of block bitmap's state
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input.
+ * %-ENOMEM     - unable to allocate memory.
+ * %-ERANGE     - unexpected storage state.
+ */
+static
+int ssdfs_block_bmap_snapshot_storage(struct ssdfs_block_bmap *blk_bmap,
+					struct ssdfs_page_vector *snapshot)
+{
+	struct ssdfs_page_vector *array;
+	struct page *page;
+#ifdef CONFIG_SSDFS_DEBUG
+	void *kaddr;
+#endif /* CONFIG_SSDFS_DEBUG */
+	int i;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap || !snapshot);
+	BUG_ON(ssdfs_page_vector_count(snapshot) != 0);
+
+	if (!mutex_is_locked(&blk_bmap->lock)) {
+		SSDFS_WARN("block bitmap's mutex should be locked\n");
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("blk_bmap %p, snapshot %p\n",
+		  blk_bmap, snapshot);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (blk_bmap->storage.state) {
+	case SSDFS_BLOCK_BMAP_STORAGE_PAGE_VEC:
+		array = &blk_bmap->storage.array;
+
+		for (i = 0; i < ssdfs_page_vector_count(array); i++) {
+			page = ssdfs_block_bmap_alloc_page(GFP_KERNEL);
+			if (IS_ERR_OR_NULL(page)) {
+				err = (page == NULL ? -ENOMEM : PTR_ERR(page));
+				SSDFS_ERR("unable to allocate #%d page\n", i);
+				return err;
+			}
+
+			ssdfs_memcpy_page(page, 0, PAGE_SIZE,
+					  array->pages[i], 0, PAGE_SIZE,
+					  PAGE_SIZE);
+
+#ifdef CONFIG_SSDFS_DEBUG
+			kaddr = kmap_local_page(page);
+			SSDFS_DBG("BMAP SNAPSHOT\n");
+			print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+					     kaddr, 32);
+			kunmap_local(kaddr);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			ssdfs_block_bmap_forget_page(page);
+			err = ssdfs_page_vector_add(snapshot, page);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to add page: "
+					  "index %d, err %d\n",
+					  i, err);
+				return err;
+			}
+		}
+
+		for (; i < ssdfs_page_vector_capacity(array); i++) {
+			page = ssdfs_block_bmap_alloc_page(GFP_KERNEL);
+			if (IS_ERR_OR_NULL(page)) {
+				err = (page == NULL ? -ENOMEM : PTR_ERR(page));
+				SSDFS_ERR("unable to allocate #%d page\n", i);
+				return err;
+			}
+
+			ssdfs_memzero_page(page, 0, PAGE_SIZE, PAGE_SIZE);
+
+			ssdfs_block_bmap_forget_page(page);
+			err = ssdfs_page_vector_add(snapshot, page);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to add page: "
+					  "index %d, err %d\n",
+					  i, err);
+				return err;
+			}
+		}
+		break;
+
+	case SSDFS_BLOCK_BMAP_STORAGE_BUFFER:
+		page = ssdfs_block_bmap_alloc_page(GFP_KERNEL);
+		if (IS_ERR_OR_NULL(page)) {
+			err = (page == NULL ? -ENOMEM : PTR_ERR(page));
+			SSDFS_ERR("unable to allocate memory page\n");
+			return err;
+		}
+
+		ssdfs_memcpy_to_page(page,
+				     0, PAGE_SIZE,
+				     blk_bmap->storage.buf,
+				     0, blk_bmap->bytes_count,
+				     blk_bmap->bytes_count);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		kaddr = kmap_local_page(page);
+		SSDFS_DBG("BMAP SNAPSHOT\n");
+		print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+				     kaddr, 32);
+		kunmap_local(kaddr);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		ssdfs_block_bmap_forget_page(page);
+		err = ssdfs_page_vector_add(snapshot, page);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to add page: "
+				  "err %d\n", err);
+			return err;
+		}
+		break;
+
+	default:
+		SSDFS_ERR("unexpected state %#x\n",
+			  blk_bmap->storage.state);
+		return -ERANGE;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_block_bmap_snapshot() - make snapshot of block bitmap's pagevec
+ * @blk_bmap: pointer on block bitmap
+ * @snapshot: pagevec with snapshot of block bitmap state [out]
+ * @last_free_page: pointer on last free page value [out]
+ * @metadata_blks: pointer on reserved metadata pages count [out]
+ * @invalid_blks: pointer on invalid blocks count [out]
+ * @bytes_count: size of block bitmap in bytes [out]
+ *
+ * This function copies pages of block bitmap's storage into
+ * @snapshot pagevec.
+ *
+ * RETURN:
+ * [success] - @snapshot contains copy of block bitmap's state
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input.
+ * %-ENOMEM     - unable to allocate memory.
+ * %-ERANGE     - internal error.
+ */
+int ssdfs_block_bmap_snapshot(struct ssdfs_block_bmap *blk_bmap,
+				struct ssdfs_page_vector *snapshot,
+				u32 *last_free_page,
+				u32 *metadata_blks,
+				u32 *invalid_blks,
+				size_t *bytes_count)
+{
+	u32 used_pages;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap || !snapshot);
+	BUG_ON(!last_free_page || !metadata_blks || !invalid_blks);
+	BUG_ON(!bytes_count);
+	BUG_ON(ssdfs_page_vector_count(snapshot) != 0);
+
+	if (!mutex_is_locked(&blk_bmap->lock)) {
+		SSDFS_WARN("block bitmap's mutex should be locked\n");
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("blk_bmap %p, snapshot %p, last_free_page %p, "
+		  "metadata_blks %p, bytes_count %p\n",
+		  blk_bmap, snapshot, last_free_page,
+		  metadata_blks, bytes_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!is_block_bmap_initialized(blk_bmap)) {
+		SSDFS_WARN("block bitmap hasn't been initialized\n");
+		return -EINVAL;
+	}
+
+	err = ssdfs_block_bmap_snapshot_storage(blk_bmap, snapshot);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to snapshot bmap's storage: err %d\n", err);
+		goto cleanup_snapshot_pagevec;
+	}
+
+	err = ssdfs_define_last_free_page(blk_bmap, last_free_page);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to define last free page: err %d\n", err);
+		goto cleanup_snapshot_pagevec;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("bytes_count %zu, items_count %zu, "
+		  "metadata_items %u, used_blks %u, invalid_blks %u, "
+		  "last_free_page %u\n",
+		  blk_bmap->bytes_count, blk_bmap->items_count,
+		  blk_bmap->metadata_items, blk_bmap->used_blks,
+		  blk_bmap->invalid_blks, *last_free_page);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (*last_free_page >= blk_bmap->items_count) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid last_free_page: "
+			  "bytes_count %zu, items_count %zu, "
+			  "metadata_items %u, used_blks %u, invalid_blks %u, "
+			  "last_free_page %u\n",
+			  blk_bmap->bytes_count, blk_bmap->items_count,
+			  blk_bmap->metadata_items, blk_bmap->used_blks,
+			  blk_bmap->invalid_blks, *last_free_page);
+		goto cleanup_snapshot_pagevec;
+	}
+
+	*metadata_blks = blk_bmap->metadata_items;
+	*invalid_blks = blk_bmap->invalid_blks;
+	*bytes_count = blk_bmap->bytes_count;
+
+	used_pages = blk_bmap->used_blks + blk_bmap->invalid_blks +
+			blk_bmap->metadata_items;
+
+	if (used_pages > blk_bmap->items_count) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid values: "
+			  "bytes_count %zu, items_count %zu, "
+			  "metadata_items %u, used_blks %u, invalid_blks %u, "
+			  "last_free_page %u\n",
+			  blk_bmap->bytes_count, blk_bmap->items_count,
+			  blk_bmap->metadata_items, blk_bmap->used_blks,
+			  blk_bmap->invalid_blks, *last_free_page);
+		goto cleanup_snapshot_pagevec;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("clear dirty state\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	clear_block_bmap_dirty(blk_bmap);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("last_free_page %u, metadata_blks %u, "
+		  "bytes_count %zu\n",
+		  *last_free_page, *metadata_blks, *bytes_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+
+cleanup_snapshot_pagevec:
+	ssdfs_page_vector_release(snapshot);
+	return err;
+}
+
+void ssdfs_block_bmap_forget_snapshot(struct ssdfs_page_vector *snapshot)
+{
+	if (!snapshot)
+		return;
+
+	ssdfs_page_vector_release(snapshot);
+}
diff --git a/fs/ssdfs/block_bitmap.h b/fs/ssdfs/block_bitmap.h
new file mode 100644
index 000000000000..0b036eab3707
--- /dev/null
+++ b/fs/ssdfs/block_bitmap.h
@@ -0,0 +1,370 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/block_bitmap.h - PEB's block bitmap declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_BLOCK_BITMAP_H
+#define _SSDFS_BLOCK_BITMAP_H
+
+#include "common_bitmap.h"
+
+#define SSDFS_BLK_STATE_BITS	2
+#define SSDFS_BLK_STATE_MASK	0x3
+
+enum {
+	SSDFS_BLK_FREE		= 0x0,
+	SSDFS_BLK_PRE_ALLOCATED	= 0x1,
+	SSDFS_BLK_VALID		= 0x3,
+	SSDFS_BLK_INVALID	= 0x2,
+	SSDFS_BLK_STATE_MAX	= SSDFS_BLK_VALID + 1,
+};
+
+#define SSDFS_FREE_STATES_BYTE		0x00
+#define SSDFS_PRE_ALLOC_STATES_BYTE	0x55
+#define SSDFS_VALID_STATES_BYTE		0xFF
+#define SSDFS_INVALID_STATES_BYTE	0xAA
+
+#define SSDFS_BLK_BMAP_BYTE(blk_state)({ \
+	u8 value; \
+	switch (blk_state) { \
+	case SSDFS_BLK_FREE: \
+		value = SSDFS_FREE_STATES_BYTE; \
+		break; \
+	case SSDFS_BLK_PRE_ALLOCATED: \
+		value = SSDFS_PRE_ALLOC_STATES_BYTE; \
+		break; \
+	case SSDFS_BLK_VALID: \
+		value = SSDFS_VALID_STATES_BYTE; \
+		break; \
+	case SSDFS_BLK_INVALID: \
+		value = SSDFS_INVALID_STATES_BYTE; \
+		break; \
+	default: \
+		BUG(); \
+	} \
+	value; \
+})
+
+#define BLK_BMAP_BYTES(items_count) \
+	((items_count + SSDFS_ITEMS_PER_BYTE(SSDFS_BLK_STATE_BITS) - 1)  / \
+	 SSDFS_ITEMS_PER_BYTE(SSDFS_BLK_STATE_BITS))
+
+static inline
+int SSDFS_BLK2PAGE(u32 blk, u8 item_bits, u16 *offset)
+{
+	u32 blks_per_byte = SSDFS_ITEMS_PER_BYTE(item_bits);
+	u32 blks_per_long = SSDFS_ITEMS_PER_LONG(item_bits);
+	u32 blks_per_page = PAGE_SIZE * blks_per_byte;
+	u32 off;
+
+	if (offset) {
+		off = (blk % blks_per_page) / blks_per_long;
+		off *= sizeof(unsigned long);
+		BUG_ON(off >= U16_MAX);
+		*offset = off;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("blk %u, item_bits %u, blks_per_byte %u, "
+		  "blks_per_long %u, blks_per_page %u, "
+		  "page_index %u\n",
+		  blk, item_bits, blks_per_byte,
+		  blks_per_long, blks_per_page,
+		  blk / blks_per_page);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return blk / blks_per_page;
+}
+
+/*
+ * struct ssdfs_last_bmap_search - last search in bitmap
+ * @page_index: index of page in pagevec
+ * @offset: offset of cache from page's beginning
+ * @cache: cached bmap's part
+ */
+struct ssdfs_last_bmap_search {
+	int page_index;
+	u16 offset;
+	unsigned long cache;
+};
+
+static inline
+u32 SSDFS_FIRST_CACHED_BLOCK(struct ssdfs_last_bmap_search *search)
+{
+	u32 blks_per_byte = SSDFS_ITEMS_PER_BYTE(SSDFS_BLK_STATE_BITS);
+	u32 blks_per_page = PAGE_SIZE * blks_per_byte;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("page_index %d, offset %u, "
+		  "blks_per_byte %u, blks_per_page %u\n",
+		  search->page_index,
+		  search->offset,
+		  blks_per_byte, blks_per_page);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return (search->page_index * blks_per_page) +
+		(search->offset * blks_per_byte);
+}
+
+enum {
+	SSDFS_FREE_BLK_SEARCH,
+	SSDFS_VALID_BLK_SEARCH,
+	SSDFS_OTHER_BLK_SEARCH,
+	SSDFS_SEARCH_TYPE_MAX,
+};
+
+static inline
+int SSDFS_GET_CACHE_TYPE(int blk_state)
+{
+	switch (blk_state) {
+	case SSDFS_BLK_FREE:
+		return SSDFS_FREE_BLK_SEARCH;
+
+	case SSDFS_BLK_VALID:
+		return SSDFS_VALID_BLK_SEARCH;
+
+	case SSDFS_BLK_PRE_ALLOCATED:
+	case SSDFS_BLK_INVALID:
+		return SSDFS_OTHER_BLK_SEARCH;
+	}
+
+	return SSDFS_SEARCH_TYPE_MAX;
+}
+
+#define SSDFS_BLK_BMAP_INITIALIZED	(1 << 0)
+#define SSDFS_BLK_BMAP_DIRTY		(1 << 1)
+
+/*
+ * struct ssdfs_block_bmap_storage - block bitmap's storage
+ * @state: storage state
+ * @array: vector of pages
+ * @buf: pointer on memory buffer
+ */
+struct ssdfs_block_bmap_storage {
+	int state;
+	struct ssdfs_page_vector array;
+	void *buf;
+};
+
+/* Block bitmap's storage's states */
+enum {
+	SSDFS_BLOCK_BMAP_STORAGE_ABSENT,
+	SSDFS_BLOCK_BMAP_STORAGE_PAGE_VEC,
+	SSDFS_BLOCK_BMAP_STORAGE_BUFFER,
+	SSDFS_BLOCK_BMAP_STORAGE_STATE_MAX
+};
+
+/*
+ * struct ssdfs_block_bmap - in-core segment's block bitmap
+ * @lock: block bitmap lock
+ * @flags: block bitmap state flags
+ * @storage: block bitmap's storage
+ * @bytes_count: block bitmap size in bytes
+ * @items_count: items count in bitmap
+ * @metadata_items: count of metadata items
+ * @used_blks: count of valid blocks
+ * @invalid_blks: count of invalid blocks
+ * @last_search: last search/access cache array
+ */
+struct ssdfs_block_bmap {
+	struct mutex lock;
+	atomic_t flags;
+	struct ssdfs_block_bmap_storage storage;
+	size_t bytes_count;
+	size_t items_count;
+	u32 metadata_items;
+	u32 used_blks;
+	u32 invalid_blks;
+	struct ssdfs_last_bmap_search last_search[SSDFS_SEARCH_TYPE_MAX];
+};
+
+/*
+ * compare_block_bmap_ranges() - compare two ranges
+ * @range1: left range
+ * @range2: right range
+ *
+ * RETURN:
+ *  0: range1 == range2
+ * -1: range1 < range2
+ *  1: range1 > range2
+ */
+static inline
+int compare_block_bmap_ranges(struct ssdfs_block_bmap_range *range1,
+				struct ssdfs_block_bmap_range *range2)
+{
+	u32 range1_end, range2_end;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!range1 || !range2);
+
+	SSDFS_DBG("range1 (start %u, len %u), range2 (start %u, len %u)\n",
+		  range1->start, range1->len, range2->start, range2->len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (range1->start == range2->start) {
+		if (range1->len == range2->len)
+			return 0;
+		else if (range1->len < range2->len)
+			return -1;
+		else
+			return 1;
+	} else if (range1->start < range2->start) {
+		range1_end = range1->start + range1->len;
+		range2_end = range2->start + range2->len;
+
+		if (range2_end <= range1_end)
+			return 1;
+		else
+			return -1;
+	}
+
+	/* range1->start > range2->start */
+	return -1;
+}
+
+/*
+ * ranges_have_intersection() - have ranges intersection?
+ * @range1: left range
+ * @range2: right range
+ *
+ * RETURN:
+ * [true]  - ranges have intersection
+ * [false] - ranges don't intersect
+ */
+static inline
+bool ranges_have_intersection(struct ssdfs_block_bmap_range *range1,
+				struct ssdfs_block_bmap_range *range2)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!range1 || !range2);
+
+	SSDFS_DBG("range1 (start %u, len %u), range2 (start %u, len %u)\n",
+		  range1->start, range1->len, range2->start, range2->len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if ((range2->start + range2->len) <= range1->start)
+		return false;
+	else if ((range1->start + range1->len) <= range2->start)
+		return false;
+
+	return true;
+}
+
+enum {
+	SSDFS_BLK_BMAP_CREATE,
+	SSDFS_BLK_BMAP_INIT,
+};
+
+/* Function prototypes */
+int ssdfs_block_bmap_create(struct ssdfs_fs_info *fsi,
+			    struct ssdfs_block_bmap *bmap,
+			    u32 items_count,
+			    int flag, int init_state);
+void ssdfs_block_bmap_destroy(struct ssdfs_block_bmap *blk_bmap);
+int ssdfs_block_bmap_init(struct ssdfs_block_bmap *blk_bmap,
+			  struct ssdfs_page_vector *source,
+			  u32 last_free_blk,
+			  u32 metadata_blks,
+			  u32 invalid_blks);
+int ssdfs_block_bmap_snapshot(struct ssdfs_block_bmap *blk_bmap,
+				struct ssdfs_page_vector *snapshot,
+				u32 *last_free_page,
+				u32 *metadata_blks,
+				u32 *invalid_blks,
+				size_t *bytes_count);
+void ssdfs_block_bmap_forget_snapshot(struct ssdfs_page_vector *snapshot);
+
+int ssdfs_block_bmap_lock(struct ssdfs_block_bmap *blk_bmap);
+bool ssdfs_block_bmap_is_locked(struct ssdfs_block_bmap *blk_bmap);
+void ssdfs_block_bmap_unlock(struct ssdfs_block_bmap *blk_bmap);
+
+bool ssdfs_block_bmap_dirtied(struct ssdfs_block_bmap *blk_bmap);
+void ssdfs_block_bmap_clear_dirty_state(struct ssdfs_block_bmap *blk_bmap);
+bool ssdfs_block_bmap_initialized(struct ssdfs_block_bmap *blk_bmap);
+void ssdfs_set_block_bmap_initialized(struct ssdfs_block_bmap *blk_bmap);
+
+bool ssdfs_block_bmap_test_block(struct ssdfs_block_bmap *blk_bmap,
+				 u32 blk, int blk_state);
+bool ssdfs_block_bmap_test_range(struct ssdfs_block_bmap *blk_bmap,
+				 struct ssdfs_block_bmap_range *range,
+				 int blk_state);
+int ssdfs_get_block_state(struct ssdfs_block_bmap *blk_bmap, u32 blk);
+int ssdfs_get_range_state(struct ssdfs_block_bmap *blk_bmap,
+			  struct ssdfs_block_bmap_range *range);
+int ssdfs_block_bmap_reserve_metadata_pages(struct ssdfs_block_bmap *blk_bmap,
+					    u32 count);
+int ssdfs_block_bmap_free_metadata_pages(struct ssdfs_block_bmap *blk_bmap,
+					 u32 count);
+int ssdfs_block_bmap_get_free_pages(struct ssdfs_block_bmap *blk_bmap);
+int ssdfs_block_bmap_get_used_pages(struct ssdfs_block_bmap *blk_bmap);
+int ssdfs_block_bmap_get_invalid_pages(struct ssdfs_block_bmap *blk_bmap);
+int ssdfs_block_bmap_pre_allocate(struct ssdfs_block_bmap *blk_bmap,
+				  u32 start, u32 *len,
+				  struct ssdfs_block_bmap_range *range);
+int ssdfs_block_bmap_allocate(struct ssdfs_block_bmap *blk_bmap,
+				u32 start, u32 *len,
+				struct ssdfs_block_bmap_range *range);
+int ssdfs_block_bmap_invalidate(struct ssdfs_block_bmap *blk_bmap,
+				struct ssdfs_block_bmap_range *range);
+int ssdfs_block_bmap_collect_garbage(struct ssdfs_block_bmap *blk_bmap,
+				     u32 start, u32 max_len,
+				     int blk_state,
+				     struct ssdfs_block_bmap_range *range);
+int ssdfs_block_bmap_clean(struct ssdfs_block_bmap *blk_bmap);
+
+#define SSDFS_BLK_BMAP_FNS(state, name)					\
+static inline								\
+bool is_block_##name(struct ssdfs_block_bmap *blk_bmap, u32 blk)	\
+{									\
+	return ssdfs_block_bmap_test_block(blk_bmap, blk,		\
+					    SSDFS_BLK_##state);		\
+}									\
+static inline								\
+bool is_range_##name(struct ssdfs_block_bmap *blk_bmap,			\
+			struct ssdfs_block_bmap_range *range)		\
+{									\
+	return ssdfs_block_bmap_test_range(blk_bmap, range,		\
+					    SSDFS_BLK_##state);		\
+}									\
+
+/*
+ * is_block_free()
+ * is_range_free()
+ */
+SSDFS_BLK_BMAP_FNS(FREE, free)
+
+/*
+ * is_block_pre_allocated()
+ * is_range_pre_allocated()
+ */
+SSDFS_BLK_BMAP_FNS(PRE_ALLOCATED, pre_allocated)
+
+/*
+ * is_block_valid()
+ * is_range_valid()
+ */
+SSDFS_BLK_BMAP_FNS(VALID, valid)
+
+/*
+ * is_block_invalid()
+ * is_range_invalid()
+ */
+SSDFS_BLK_BMAP_FNS(INVALID, invalid)
+
+#endif /* _SSDFS_BLOCK_BITMAP_H */
diff --git a/fs/ssdfs/block_bitmap_tables.c b/fs/ssdfs/block_bitmap_tables.c
new file mode 100644
index 000000000000..4f7e04a8a9b6
--- /dev/null
+++ b/fs/ssdfs/block_bitmap_tables.c
@@ -0,0 +1,310 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/block_bitmap_tables.c - definition of block bitmap's search tables.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include <linux/kernel.h>
+
+/*
+ * Table for detecting the presence of the free block
+ * state in a byte. The byte being checked is used
+ * as an index into the array.
+ */
+const bool detect_free_blk[U8_MAX + 1] = {
+/* 00 - 0x00 */	true, true, true, true,
+/* 01 - 0x04 */	true, true, true, true,
+/* 02 - 0x08 */	true, true, true, true,
+/* 03 - 0x0C */	true, true, true, true,
+/* 04 - 0x10 */	true, true, true, true,
+/* 05 - 0x14 */	true, true, true, true,
+/* 06 - 0x18 */	true, true, true, true,
+/* 07 - 0x1C */	true, true, true, true,
+/* 08 - 0x20 */	true, true, true, true,
+/* 09 - 0x24 */	true, true, true, true,
+/* 10 - 0x28 */	true, true, true, true,
+/* 11 - 0x2C */	true, true, true, true,
+/* 12 - 0x30 */	true, true, true, true,
+/* 13 - 0x34 */	true, true, true, true,
+/* 14 - 0x38 */	true, true, true, true,
+/* 15 - 0x3C */	true, true, true, true,
+/* 16 - 0x40 */	true, true, true, true,
+/* 17 - 0x44 */	true, true, true, true,
+/* 18 - 0x48 */	true, true, true, true,
+/* 19 - 0x4C */	true, true, true, true,
+/* 20 - 0x50 */	true, true, true, true,
+/* 21 - 0x54 */	true, false, false, false,
+/* 22 - 0x58 */	true, false, false, false,
+/* 23 - 0x5C */	true, false, false, false,
+/* 24 - 0x60 */	true, true, true, true,
+/* 25 - 0x64 */	true, false, false, false,
+/* 26 - 0x68 */	true, false, false, false,
+/* 27 - 0x6C */	true, false, false, false,
+/* 28 - 0x70 */	true, true, true, true,
+/* 29 - 0x74 */	true, false, false, false,
+/* 30 - 0x78 */	true, false, false, false,
+/* 31 - 0x7C */	true, false, false, false,
+/* 32 - 0x80 */	true, true, true, true,
+/* 33 - 0x84 */	true, true, true, true,
+/* 34 - 0x88 */	true, true, true, true,
+/* 35 - 0x8C */	true, true, true, true,
+/* 36 - 0x90 */	true, true, true, true,
+/* 37 - 0x94 */	true, false, false, false,
+/* 38 - 0x98 */	true, false, false, false,
+/* 39 - 0x9C */	true, false, false, false,
+/* 40 - 0xA0 */	true, true, true, true,
+/* 41 - 0xA4 */	true, false, false, false,
+/* 42 - 0xA8 */	true, false, false, false,
+/* 43 - 0xAC */	true, false, false, false,
+/* 44 - 0xB0 */	true, true, true, true,
+/* 45 - 0xB4 */	true, false, false, false,
+/* 46 - 0xB8 */	true, false, false, false,
+/* 47 - 0xBC */	true, false, false, false,
+/* 48 - 0xC0 */	true, true, true, true,
+/* 49 - 0xC4 */	true, true, true, true,
+/* 50 - 0xC8 */	true, true, true, true,
+/* 51 - 0xCC */	true, true, true, true,
+/* 52 - 0xD0 */	true, true, true, true,
+/* 53 - 0xD4 */	true, false, false, false,
+/* 54 - 0xD8 */	true, false, false, false,
+/* 55 - 0xDC */	true, false, false, false,
+/* 56 - 0xE0 */	true, true, true, true,
+/* 57 - 0xE4 */	true, false, false, false,
+/* 58 - 0xE8 */	true, false, false, false,
+/* 59 - 0xEC */	true, false, false, false,
+/* 60 - 0xF0 */	true, true, true, true,
+/* 61 - 0xF4 */	true, false, false, false,
+/* 62 - 0xF8 */	true, false, false, false,
+/* 63 - 0xFC */	true, false, false, false
+};
+
+/*
+ * Table for detecting the presence of the pre-allocated
+ * block state in a byte. The byte being checked is used
+ * as an index into the array.
+ */
+const bool detect_pre_allocated_blk[U8_MAX + 1] = {
+/* 00 - 0x00 */	false, true, false, false,
+/* 01 - 0x04 */	true, true, true, true,
+/* 02 - 0x08 */	false, true, false, false,
+/* 03 - 0x0C */	false, true, false, false,
+/* 04 - 0x10 */	true, true, true, true,
+/* 05 - 0x14 */	true, true, true, true,
+/* 06 - 0x18 */	true, true, true, true,
+/* 07 - 0x1C */	true, true, true, true,
+/* 08 - 0x20 */	false, true, false, false,
+/* 09 - 0x24 */	true, true, true, true,
+/* 10 - 0x28 */	false, true, false, false,
+/* 11 - 0x2C */	false, true, false, false,
+/* 12 - 0x30 */	false, true, false, false,
+/* 13 - 0x34 */	true, true, true, true,
+/* 14 - 0x38 */	false, true, false, false,
+/* 15 - 0x3C */	false, true, false, false,
+/* 16 - 0x40 */	true, true, true, true,
+/* 17 - 0x44 */	true, true, true, true,
+/* 18 - 0x48 */	true, true, true, true,
+/* 19 - 0x4C */	true, true, true, true,
+/* 20 - 0x50 */	true, true, true, true,
+/* 21 - 0x54 */	true, true, true, true,
+/* 22 - 0x58 */	true, true, true, true,
+/* 23 - 0x5C */	true, true, true, true,
+/* 24 - 0x60 */	true, true, true, true,
+/* 25 - 0x64 */	true, true, true, true,
+/* 26 - 0x68 */	true, true, true, true,
+/* 27 - 0x6C */	true, true, true, true,
+/* 28 - 0x70 */	true, true, true, true,
+/* 29 - 0x74 */	true, true, true, true,
+/* 30 - 0x78 */	true, true, true, true,
+/* 31 - 0x7C */	true, true, true, true,
+/* 32 - 0x80 */	false, true, false, false,
+/* 33 - 0x84 */	true, true, true, true,
+/* 34 - 0x88 */	false, true, false, false,
+/* 35 - 0x8C */	false, true, false, false,
+/* 36 - 0x90 */	true, true, true, true,
+/* 37 - 0x94 */	true, true, true, true,
+/* 38 - 0x98 */	true, true, true, true,
+/* 39 - 0x9C */	true, true, true, true,
+/* 40 - 0xA0 */	false, true, false, false,
+/* 41 - 0xA4 */	true, true, true, true,
+/* 42 - 0xA8 */	false, true, false, false,
+/* 43 - 0xAC */	false, true, false, false,
+/* 44 - 0xB0 */	false, true, false, false,
+/* 45 - 0xB4 */	true, true, true, true,
+/* 46 - 0xB8 */	false, true, false, false,
+/* 47 - 0xBC */	false, true, false, false,
+/* 48 - 0xC0 */	false, true, false, false,
+/* 49 - 0xC4 */	true, true, true, true,
+/* 50 - 0xC8 */	false, true, false, false,
+/* 51 - 0xCC */	false, true, false, false,
+/* 52 - 0xD0 */	true, true, true, true,
+/* 53 - 0xD4 */	true, true, true, true,
+/* 54 - 0xD8 */	true, true, true, true,
+/* 55 - 0xDC */	true, true, true, true,
+/* 56 - 0xE0 */	false, true, false, false,
+/* 57 - 0xE4 */	true, true, true, true,
+/* 58 - 0xE8 */	false, true, false, false,
+/* 59 - 0xEC */	false, true, false, false,
+/* 60 - 0xF0 */	false, true, false, false,
+/* 61 - 0xF4 */	true, true, true, true,
+/* 62 - 0xF8 */	false, true, false, false,
+/* 63 - 0xFC */	false, true, false, false
+};
+
+/*
+ * Table for detecting the presence of the valid block
+ * state in a byte. The byte being checked is used
+ * as an index into the array.
+ */
+const bool detect_valid_blk[U8_MAX + 1] = {
+/* 00 - 0x00 */	false, false, false, true,
+/* 01 - 0x04 */	false, false, false, true,
+/* 02 - 0x08 */	false, false, false, true,
+/* 03 - 0x0C */	true, true, true, true,
+/* 04 - 0x10 */	false, false, false, true,
+/* 05 - 0x14 */	false, false, false, true,
+/* 06 - 0x18 */	false, false, false, true,
+/* 07 - 0x1C */	true, true, true, true,
+/* 08 - 0x20 */	false, false, false, true,
+/* 09 - 0x24 */	false, false, false, true,
+/* 10 - 0x28 */	false, false, false, true,
+/* 11 - 0x2C */	true, true, true, true,
+/* 12 - 0x30 */	true, true, true, true,
+/* 13 - 0x34 */	true, true, true, true,
+/* 14 - 0x38 */	true, true, true, true,
+/* 15 - 0x3C */	true, true, true, true,
+/* 16 - 0x40 */	false, false, false, true,
+/* 17 - 0x44 */	false, false, false, true,
+/* 18 - 0x48 */	false, false, false, true,
+/* 19 - 0x4C */	true, true, true, true,
+/* 20 - 0x50 */	false, false, false, true,
+/* 21 - 0x54 */	false, false, false, true,
+/* 22 - 0x58 */	false, false, false, true,
+/* 23 - 0x5C */	true, true, true, true,
+/* 24 - 0x60 */	false, false, false, true,
+/* 25 - 0x64 */	false, false, false, true,
+/* 26 - 0x68 */	false, false, false, true,
+/* 27 - 0x6C */	true, true, true, true,
+/* 28 - 0x70 */	true, true, true, true,
+/* 29 - 0x74 */	true, true, true, true,
+/* 30 - 0x78 */	true, true, true, true,
+/* 31 - 0x7C */	true, true, true, true,
+/* 32 - 0x80 */	false, false, false, true,
+/* 33 - 0x84 */	false, false, false, true,
+/* 34 - 0x88 */	false, false, false, true,
+/* 35 - 0x8C */	true, true, true, true,
+/* 36 - 0x90 */	false, false, false, true,
+/* 37 - 0x94 */	false, false, false, true,
+/* 38 - 0x98 */	false, false, false, true,
+/* 39 - 0x9C */	true, true, true, true,
+/* 40 - 0xA0 */	false, false, false, true,
+/* 41 - 0xA4 */	false, false, false, true,
+/* 42 - 0xA8 */	false, false, false, true,
+/* 43 - 0xAC */	true, true, true, true,
+/* 44 - 0xB0 */	true, true, true, true,
+/* 45 - 0xB4 */	true, true, true, true,
+/* 46 - 0xB8 */	true, true, true, true,
+/* 47 - 0xBC */	true, true, true, true,
+/* 48 - 0xC0 */	true, true, true, true,
+/* 49 - 0xC4 */	true, true, true, true,
+/* 50 - 0xC8 */	true, true, true, true,
+/* 51 - 0xCC */	true, true, true, true,
+/* 52 - 0xD0 */	true, true, true, true,
+/* 53 - 0xD4 */	true, true, true, true,
+/* 54 - 0xD8 */	true, true, true, true,
+/* 55 - 0xDC */	true, true, true, true,
+/* 56 - 0xE0 */	true, true, true, true,
+/* 57 - 0xE4 */	true, true, true, true,
+/* 58 - 0xE8 */	true, true, true, true,
+/* 59 - 0xEC */	true, true, true, true,
+/* 60 - 0xF0 */	true, true, true, true,
+/* 61 - 0xF4 */	true, true, true, true,
+/* 62 - 0xF8 */	true, true, true, true,
+/* 63 - 0xFC */	true, true, true, true
+};
+
+/*
+ * Table for detecting the presence of the invalid block
+ * state in a byte. The byte being checked is used
+ * as an index into the array.
+ */
+const bool detect_invalid_blk[U8_MAX + 1] = {
+/* 00 - 0x00 */	false, false, true, false,
+/* 01 - 0x04 */	false, false, true, false,
+/* 02 - 0x08 */	true, true, true, true,
+/* 03 - 0x0C */	false, false, true, false,
+/* 04 - 0x10 */	false, false, true, false,
+/* 05 - 0x14 */	false, false, true, false,
+/* 06 - 0x18 */	true, true, true, true,
+/* 07 - 0x1C */	false, false, true, false,
+/* 08 - 0x20 */	true, true, true, true,
+/* 09 - 0x24 */	true, true, true, true,
+/* 10 - 0x28 */	true, true, true, true,
+/* 11 - 0x2C */	true, true, true, true,
+/* 12 - 0x30 */	false, false, true, false,
+/* 13 - 0x34 */	false, false, true, false,
+/* 14 - 0x38 */	true, true, true, true,
+/* 15 - 0x3C */	false, false, true, false,
+/* 16 - 0x40 */	false, false, true, false,
+/* 17 - 0x44 */	false, false, true, false,
+/* 18 - 0x48 */	true, true, true, true,
+/* 19 - 0x4C */	false, false, true, false,
+/* 20 - 0x50 */	false, false, true, false,
+/* 21 - 0x54 */	false, false, true, false,
+/* 22 - 0x58 */	true, true, true, true,
+/* 23 - 0x5C */	false, false, true, false,
+/* 24 - 0x60 */	true, true, true, true,
+/* 25 - 0x64 */	true, true, true, true,
+/* 26 - 0x68 */	true, true, true, true,
+/* 27 - 0x6C */	true, true, true, true,
+/* 28 - 0x70 */	false, false, true, false,
+/* 29 - 0x74 */	false, false, true, false,
+/* 30 - 0x78 */	true, true, true, true,
+/* 31 - 0x7C */	false, false, true, false,
+/* 32 - 0x80 */	true, true, true, true,
+/* 33 - 0x84 */	true, true, true, true,
+/* 34 - 0x88 */	true, true, true, true,
+/* 35 - 0x8C */	true, true, true, true,
+/* 36 - 0x90 */	true, true, true, true,
+/* 37 - 0x94 */	true, true, true, true,
+/* 38 - 0x98 */	true, true, true, true,
+/* 39 - 0x9C */	true, true, true, true,
+/* 40 - 0xA0 */	true, true, true, true,
+/* 41 - 0xA4 */	true, true, true, true,
+/* 42 - 0xA8 */	true, true, true, true,
+/* 43 - 0xAC */	true, true, true, true,
+/* 44 - 0xB0 */	true, true, true, true,
+/* 45 - 0xB4 */	true, true, true, true,
+/* 46 - 0xB8 */	true, true, true, true,
+/* 47 - 0xBC */	true, true, true, true,
+/* 48 - 0xC0 */	false, false, true, false,
+/* 49 - 0xC4 */	false, false, true, false,
+/* 50 - 0xC8 */	true, true, true, true,
+/* 51 - 0xCC */	false, false, true, false,
+/* 52 - 0xD0 */	false, false, true, false,
+/* 53 - 0xD4 */	false, false, true, false,
+/* 54 - 0xD8 */	true, true, true, true,
+/* 55 - 0xDC */	false, false, true, false,
+/* 56 - 0xE0 */	true, true, true, true,
+/* 57 - 0xE4 */	true, true, true, true,
+/* 58 - 0xE8 */	true, true, true, true,
+/* 59 - 0xEC */	true, true, true, true,
+/* 60 - 0xF0 */	false, false, true, false,
+/* 61 - 0xF4 */	false, false, true, false,
+/* 62 - 0xF8 */	true, true, true, true,
+/* 63 - 0xFC */	false, false, true, false
+};
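A table such as detect_invalid_blk above need not be written by hand: each byte packs four 2-bit block states, and an entry is true when any of the four fields equals the target state. A minimal standalone sketch of the generation, assuming the 0b10 encoding for the invalid state that the table entries imply (the names here are hypothetical, not SSDFS symbols):

```c
#include <stdbool.h>
#include <stdint.h>

#define BLK_STATE_BITS   2
#define BLK_STATE_MASK   0x3
#define STATES_PER_BYTE  (8 / BLK_STATE_BITS)

/* true if any of the four 2-bit fields in @value equals @state */
bool byte_contains_state(uint8_t value, int state)
{
	int i;

	for (i = 0; i < STATES_PER_BYTE; i++) {
		if (((value >> (i * BLK_STATE_BITS)) & BLK_STATE_MASK) == state)
			return true;
	}
	return false;
}

/* Fill a 256-entry lookup table for @state, like detect_invalid_blk */
void build_detect_table(bool table[256], int state)
{
	int byte;

	for (byte = 0; byte < 256; byte++)
		table[byte] = byte_contains_state((uint8_t)byte, state);
}
```

Indexing such a table with a raw bitmap byte answers "does this byte contain the state?" in a single load instead of four shift-and-mask tests.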
-- 
2.34.1



* [RFC PATCH 11/76] ssdfs: block bitmap search operations implementation
  2023-02-25  1:08 [RFC PATCH 00/76] SSDFS: flash-friendly LFS file system for ZNS SSD Viacheslav Dubeyko
                   ` (9 preceding siblings ...)
  2023-02-25  1:08 ` [RFC PATCH 10/76] ssdfs: introduce PEB's block bitmap Viacheslav Dubeyko
@ 2023-02-25  1:08 ` Viacheslav Dubeyko
  2023-02-25  1:08 ` [RFC PATCH 12/76] ssdfs: block bitmap modification " Viacheslav Dubeyko
                   ` (65 subsequent siblings)
  76 siblings, 0 replies; 82+ messages in thread
From: Viacheslav Dubeyko @ 2023-02-25  1:08 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: viacheslav.dubeyko, luka.perkov, bruno.banelli, Viacheslav Dubeyko

Implement the block bitmap's internal search operations used by the
pre_allocate, allocate, and collect_garbage operations.

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
CC: Viacheslav Dubeyko <viacheslav.dubeyko@bytedance.com>
CC: Luka Perkov <luka.perkov@sartura.hr>
CC: Bruno Banelli <bruno.banelli@sartura.hr>
---
 fs/ssdfs/block_bitmap.c | 3401 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 3401 insertions(+)

diff --git a/fs/ssdfs/block_bitmap.c b/fs/ssdfs/block_bitmap.c
index fd7e84258cf0..3e3ddb6ff745 100644
--- a/fs/ssdfs/block_bitmap.c
+++ b/fs/ssdfs/block_bitmap.c
@@ -1207,3 +1207,3404 @@ void ssdfs_block_bmap_forget_snapshot(struct ssdfs_page_vector *snapshot)
 
 	ssdfs_page_vector_release(snapshot);
 }
+
+/*
+ * ssdfs_block_bmap_lock() - lock segment's block bitmap
+ * @blk_bmap: pointer to block bitmap
+ */
+int ssdfs_block_bmap_lock(struct ssdfs_block_bmap *blk_bmap)
+{
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("blk_bmap %p\n", blk_bmap);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = mutex_lock_killable(&blk_bmap->lock);
+	if (err) {
+		SSDFS_ERR("fail to lock block bitmap: err %d\n", err);
+		return err;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_block_bmap_is_locked() - check that block bitmap is locked
+ * @blk_bmap: pointer to block bitmap
+ */
+bool ssdfs_block_bmap_is_locked(struct ssdfs_block_bmap *blk_bmap)
+{
+	return mutex_is_locked(&blk_bmap->lock);
+}
+
+/*
+ * ssdfs_block_bmap_unlock() - unlock segment's block bitmap
+ * @blk_bmap: pointer to block bitmap
+ */
+void ssdfs_block_bmap_unlock(struct ssdfs_block_bmap *blk_bmap)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("blk_bmap %p\n", blk_bmap);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	mutex_unlock(&blk_bmap->lock);
+}
+
+/*
+ * ssdfs_get_cache_type() - determine cache type for block
+ * @blk_bmap: pointer to block bitmap
+ * @blk: block number
+ *
+ * RETURN:
+ * [success] - cache type
+ * [failure] - SSDFS_SEARCH_TYPE_MAX
+ */
+static
+int ssdfs_get_cache_type(struct ssdfs_block_bmap *blk_bmap,
+			 u32 blk)
+{
+	int page_index;
+	u16 offset;
+	int i;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap);
+
+	SSDFS_DBG("blk_bmap %p, block %u\n", blk_bmap, blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	page_index = SSDFS_BLK2PAGE(blk, SSDFS_BLK_STATE_BITS, &offset);
+
+	for (i = 0; i < SSDFS_SEARCH_TYPE_MAX; i++) {
+		struct ssdfs_last_bmap_search *last;
+
+		last = &blk_bmap->last_search[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("last->page_index %d, page_index %d, "
+			  "last->offset %u, offset %u, "
+			  "search_type %#x\n",
+			  last->page_index, page_index,
+			  last->offset, offset, i);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (last->page_index == page_index &&
+		    last->offset == offset)
+			return i;
+	}
+
+	return SSDFS_SEARCH_TYPE_MAX;
+}
+
+/*
+ * is_block_state_cached() - check that block state is in cache
+ * @blk_bmap: pointer to block bitmap
+ * @blk: block number
+ *
+ * RETURN:
+ * [true]  - block state is in cache
+ * [false] - cache doesn't contain block state
+ */
+static
+bool is_block_state_cached(struct ssdfs_block_bmap *blk_bmap,
+			   u32 blk)
+{
+	int cache_type;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap);
+
+	SSDFS_DBG("blk_bmap %p, block %u\n", blk_bmap, blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	cache_type = ssdfs_get_cache_type(blk_bmap, blk);
+
+	if (cache_type < 0) {
+		SSDFS_ERR("invalid cache type %d\n", cache_type);
+		return false;
+	}
+
+	if (cache_type >= SSDFS_SEARCH_TYPE_MAX)
+		return false;
+
+	return true;
+}
+
+/*
+ * ssdfs_determine_cache_type() - detect type of cache for value
+ * @cache: value for caching
+ *
+ * RETURN: suggested type of cache
+ */
+static
+int ssdfs_determine_cache_type(unsigned long cache)
+{
+	size_t bytes_per_long = sizeof(cache);
+	size_t criterion = bytes_per_long / 2;
+	u8 bytes[SSDFS_BLK_STATE_MAX] = {0};
+	int i;
+
+	for (i = 0; i < bytes_per_long; i++) {
+		int cur_state = (int)((cache >> (i * BITS_PER_BYTE)) & 0xFF);
+
+		switch (cur_state) {
+		case SSDFS_FREE_STATES_BYTE:
+			bytes[SSDFS_BLK_FREE]++;
+			break;
+
+		case SSDFS_PRE_ALLOC_STATES_BYTE:
+			bytes[SSDFS_BLK_PRE_ALLOCATED]++;
+			break;
+
+		case SSDFS_VALID_STATES_BYTE:
+			bytes[SSDFS_BLK_VALID]++;
+			break;
+
+		case SSDFS_INVALID_STATES_BYTE:
+			bytes[SSDFS_BLK_INVALID]++;
+			break;
+
+		default:
+			/* mix of block states */
+			break;
+		}
+	}
+
+	if (bytes[SSDFS_BLK_FREE] > criterion)
+		return SSDFS_FREE_BLK_SEARCH;
+	else if (bytes[SSDFS_BLK_VALID] > criterion)
+		return SSDFS_VALID_BLK_SEARCH;
+
+	return SSDFS_OTHER_BLK_SEARCH;
+}
+
+/*
+ * ssdfs_cache_block_state() - cache block state from pagevec
+ * @blk_bmap: pointer to block bitmap
+ * @blk: segment's block
+ * @blk_state: state as hint for cache type determination
+ *
+ * This function retrieves the state of @blk from the pagevec
+ * and saves the retrieved value in the requested type of cache.
+ * If @blk_state is SSDFS_BLK_STATE_MAX, then the function
+ * determines the block state and caches the value in the proper place.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ * %-EOPNOTSUPP - invalid page index.
+ */
+static
+int ssdfs_cache_block_state(struct ssdfs_block_bmap *blk_bmap,
+			    u32 blk, int blk_state)
+{
+	struct ssdfs_page_vector *array;
+	int page_index;
+	u16 offset;
+	void *kaddr;
+	unsigned long cache;
+	int cache_type;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap);
+
+	SSDFS_DBG("blk_bmap %p, block %u, state %#x\n",
+		  blk_bmap, blk, blk_state);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (blk_state > SSDFS_BLK_STATE_MAX) {
+		SSDFS_ERR("invalid block state %#x\n", blk_state);
+		return -EINVAL;
+	}
+
+	if (is_block_state_cached(blk_bmap, blk)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("block %u has been cached already\n", blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return 0;
+	}
+
+	page_index = SSDFS_BLK2PAGE(blk, SSDFS_BLK_STATE_BITS, &offset);
+
+	switch (blk_bmap->storage.state) {
+	case SSDFS_BLOCK_BMAP_STORAGE_PAGE_VEC:
+		array = &blk_bmap->storage.array;
+
+		if (page_index >= ssdfs_page_vector_capacity(array)) {
+			SSDFS_ERR("invalid page index %d\n", page_index);
+			return -EOPNOTSUPP;
+		}
+
+		if (page_index >= ssdfs_page_vector_count(array)) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("absent page index %d\n", page_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return -ENOENT;
+		}
+
+		err = ssdfs_memcpy_from_page(&cache,
+					     0, sizeof(unsigned long),
+					     array->pages[page_index],
+					     offset, PAGE_SIZE,
+					     sizeof(unsigned long));
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to copy: err %d\n", err);
+			return err;
+		}
+		break;
+
+	case SSDFS_BLOCK_BMAP_STORAGE_BUFFER:
+		if (page_index > 0) {
+			SSDFS_ERR("invalid page_index %d\n", page_index);
+			return -ERANGE;
+		}
+
+		kaddr = blk_bmap->storage.buf;
+		err = ssdfs_memcpy(&cache, 0, sizeof(unsigned long),
+				   kaddr, offset, blk_bmap->bytes_count,
+				   sizeof(unsigned long));
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to copy: err %d\n", err);
+			return err;
+		}
+		break;
+
+	default:
+		SSDFS_ERR("unexpected state %#x\n", blk_bmap->storage.state);
+		return -ERANGE;
+	}
+
+	cache_type = ssdfs_determine_cache_type(cache);
+	BUG_ON(cache_type >= SSDFS_SEARCH_TYPE_MAX);
+
+	blk_bmap->last_search[cache_type].page_index = page_index;
+	blk_bmap->last_search[cache_type].offset = offset;
+	blk_bmap->last_search[cache_type].cache = cache;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("last_search.cache %lx, cache_type %#x, "
+		  "page_index %d, offset %u\n",
+		  cache, cache_type,
+		  page_index, offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+/*
+ * ssdfs_define_bits_shift_in_cache() - calculate bit shift of block in cache
+ * @blk_bmap: pointer to block bitmap
+ * @cache_type: type of cache
+ * @blk: segment's block
+ *
+ * This function calculates bit shift of @blk in cache of
+ * @cache_type.
+ *
+ * RETURN:
+ * [success] - bit shift
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ */
+static
+int ssdfs_define_bits_shift_in_cache(struct ssdfs_block_bmap *blk_bmap,
+				     int cache_type, u32 blk)
+{
+	struct ssdfs_last_bmap_search *last_search;
+	u32 first_cached_block, diff;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap);
+
+	if (blk >= blk_bmap->items_count) {
+		SSDFS_ERR("invalid block %u\n", blk);
+		return -EINVAL;
+	}
+
+	if (cache_type < 0) {
+		SSDFS_ERR("invalid cache type %d\n", cache_type);
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("blk_bmap %p, cache_type %#x, blk %u\n",
+		  blk_bmap, cache_type, blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (cache_type >= SSDFS_SEARCH_TYPE_MAX) {
+		SSDFS_ERR("cache doesn't contain block %u\n", blk);
+		return -EINVAL;
+	}
+
+	last_search = &blk_bmap->last_search[cache_type];
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("last_search.cache %lx\n", last_search->cache);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	first_cached_block = SSDFS_FIRST_CACHED_BLOCK(last_search);
+
+	if (first_cached_block > blk) {
+		SSDFS_ERR("first_cached_block %u > blk %u\n",
+			  first_cached_block, blk);
+		return -EINVAL;
+	}
+
+	diff = blk - first_cached_block;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	if (diff >= (U32_MAX / SSDFS_BLK_STATE_BITS)) {
+		SSDFS_ERR("invalid diff %u; blk %u, first_cached_block %u\n",
+			  diff, blk, first_cached_block);
+		return -EINVAL;
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	diff *= SSDFS_BLK_STATE_BITS;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	if (diff > (BITS_PER_LONG - SSDFS_BLK_STATE_BITS)) {
+		SSDFS_ERR("invalid diff %u; bits_per_long %u, "
+			  "bits_per_state %u\n",
+			  diff, BITS_PER_LONG, SSDFS_BLK_STATE_BITS);
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("diff %u\n", diff);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return (int)diff;
+}
+
+/*
+ * ssdfs_get_block_state_from_cache() - retrieve block state from cache
+ * @blk_bmap: pointer to block bitmap
+ * @blk: segment's block
+ *
+ * This function retrieves the state of @blk from the cache.
+ *
+ * RETURN:
+ * [success] - state of block
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ */
+static
+int ssdfs_get_block_state_from_cache(struct ssdfs_block_bmap *blk_bmap,
+				     u32 blk)
+{
+	int cache_type;
+	struct ssdfs_last_bmap_search *last_search;
+	int shift;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap);
+
+	if (blk >= blk_bmap->items_count) {
+		SSDFS_ERR("invalid block %u\n", blk);
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("blk_bmap %p, block %u\n", blk_bmap, blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	cache_type = ssdfs_get_cache_type(blk_bmap, blk);
+	shift = ssdfs_define_bits_shift_in_cache(blk_bmap, cache_type, blk);
+	if (unlikely(shift < 0)) {
+		SSDFS_ERR("fail to define bits shift: "
+			  "cache_type %d, blk %u, err %d\n",
+			  cache_type, blk, shift);
+		return shift;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(cache_type >= SSDFS_SEARCH_TYPE_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	last_search = &blk_bmap->last_search[cache_type];
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("last_search.cache %lx\n", last_search->cache);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return (int)((last_search->cache >> shift) & SSDFS_BLK_STATE_MASK);
+}
+
+/*
+ * ssdfs_set_block_state_in_cache() - set block state in cache
+ * @blk_bmap: pointer to block bitmap
+ * @blk: segment's block
+ * @blk_state: new state of @blk
+ *
+ * This function sets state @blk_state of @blk in cache.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ */
+static
+int ssdfs_set_block_state_in_cache(struct ssdfs_block_bmap *blk_bmap,
+				   u32 blk, int blk_state)
+{
+	int cache_type;
+	int shift;
+	unsigned long value, *cached_value;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap);
+
+	if (blk >= blk_bmap->items_count) {
+		SSDFS_ERR("invalid block %u\n", blk);
+		return -EINVAL;
+	}
+
+	if (blk_state > SSDFS_BLK_STATE_MAX) {
+		SSDFS_ERR("invalid block state %#x\n", blk_state);
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("blk_bmap %p, block %u, blk_state %#x\n",
+		  blk_bmap, blk, blk_state);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	cache_type = ssdfs_get_cache_type(blk_bmap, blk);
+	shift = ssdfs_define_bits_shift_in_cache(blk_bmap, cache_type, blk);
+	if (unlikely(shift < 0)) {
+		SSDFS_ERR("fail to define bits shift: "
+			  "cache_type %d, blk %u, err %d\n",
+			  cache_type, blk, shift);
+		return shift;
+	}
+
+	value = blk_state & SSDFS_BLK_STATE_MASK;
+	value <<= shift;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(cache_type >= SSDFS_SEARCH_TYPE_MAX);
+
+	SSDFS_DBG("value %lx, cache %lx\n",
+		  value,
+		  blk_bmap->last_search[cache_type].cache);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	cached_value = &blk_bmap->last_search[cache_type].cache;
+	*cached_value &= ~((unsigned long)SSDFS_BLK_STATE_MASK << shift);
+	*cached_value |= value;
+
+	return 0;
+}
+
+/*
+ * ssdfs_save_cache_in_storage() - save cached values in storage
+ * @blk_bmap: pointer to block bitmap
+ *
+ * This function saves cached values in storage.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ */
+static
+int ssdfs_save_cache_in_storage(struct ssdfs_block_bmap *blk_bmap)
+{
+	struct ssdfs_page_vector *array;
+	void *kaddr;
+	int max_capacity = SSDFS_BLK_BMAP_FRAGMENTS_CHAIN_MAX;
+	int i;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap);
+
+	SSDFS_DBG("blk_bmap %p\n", blk_bmap);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	for (i = 0; i < SSDFS_SEARCH_TYPE_MAX; i++) {
+		int page_index = blk_bmap->last_search[i].page_index;
+		u16 offset = blk_bmap->last_search[i].offset;
+		unsigned long cache = blk_bmap->last_search[i].cache;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("search_type %d, page_index %d, offset %u\n",
+			  i, page_index, offset);
+		SSDFS_DBG("last_search.cache %lx\n", cache);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (page_index == max_capacity || offset == U16_MAX)
+			continue;
+
+		switch (blk_bmap->storage.state) {
+		case SSDFS_BLOCK_BMAP_STORAGE_PAGE_VEC:
+			array = &blk_bmap->storage.array;
+
+			if (page_index >= ssdfs_page_vector_capacity(array)) {
+				SSDFS_ERR("block bmap's cache is corrupted: "
+					  "page_index %d, offset %u\n",
+					  page_index, (u32)offset);
+				return -EINVAL;
+			}
+
+			while (page_index >= ssdfs_page_vector_count(array)) {
+				struct page *page;
+
+				page = ssdfs_page_vector_allocate(array);
+				if (IS_ERR_OR_NULL(page)) {
+					err = (page == NULL ? -ENOMEM :
+								PTR_ERR(page));
+					SSDFS_ERR("unable to allocate page\n");
+					return err;
+				}
+
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("page %p, count %d\n",
+					  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+			}
+
+			err = ssdfs_memcpy_to_page(array->pages[page_index],
+						   offset, PAGE_SIZE,
+						   &cache,
+						   0, sizeof(unsigned long),
+						   sizeof(unsigned long));
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to copy: err %d\n", err);
+				return err;
+			}
+			break;
+
+		case SSDFS_BLOCK_BMAP_STORAGE_BUFFER:
+			if (page_index > 0) {
+				SSDFS_ERR("invalid page_index %d\n", page_index);
+				return -ERANGE;
+			}
+
+			kaddr = blk_bmap->storage.buf;
+			err = ssdfs_memcpy(kaddr, offset, blk_bmap->bytes_count,
+					   &cache, 0, sizeof(unsigned long),
+					   sizeof(unsigned long));
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to copy: err %d\n", err);
+				return err;
+			}
+			break;
+
+		default:
+			SSDFS_ERR("unexpected state %#x\n",
+					blk_bmap->storage.state);
+			return -ERANGE;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * is_cache_invalid() - check that cache is invalid for requested state
+ * @blk_bmap: pointer to block bitmap
+ * @blk_state: requested block's state
+ *
+ * RETURN:
+ * [true]  - cache hasn't been initialized yet.
+ * [false] - cache is valid.
+ */
+static inline
+bool is_cache_invalid(struct ssdfs_block_bmap *blk_bmap, int blk_state)
+{
+	struct ssdfs_last_bmap_search *last_search;
+	int cache_type = SSDFS_GET_CACHE_TYPE(blk_state);
+	int max_capacity = SSDFS_BLK_BMAP_FRAGMENTS_CHAIN_MAX;
+
+	if (cache_type >= SSDFS_SEARCH_TYPE_MAX) {
+		SSDFS_ERR("invalid cache type %#x, blk_state %#x\n",
+			  cache_type, blk_state);
+		return true;
+	}
+
+	last_search = &blk_bmap->last_search[cache_type];
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("last_search.cache %lx\n", last_search->cache);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (last_search->page_index >= max_capacity ||
+	    last_search->offset == U16_MAX)
+		return true;
+
+	return false;
+}
+
+/*
+ * BYTE_CONTAINS_STATE() - check that provided byte contains state
+ * @value: pointer on analysed byte
+ * @blk_state: requested block's state
+ *
+ * RETURN:
+ * [true]  - @value contains @blk_state.
+ * [false] - @value hasn't @blk_state.
+ */
+static inline
+bool BYTE_CONTAINS_STATE(u8 *value, int blk_state)
+{
+	switch (blk_state) {
+	case SSDFS_BLK_FREE:
+		return detect_free_blk[*value];
+
+	case SSDFS_BLK_PRE_ALLOCATED:
+		return detect_pre_allocated_blk[*value];
+
+	case SSDFS_BLK_VALID:
+		return detect_valid_blk[*value];
+
+	case SSDFS_BLK_INVALID:
+		return detect_invalid_blk[*value];
+	}
+
+	return false;
+}
+
+/*
+ * ssdfs_block_bmap_find_block_in_cache() - find block for state in cache
+ * @blk_bmap: pointer to block bitmap
+ * @start: starting block for search
+ * @max_blk: upper bound for search
+ * @blk_state: requested block's state
+ * @found_blk: pointer on found block for requested state [out]
+ *
+ * This function tries to find a block with @blk_state in the
+ * block bitmap within the range [@start, @max_blk).
+ *
+ * RETURN:
+ * [success] - @found_blk contains found block number for @blk_state.
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ * %-ENODATA    - requested range [@start, @max_blk) doesn't contain
+ *                any block with @blk_state.
+ */
+static
+int ssdfs_block_bmap_find_block_in_cache(struct ssdfs_block_bmap *blk_bmap,
+					 u32 start, u32 max_blk,
+					 int blk_state, u32 *found_blk)
+{
+	int cache_type = SSDFS_GET_CACHE_TYPE(blk_state);
+	u32 items_per_byte = SSDFS_ITEMS_PER_BYTE(SSDFS_BLK_STATE_BITS);
+	struct ssdfs_last_bmap_search *last_search;
+	u32 first_cached_blk;
+	u32 byte_index;
+	u8 blks_diff;
+	size_t bytes_per_long = sizeof(unsigned long);
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap || !found_blk);
+
+	if (blk_state > SSDFS_BLK_STATE_MAX) {
+		SSDFS_ERR("invalid block state %#x\n", blk_state);
+		return -EINVAL;
+	}
+
+	if (start >= blk_bmap->items_count) {
+		SSDFS_ERR("invalid start block %u\n", start);
+		return -EINVAL;
+	}
+
+	if (start > max_blk) {
+		SSDFS_ERR("start %u > max_blk %u\n", start, max_blk);
+		return -EINVAL;
+	}
+
+	if (!is_block_state_cached(blk_bmap, start)) {
+		SSDFS_ERR("cache doesn't contain start %u\n", start);
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("blk_bmap %p, start %u, max_blk %u, "
+		  "state %#x, found_blk %p\n",
+		  blk_bmap, start, max_blk, blk_state, found_blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (cache_type >= SSDFS_SEARCH_TYPE_MAX) {
+		SSDFS_ERR("invalid cache type %#x, blk_state %#x\n",
+			  cache_type, blk_state);
+		return -EINVAL;
+	}
+
+	*found_blk = max_blk;
+	last_search = &blk_bmap->last_search[cache_type];
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("last_search.cache %lx\n", last_search->cache);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	first_cached_blk = SSDFS_FIRST_CACHED_BLOCK(last_search);
+	blks_diff = start - first_cached_blk;
+	byte_index = blks_diff / items_per_byte;
+	blks_diff = blks_diff % items_per_byte;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("first_cached_blk %u, start %u, "
+		  "byte_index %u, bytes_per_long %zu\n",
+		  first_cached_blk, start,
+		  byte_index, bytes_per_long);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	for (; byte_index < bytes_per_long; byte_index++) {
+		u8 *value = (u8 *)&last_search->cache + byte_index;
+		u8 found_off;
+
+		err = FIND_FIRST_ITEM_IN_BYTE(value, blk_state,
+					      SSDFS_BLK_STATE_BITS,
+					      SSDFS_BLK_STATE_MASK,
+					      blks_diff,
+					      BYTE_CONTAINS_STATE,
+					      FIRST_STATE_IN_BYTE,
+					      &found_off);
+		if (err == -ENODATA) {
+			blks_diff = 0;
+			continue;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to find block in byte: "
+				  "start_off %u, blk_state %#x, err %d\n",
+				  blks_diff, blk_state, err);
+			return err;
+		}
+
+		*found_blk = first_cached_blk;
+		*found_blk += byte_index * items_per_byte;
+		*found_blk += found_off;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("block %u has been found for state %#x\n",
+			  *found_blk, blk_state);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		return 0;
+	}
+
+	return -ENODATA;
+}
+
+static inline
+int ssdfs_block_bmap_define_start_item(int page_index,
+					u32 start,
+					u32 aligned_start,
+					u32 aligned_end,
+					u32 *start_byte,
+					u32 *rest_bytes,
+					u8 *item_offset)
+{
+	u32 items_per_byte = SSDFS_ITEMS_PER_BYTE(SSDFS_BLK_STATE_BITS);
+	u32 items_per_page = PAGE_SIZE * items_per_byte;
+	u32 items;
+	u32 offset;
+
+	if ((page_index * items_per_page) <= aligned_start)
+		offset = aligned_start % items_per_page;
+	else
+		offset = aligned_start;
+
+	*start_byte = offset / items_per_byte;
+
+	items = items_per_page - offset;
+
+	if (aligned_end <= start) {
+		SSDFS_ERR("page_index %d, start %u, "
+			  "aligned_start %u, aligned_end %u, "
+			  "start_byte %u, rest_bytes %u, item_offset %u\n",
+			  page_index, start,
+			  aligned_start, aligned_end,
+			  *start_byte, *rest_bytes, *item_offset);
+		SSDFS_WARN("aligned_end %u <= start %u\n",
+			   aligned_end, start);
+		return -ERANGE;
+	} else
+		items = min_t(u32, items, aligned_end);
+
+	*rest_bytes = items + items_per_byte - 1;
+	*rest_bytes /= items_per_byte;
+
+	*item_offset = (u8)(start - aligned_start);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("page_index %d, start %u, "
+		  "aligned_start %u, aligned_end %u, "
+		  "start_byte %u, rest_bytes %u, item_offset %u\n",
+		  page_index, start,
+		  aligned_start, aligned_end,
+		  *start_byte, *rest_bytes, *item_offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+/*
+ * ssdfs_block_bmap_find_block_in_memory_range() - find block in memory range
+ * @kaddr: pointer to memory range
+ * @blk_state: requested state of searching block
+ * @byte_index: index of byte in memory range [in|out]
+ * @search_bytes: upper bound for search
+ * @start_off: starting bit offset in byte
+ * @found_off: pointer to found byte's offset [out]
+ *
+ * This function searches for a block with the requested
+ * @blk_state in the memory range.
+ *
+ * RETURN:
+ * [success] - found byte's offset in @found_off.
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ * %-ENODATA    - block with requested state is not found.
+ */
+static
+int ssdfs_block_bmap_find_block_in_memory_range(void *kaddr,
+						int blk_state,
+						u32 *byte_index,
+						u32 search_bytes,
+						u8 start_off,
+						u8 *found_off)
+{
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!kaddr || !byte_index || !found_off);
+
+	if (blk_state > SSDFS_BLK_STATE_MAX) {
+		SSDFS_ERR("invalid block state %#x\n", blk_state);
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("blk_state %#x, byte_index %u, "
+		  "search_bytes %u, start_off %u\n",
+		  blk_state, *byte_index,
+		  search_bytes, start_off);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	for (; *byte_index < search_bytes; ++(*byte_index)) {
+		u8 *value = (u8 *)kaddr + *byte_index;
+
+		err = FIND_FIRST_ITEM_IN_BYTE(value, blk_state,
+					      SSDFS_BLK_STATE_BITS,
+					      SSDFS_BLK_STATE_MASK,
+					      start_off,
+					      BYTE_CONTAINS_STATE,
+					      FIRST_STATE_IN_BYTE,
+					      found_off);
+		if (err == -ENODATA) {
+			start_off = 0;
+			continue;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to find block in byte: "
+				  "start_off %u, blk_state %#x, "
+				  "err %d\n",
+				  start_off, blk_state, err);
+			return err;
+		}
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("offset %u has been found for state %#x, "
+			  "err %d\n",
+			  *found_off, blk_state, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		return 0;
+	}
+
+	return -ENODATA;
+}
+
+/*
+ * ssdfs_block_bmap_find_block_in_buffer() - find block in buffer with state
+ * @blk_bmap: pointer to block bitmap
+ * @start: start position for search
+ * @max_blk: upper bound for search
+ * @blk_state: requested state of searching block
+ * @found_blk: pointer to found block number [out]
+ *
+ * This function searches for a block with the requested @blk_state
+ * from @start to @max_blk (not inclusive) in the buffer.
+ * The found block's number is returned via @found_blk.
+ *
+ * RETURN:
+ * [success] - found block number in @found_blk.
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ * %-ENODATA    - block with requested state is not found.
+ */
+static
+int ssdfs_block_bmap_find_block_in_buffer(struct ssdfs_block_bmap *blk_bmap,
+					  u32 start, u32 max_blk,
+					  int blk_state, u32 *found_blk)
+{
+	u32 items_per_byte = SSDFS_ITEMS_PER_BYTE(SSDFS_BLK_STATE_BITS);
+	u32 aligned_start, aligned_end;
+	u32 byte_index, search_bytes = U32_MAX;
+	u32 rest_bytes = U32_MAX;
+	u8 start_off = U8_MAX;
+	void *kaddr;
+	u8 found_off = U8_MAX;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap || !found_blk);
+
+	if (blk_state > SSDFS_BLK_STATE_MAX) {
+		SSDFS_ERR("invalid block state %#x\n", blk_state);
+		return -EINVAL;
+	}
+
+	if (start >= blk_bmap->items_count) {
+		SSDFS_ERR("invalid start block %u\n", start);
+		return -EINVAL;
+	}
+
+	if (start > max_blk) {
+		SSDFS_ERR("start %u > max_blk %u\n", start, max_blk);
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("blk_bmap %p, start %u, max_blk %u, "
+		  "state %#x, found_blk %p\n",
+		  blk_bmap, start, max_blk, blk_state, found_blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*found_blk = max_blk;
+
+	aligned_start = ALIGNED_START_BLK(start);
+	aligned_end = ALIGNED_END_BLK(max_blk);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("blk_state %#x, start %u, max_blk %u, "
+		  "aligned_start %u, aligned_end %u\n",
+		  blk_state, start, max_blk,
+		  aligned_start, aligned_end);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_block_bmap_define_start_item(0,
+						 start,
+						 aligned_start,
+						 aligned_end,
+						 &byte_index,
+						 &rest_bytes,
+						 &start_off);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to define start item: "
+			  "blk_state %#x, start %u, max_blk %u, "
+			  "aligned_start %u, aligned_end %u\n",
+			  blk_state, start, max_blk,
+			  aligned_start, aligned_end);
+		return err;
+	}
+
+	kaddr = blk_bmap->storage.buf;
+	search_bytes = byte_index + rest_bytes;
+
+	err = ssdfs_block_bmap_find_block_in_memory_range(kaddr, blk_state,
+							  &byte_index,
+							  search_bytes,
+							  start_off,
+							  &found_off);
+	if (err == -ENODATA) {
+		/* no item has been found */
+		return err;
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to find block: "
+			  "start_off %u, blk_state %#x, "
+			  "err %d\n",
+			  start_off, blk_state, err);
+		return err;
+	}
+
+	*found_blk = byte_index * items_per_byte;
+	*found_blk += found_off;
+
+	if (*found_blk >= max_blk)
+		err = -ENODATA;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("block %u has been found for state %#x, "
+		  "err %d\n",
+		  *found_blk, blk_state, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+/*
+ * ssdfs_block_bmap_find_block_in_pagevec() - find block in pagevec with state
+ * @blk_bmap: pointer to block bitmap
+ * @start: start position for search
+ * @max_blk: upper bound for search
+ * @blk_state: requested state of searching block
+ * @found_blk: pointer to found block number [out]
+ *
+ * This function searches for a block with the requested @blk_state
+ * from @start to @max_blk (not inclusive) in the pagevec.
+ * The found block's number is returned via @found_blk.
+ *
+ * RETURN:
+ * [success] - found block number in @found_blk.
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ * %-ENODATA    - block with requested state is not found.
+ */
+static
+int ssdfs_block_bmap_find_block_in_pagevec(struct ssdfs_block_bmap *blk_bmap,
+					   u32 start, u32 max_blk,
+					   int blk_state, u32 *found_blk)
+{
+	struct ssdfs_page_vector *array;
+	u32 items_per_byte = SSDFS_ITEMS_PER_BYTE(SSDFS_BLK_STATE_BITS);
+	size_t items_per_page = PAGE_SIZE * items_per_byte;
+	u32 aligned_start, aligned_end;
+	struct page *page;
+	void *kaddr;
+	int page_index;
+	u8 found_off = U8_MAX;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap || !found_blk);
+
+	if (blk_state > SSDFS_BLK_STATE_MAX) {
+		SSDFS_ERR("invalid block state %#x\n", blk_state);
+		return -EINVAL;
+	}
+
+	if (start >= blk_bmap->items_count) {
+		SSDFS_ERR("invalid start block %u\n", start);
+		return -EINVAL;
+	}
+
+	if (start > max_blk) {
+		SSDFS_ERR("start %u > max_blk %u\n", start, max_blk);
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("blk_bmap %p, start %u, max_blk %u, "
+		  "state %#x, found_blk %p\n",
+		  blk_bmap, start, max_blk, blk_state, found_blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*found_blk = max_blk;
+
+	array = &blk_bmap->storage.array;
+
+	aligned_start = ALIGNED_START_BLK(start);
+	aligned_end = ALIGNED_END_BLK(max_blk);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("blk_state %#x, start %u, max_blk %u, "
+		  "aligned_start %u, aligned_end %u\n",
+		  blk_state, start, max_blk,
+		  aligned_start, aligned_end);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	page_index = aligned_start / items_per_page;
+
+	if (page_index >= ssdfs_page_vector_capacity(array)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("page_index %d >= capacity %u\n",
+			  page_index,
+			  ssdfs_page_vector_capacity(array));
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -ENODATA;
+	}
+
+	if (page_index >= ssdfs_page_vector_count(array)) {
+		if (blk_state != SSDFS_BLK_FREE)
+			return -ENODATA;
+
+		while (page_index >= ssdfs_page_vector_count(array)) {
+			page = ssdfs_page_vector_allocate(array);
+			if (IS_ERR_OR_NULL(page)) {
+				err = (page == NULL ? -ENOMEM : PTR_ERR(page));
+				SSDFS_ERR("unable to allocate page\n");
+				return err;
+			}
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("page %p, count %d\n",
+				  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+
+		*found_blk = page_index * items_per_page;
+
+		if (*found_blk >= max_blk)
+			err = -ENODATA;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("block %u has been found for state %#x, "
+			  "err %d\n",
+			  *found_blk, blk_state, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		return err;
+	}
+
+	for (; page_index < ssdfs_page_vector_capacity(array); page_index++) {
+		u32 byte_index, search_bytes = U32_MAX;
+		u32 rest_bytes = U32_MAX;
+		u8 start_off = U8_MAX;
+
+		if (page_index == ssdfs_page_vector_count(array)) {
+			if (blk_state != SSDFS_BLK_FREE)
+				return -ENODATA;
+
+			page = ssdfs_page_vector_allocate(array);
+			if (IS_ERR_OR_NULL(page)) {
+				err = (page == NULL ? -ENOMEM : PTR_ERR(page));
+				SSDFS_ERR("unable to allocate page\n");
+				return err;
+			}
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("page %p, count %d\n",
+				  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			*found_blk = page_index * items_per_page;
+
+			if (*found_blk >= max_blk)
+				err = -ENODATA;
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("block %u has been found for state %#x, "
+				  "err %d\n",
+				  *found_blk, blk_state, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			return err;
+		}
+
+		err = ssdfs_block_bmap_define_start_item(page_index, start,
+							 aligned_start,
+							 aligned_end,
+							 &byte_index,
+							 &rest_bytes,
+							 &start_off);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to define start item: "
+				  "blk_state %#x, start %u, max_blk %u, "
+				  "aligned_start %u, aligned_end %u\n",
+				  blk_state, start, max_blk,
+				  aligned_start, aligned_end);
+			return err;
+		}
+
+		search_bytes = byte_index + rest_bytes;
+
+		kaddr = kmap_local_page(array->pages[page_index]);
+		err = ssdfs_block_bmap_find_block_in_memory_range(kaddr,
+								  blk_state,
+								  &byte_index,
+								  search_bytes,
+								  start_off,
+								  &found_off);
+		kunmap_local(kaddr);
+
+		if (err == -ENODATA) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("no item has been found: "
+				  "page_index %d, "
+				  "page_vector_count %u, "
+				  "page_vector_capacity %u\n",
+				  page_index,
+				  ssdfs_page_vector_count(array),
+				  ssdfs_page_vector_capacity(array));
+#endif /* CONFIG_SSDFS_DEBUG */
+			continue;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to find block: "
+				  "start_off %u, blk_state %#x, "
+				  "err %d\n",
+				  start_off, blk_state, err);
+			return err;
+		}
+
+		*found_blk = page_index * items_per_page;
+		*found_blk += byte_index * items_per_byte;
+		*found_blk += found_off;
+
+		if (*found_blk >= max_blk)
+			err = -ENODATA;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("block %u has been found for state %#x, "
+			  "err %d\n",
+			  *found_blk, blk_state, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		return err;
+	}
+
+	return -ENODATA;
+}
+
+/*
+ * ssdfs_block_bmap_find_block_in_storage() - find block with state in storage
+ * @blk_bmap: pointer to block bitmap
+ * @start: start position for search
+ * @max_blk: upper bound for search
+ * @blk_state: requested state of the block to find
+ * @found_blk: pointer to found block number [out]
+ *
+ * This function searches the storage for a block with the requested
+ * @blk_state in the range from @start to @max_blk (not inclusive).
+ * The found block's number is returned via @found_blk.
+ *
+ * RETURN:
+ * [success] - found block number in @found_blk.
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ * %-ENODATA    - block with requested state is not found.
+ */
+static
+int ssdfs_block_bmap_find_block_in_storage(struct ssdfs_block_bmap *blk_bmap,
+					   u32 start, u32 max_blk,
+					   int blk_state, u32 *found_blk)
+{
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap || !found_blk);
+
+	if (blk_state > SSDFS_BLK_STATE_MAX) {
+		SSDFS_ERR("invalid block state %#x\n", blk_state);
+		return -EINVAL;
+	}
+
+	if (start >= blk_bmap->items_count) {
+		SSDFS_ERR("invalid start block %u\n", start);
+		return -EINVAL;
+	}
+
+	if (start > max_blk) {
+		SSDFS_ERR("start %u > max_blk %u\n", start, max_blk);
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("blk_bmap %p, start %u, max_blk %u, "
+		  "state %#x, found_blk %p\n",
+		  blk_bmap, start, max_blk, blk_state, found_blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (blk_bmap->storage.state) {
+	case SSDFS_BLOCK_BMAP_STORAGE_PAGE_VEC:
+		err = ssdfs_block_bmap_find_block_in_pagevec(blk_bmap,
+							     start,
+							     max_blk,
+							     blk_state,
+							     found_blk);
+		break;
+
+	case SSDFS_BLOCK_BMAP_STORAGE_BUFFER:
+		err = ssdfs_block_bmap_find_block_in_buffer(blk_bmap,
+							    start,
+							    max_blk,
+							    blk_state,
+							    found_blk);
+		break;
+
+	default:
+		SSDFS_ERR("unexpected state %#x\n",
+				blk_bmap->storage.state);
+		return -ERANGE;
+	}
+
+	return err;
+}
+
+/*
+ * ssdfs_block_bmap_find_block() - find block with requested state
+ * @blk_bmap: pointer to block bitmap
+ * @start: start position for search
+ * @max_blk: upper bound for search
+ * @blk_state: requested state of the block to find
+ * @found_blk: pointer to found block number [out]
+ *
+ * This function searches for a block with the requested @blk_state
+ * in the range from @start to @max_blk (not inclusive). The found
+ * block's number is returned via @found_blk. If @blk_state is
+ * SSDFS_BLK_STATE_MAX, then the function simply retrieves the
+ * state of the @start block.
+ *
+ * RETURN:
+ * [success] - found block number in @found_blk.
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ * %-ENODATA    - block with requested state is not found.
+ */
+static
+int ssdfs_block_bmap_find_block(struct ssdfs_block_bmap *blk_bmap,
+				u32 start, u32 max_blk, int blk_state,
+				u32 *found_blk)
+{
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap || !found_blk);
+
+	if (blk_state > SSDFS_BLK_STATE_MAX) {
+		SSDFS_ERR("invalid block state %#x\n", blk_state);
+		return -EINVAL;
+	}
+
+	if (start >= blk_bmap->items_count) {
+		SSDFS_ERR("invalid start block %u\n", start);
+		return -EINVAL;
+	}
+
+	if (start > max_blk) {
+		SSDFS_ERR("start %u > max_blk %u\n", start, max_blk);
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("blk_bmap %p, start %u, max_blk %u, "
+		  "state %#x, found_blk %p\n",
+		  blk_bmap, start, max_blk, blk_state, found_blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (blk_state == SSDFS_BLK_STATE_MAX) {
+		err = ssdfs_cache_block_state(blk_bmap, start, blk_state);
+		if (unlikely(err)) {
+			SSDFS_ERR("unable to cache block %u state: err %d\n",
+				  start, err);
+			return err;
+		}
+
+		*found_blk = start;
+		return 0;
+	}
+
+	*found_blk = max_blk;
+	max_blk = min_t(u32, max_blk, blk_bmap->items_count);
+
+	if (is_cache_invalid(blk_bmap, blk_state)) {
+		err = ssdfs_block_bmap_find_block_in_storage(blk_bmap,
+							     start, max_blk,
+							     blk_state,
+							     found_blk);
+		if (err == -ENODATA) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to find block in storage: "
+				  "start %u, max_blk %u, state %#x\n",
+				  start, max_blk, blk_state);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return err;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to find block in storage: "
+				  "start %u, max_blk %u, state %#x, err %d\n",
+				  start, max_blk, blk_state, err);
+			goto fail_find;
+		}
+
+		err = ssdfs_cache_block_state(blk_bmap, *found_blk, blk_state);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to cache block: "
+				  "found_blk %u, state %#x, err %d\n",
+				  *found_blk, blk_state, err);
+			goto fail_find;
+		}
+	}
+
+	if (*found_blk >= start && *found_blk < max_blk)
+		goto end_search;
+
+	if (is_block_state_cached(blk_bmap, start)) {
+		err = ssdfs_block_bmap_find_block_in_cache(blk_bmap,
+							   start, max_blk,
+							   blk_state,
+							   found_blk);
+		if (err == -ENODATA) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to find block in cache: "
+				  "start %u, max_blk %u, state %#x\n",
+				  start, max_blk, blk_state);
+#endif /* CONFIG_SSDFS_DEBUG */
+			/*
+			 * Continue to search in pagevec
+			 */
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to find block in cache: "
+				  "start %u, max_blk %u, state %#x, err %d\n",
+				  start, max_blk, blk_state, err);
+			goto fail_find;
+		} else if (*found_blk >= start && *found_blk < max_blk)
+			goto end_search;
+	}
+
+	err = ssdfs_block_bmap_find_block_in_storage(blk_bmap, start, max_blk,
+						     blk_state, found_blk);
+	if (err == -ENODATA) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unable to find block in storage: "
+			  "start %u, max_blk %u, state %#x\n",
+			  start, max_blk, blk_state);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return err;
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to find block in storage: "
+			  "start %u, max_blk %u, state %#x, err %d\n",
+			  start, max_blk, blk_state, err);
+		goto fail_find;
+	}
+
+	switch (SSDFS_GET_CACHE_TYPE(blk_state)) {
+	case SSDFS_FREE_BLK_SEARCH:
+	case SSDFS_OTHER_BLK_SEARCH:
+		err = ssdfs_cache_block_state(blk_bmap, *found_blk, blk_state);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to cache block: "
+				  "found_blk %u, state %#x, err %d\n",
+				  *found_blk, blk_state, err);
+			goto fail_find;
+		}
+		break;
+
+	default:
+		/* do nothing */
+		break;
+	}
+
+end_search:
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("block %u has been found for state %#x\n",
+		  *found_blk, blk_state);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+
+fail_find:
+	return err;
+}
+
+/*
+ * BYTE_CONTAIN_DIVERSE_STATES() - check whether byte contains diverse states
+ * @value: pointer to analysed byte
+ * @blk_state: requested block state
+ *
+ * RETURN:
+ * [true]  - @value contains diverse states.
+ * [false] - @value contains @blk_state only.
+ */
+static inline
+bool BYTE_CONTAIN_DIVERSE_STATES(u8 *value, int blk_state)
+{
+	switch (blk_state) {
+	case SSDFS_BLK_FREE:
+		return *value != SSDFS_FREE_STATES_BYTE;
+
+	case SSDFS_BLK_PRE_ALLOCATED:
+		return *value != SSDFS_PRE_ALLOC_STATES_BYTE;
+
+	case SSDFS_BLK_VALID:
+		return *value != SSDFS_VALID_STATES_BYTE;
+
+	case SSDFS_BLK_INVALID:
+		return *value != SSDFS_INVALID_STATES_BYTE;
+	}
+
+	return false;
+}
+
+/*
+ * GET_FIRST_DIFF_STATE() - determine first block offset with different state
+ * @value: pointer to analysed byte
+ * @blk_state: requested block state
+ * @start_off: starting block offset for the analysis
+ *
+ * This function tries to find an item with a state different from
+ * @blk_state in @value, starting from @start_off.
+ *
+ * RETURN:
+ * [success] - found block offset.
+ * [failure] - BITS_PER_BYTE.
+ */
+static inline
+u8 GET_FIRST_DIFF_STATE(u8 *value, int blk_state, u8 start_off)
+{
+	u8 i;
+	u8 bits_off;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!value);
+	BUG_ON(start_off >= (BITS_PER_BYTE / SSDFS_BLK_STATE_BITS));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	bits_off = start_off * SSDFS_BLK_STATE_BITS;
+
+	for (i = bits_off; i < BITS_PER_BYTE; i += SSDFS_BLK_STATE_BITS) {
+		if (((*value >> i) & SSDFS_BLK_STATE_MASK) != blk_state) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("blk_state %#x, start_off %u, blk_off %u\n",
+				  blk_state, start_off, i);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return i / SSDFS_BLK_STATE_BITS;
+		}
+	}
+
+	return BITS_PER_BYTE;
+}
+
+/*
+ * ssdfs_find_state_area_end_in_byte() - find end block for state area in byte
+ * @value: pointer to analysed byte
+ * @blk_state: requested block state
+ * @start_off: starting block offset for search
+ * @found_off: pointer to found end offset [out]
+ *
+ * RETURN:
+ * [success] - @found_off contains found end offset.
+ * [failure] - error code:
+ *
+ * %-ENODATA    - analysed @value contains @blk_state only.
+ */
+static inline
+int ssdfs_find_state_area_end_in_byte(u8 *value, int blk_state,
+					u8 start_off, u8 *found_off)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("value %p, blk_state %#x, "
+		  "start_off %u, found_off %p\n",
+		  value, blk_state, start_off, found_off);
+
+	BUG_ON(!value || !found_off);
+	BUG_ON(start_off >= (BITS_PER_BYTE / SSDFS_BLK_STATE_BITS));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*found_off = BITS_PER_BYTE;
+
+	if (BYTE_CONTAIN_DIVERSE_STATES(value, blk_state)) {
+		u8 blk_offset = GET_FIRST_DIFF_STATE(value, blk_state,
+							start_off);
+
+		if (blk_offset < BITS_PER_BYTE) {
+			*found_off = blk_offset;
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("block offset %u for *NOT* state %#x\n",
+				  *found_off, blk_state);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			return 0;
+		}
+	}
+
+	return -ENODATA;
+}
+
+/*
+ * ssdfs_block_bmap_find_state_area_end_in_memory() - find state area end
+ * @kaddr: pointer to memory range
+ * @blk_state: requested block state
+ * @byte_index: index of byte in memory range [in|out]
+ * @search_bytes: upper bound for search
+ * @start_off: starting block offset inside byte
+ * @found_off: pointer to found end offset [out]
+ *
+ * This function tries to find the end of a @blk_state area
+ * in the memory range of @search_bytes length.
+ *
+ * RETURN:
+ * [success] - found offset in @found_off.
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ * %-ENODATA    - nothing has been found.
+ */
+static
+int ssdfs_block_bmap_find_state_area_end_in_memory(void *kaddr,
+						   int blk_state,
+						   u32 *byte_index,
+						   u32 search_bytes,
+						   u8 start_off,
+						   u8 *found_off)
+{
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!kaddr || !byte_index || !found_off);
+
+	if (blk_state > SSDFS_BLK_STATE_MAX) {
+		SSDFS_ERR("invalid block state %#x\n", blk_state);
+		return -EINVAL;
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	for (; *byte_index < search_bytes; ++(*byte_index)) {
+		u8 *value = (u8 *)kaddr + *byte_index;
+
+		err = ssdfs_find_state_area_end_in_byte(value,
+							blk_state,
+							start_off,
+							found_off);
+		if (err == -ENODATA) {
+			start_off = 0;
+			continue;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to find state area's end: "
+				  "start_off %u, blk_state %#x, "
+				  "err %d\n",
+				  start_off, blk_state, err);
+			return err;
+		}
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("offset %u has been found for state %#x, "
+			  "err %d\n",
+			  *found_off, blk_state, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		return 0;
+	}
+
+	return -ENODATA;
+}
+
+/*
+ * ssdfs_block_bmap_find_state_area_end_in_buffer() - find state area end
+ * @bmap: pointer to block bitmap
+ * @start: start position for search
+ * @max_blk: upper bound for search
+ * @blk_state: area state
+ * @found_end: pointer to found end block [out]
+ *
+ * This function tries to find the end of a @blk_state area
+ * in the range [@start, @max_blk).
+ *
+ * RETURN:
+ * [success] - @found_end contains found end block.
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ */
+static int
+ssdfs_block_bmap_find_state_area_end_in_buffer(struct ssdfs_block_bmap *bmap,
+						u32 start, u32 max_blk,
+						int blk_state, u32 *found_end)
+{
+	u32 aligned_start, aligned_end;
+	u32 items_per_byte = SSDFS_ITEMS_PER_BYTE(SSDFS_BLK_STATE_BITS);
+	u32 byte_index, search_bytes = U32_MAX;
+	u32 rest_bytes = U32_MAX;
+	u8 start_off = U8_MAX;
+	void *kaddr;
+	u8 found_off = U8_MAX;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("start %u, max_blk %u, blk_state %#x\n",
+		  start, max_blk, blk_state);
+
+	BUG_ON(!bmap || !found_end);
+
+	if (start >= bmap->items_count) {
+		SSDFS_ERR("invalid start block %u\n", start);
+		return -EINVAL;
+	}
+
+	if (start > max_blk) {
+		SSDFS_ERR("start %u > max_blk %u\n", start, max_blk);
+		return -EINVAL;
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*found_end = U32_MAX;
+
+	aligned_start = ALIGNED_START_BLK(start);
+	aligned_end = ALIGNED_END_BLK(max_blk);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("blk_state %#x, start %u, max_blk %u, "
+		  "aligned_start %u, aligned_end %u\n",
+		  blk_state, start, max_blk,
+		  aligned_start, aligned_end);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_block_bmap_define_start_item(0,
+						 start,
+						 aligned_start,
+						 aligned_end,
+						 &byte_index,
+						 &rest_bytes,
+						 &start_off);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to define start item: "
+			  "blk_state %#x, start %u, max_blk %u, "
+			  "aligned_start %u, aligned_end %u\n",
+			  blk_state, start, max_blk,
+			  aligned_start, aligned_end);
+		return err;
+	}
+
+	kaddr = bmap->storage.buf;
+	search_bytes = byte_index + rest_bytes;
+
+	err = ssdfs_block_bmap_find_state_area_end_in_memory(kaddr, blk_state,
+							     &byte_index,
+							     search_bytes,
+							     start_off,
+							     &found_off);
+	if (err == -ENODATA) {
+		*found_end = max_blk;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("area end %u has been found for state %#x\n",
+			  *found_end, blk_state);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return 0;
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to find state area's end: "
+			  "start_off %u, blk_state %#x, "
+			  "err %d\n",
+			  start_off, blk_state, err);
+		return err;
+	}
+
+	*found_end = byte_index * items_per_byte;
+	*found_end += found_off;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("start %u, aligned_start %u, "
+		  "aligned_end %u, byte_index %u, "
+		  "items_per_byte %u, start_off %u, "
+		  "found_off %u\n",
+		  start, aligned_start, aligned_end, byte_index,
+		  items_per_byte, start_off, found_off);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (*found_end > max_blk)
+		*found_end = max_blk;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("area end %u has been found for state %#x\n",
+		  *found_end, blk_state);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+/*
+ * ssdfs_block_bmap_find_state_area_end_in_pagevec() - find state area end
+ * @bmap: pointer to block bitmap
+ * @start: start position for search
+ * @max_blk: upper bound for search
+ * @blk_state: area state
+ * @found_end: pointer to found end block [out]
+ *
+ * This function tries to find the end of a @blk_state area
+ * in the range [@start, @max_blk).
+ *
+ * RETURN:
+ * [success] - @found_end contains found end block.
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ */
+static int
+ssdfs_block_bmap_find_state_area_end_in_pagevec(struct ssdfs_block_bmap *bmap,
+						u32 start, u32 max_blk,
+						int blk_state, u32 *found_end)
+{
+	struct ssdfs_page_vector *array;
+	u32 aligned_start, aligned_end;
+	u32 items_per_byte = SSDFS_ITEMS_PER_BYTE(SSDFS_BLK_STATE_BITS);
+	size_t items_per_page = PAGE_SIZE * items_per_byte;
+	void *kaddr;
+	int page_index;
+	u8 found_off = U8_MAX;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("start %u, max_blk %u, blk_state %#x\n",
+		  start, max_blk, blk_state);
+
+	BUG_ON(!bmap || !found_end);
+
+	if (start >= bmap->items_count) {
+		SSDFS_ERR("invalid start block %u\n", start);
+		return -EINVAL;
+	}
+
+	if (start > max_blk) {
+		SSDFS_ERR("start %u > max_blk %u\n", start, max_blk);
+		return -EINVAL;
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*found_end = U32_MAX;
+
+	array = &bmap->storage.array;
+
+	aligned_start = ALIGNED_START_BLK(start);
+	aligned_end = ALIGNED_END_BLK(max_blk);
+
+	page_index = aligned_start / items_per_page;
+
+	if (page_index >= ssdfs_page_vector_count(array)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("page_index %d >= count %u\n",
+			  page_index,
+			  ssdfs_page_vector_count(array));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		*found_end = max_blk;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("area end %u has been found for state %#x\n",
+			  *found_end, blk_state);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		return 0;
+	}
+
+	for (; page_index < ssdfs_page_vector_count(array); page_index++) {
+		u32 byte_index, search_bytes = U32_MAX;
+		u32 rest_bytes = U32_MAX;
+		u8 start_off = U8_MAX;
+
+		err = ssdfs_block_bmap_define_start_item(page_index, start,
+							 aligned_start,
+							 aligned_end,
+							 &byte_index,
+							 &rest_bytes,
+							 &start_off);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to define start item: "
+				  "blk_state %#x, start %u, max_blk %u, "
+				  "aligned_start %u, aligned_end %u\n",
+				  blk_state, start, max_blk,
+				  aligned_start, aligned_end);
+			return err;
+		}
+
+		search_bytes = byte_index + rest_bytes;
+
+		kaddr = kmap_local_page(array->pages[page_index]);
+		err = ssdfs_block_bmap_find_state_area_end_in_memory(kaddr,
+								blk_state,
+								&byte_index,
+								search_bytes,
+								start_off,
+								&found_off);
+		kunmap_local(kaddr);
+
+		if (err == -ENODATA) {
+			/* nothing has been found */
+			continue;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to find state area's end: "
+				  "start_off %u, blk_state %#x, "
+				  "err %d\n",
+				  start_off, blk_state, err);
+			return err;
+		}
+
+		*found_end = page_index * items_per_page;
+		*found_end += byte_index * items_per_byte;
+		*found_end += found_off;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("start %u, aligned_start %u, "
+			  "aligned_end %u, "
+			  "page_index %d, items_per_page %zu, "
+			  "byte_index %u, "
+			  "items_per_byte %u, start_off %u, "
+			  "found_off %u\n",
+			  start, aligned_start, aligned_end,
+			  page_index, items_per_page, byte_index,
+			  items_per_byte, start_off, found_off);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (*found_end > max_blk)
+			*found_end = max_blk;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("area end %u has been found for state %#x\n",
+			  *found_end, blk_state);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		return 0;
+	}
+
+	*found_end = max_blk;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("area end %u has been found for state %#x\n",
+		  *found_end, blk_state);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+/*
+ * ssdfs_block_bmap_find_state_area_end() - find state area end
+ * @blk_bmap: pointer to block bitmap
+ * @start: start position for search
+ * @max_blk: upper bound for search
+ * @blk_state: area state
+ * @found_end: pointer to found end block [out]
+ *
+ * This function tries to find the end of a @blk_state area
+ * in the range [@start, @max_blk).
+ *
+ * RETURN:
+ * [success] - @found_end contains found end block.
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ */
+static
+int ssdfs_block_bmap_find_state_area_end(struct ssdfs_block_bmap *blk_bmap,
+					 u32 start, u32 max_blk, int blk_state,
+					 u32 *found_end)
+{
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("start %u, max_blk %u, blk_state %#x\n",
+		  start, max_blk, blk_state);
+
+	BUG_ON(!blk_bmap || !found_end);
+
+	if (start >= blk_bmap->items_count) {
+		SSDFS_ERR("invalid start block %u\n", start);
+		return -EINVAL;
+	}
+
+	if (start > max_blk) {
+		SSDFS_ERR("start %u > max_blk %u\n", start, max_blk);
+		return -EINVAL;
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (blk_state == SSDFS_BLK_FREE) {
+		*found_end = blk_bmap->items_count;
+		return 0;
+	}
+
+	switch (blk_bmap->storage.state) {
+	case SSDFS_BLOCK_BMAP_STORAGE_PAGE_VEC:
+		err = ssdfs_block_bmap_find_state_area_end_in_pagevec(blk_bmap,
+								     start,
+								     max_blk,
+								     blk_state,
+								     found_end);
+		break;
+
+	case SSDFS_BLOCK_BMAP_STORAGE_BUFFER:
+		err = ssdfs_block_bmap_find_state_area_end_in_buffer(blk_bmap,
+								     start,
+								     max_blk,
+								     blk_state,
+								     found_end);
+		break;
+
+	default:
+		SSDFS_ERR("unexpected state %#x\n",
+				blk_bmap->storage.state);
+		return -ERANGE;
+	}
+
+	return err;
+}
+
+/*
+ * range_corrupted() - check whether range is corrupted
+ * @blk_bmap: pointer to block bitmap
+ * @range: range to check
+ *
+ * RETURN:
+ * [true]  - range is invalid
+ * [false] - range is valid
+ */
+static inline
+bool range_corrupted(struct ssdfs_block_bmap *blk_bmap,
+		     struct ssdfs_block_bmap_range *range)
+{
+	if (range->len > blk_bmap->items_count)
+		return true;
+	if (range->start > (blk_bmap->items_count - range->len))
+		return true;
+	return false;
+}
+
+/*
+ * is_whole_range_cached() - check whether cache contains requested range
+ * @blk_bmap: pointer to block bitmap
+ * @range: range to check
+ *
+ * RETURN:
+ * [true]  - cache contains the whole range
+ * [false] - cache doesn't include the whole range
+ */
+static
+bool is_whole_range_cached(struct ssdfs_block_bmap *blk_bmap,
+			   struct ssdfs_block_bmap_range *range)
+{
+	struct ssdfs_block_bmap_range cached_range;
+	int i;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap || !range);
+
+	if (range_corrupted(blk_bmap, range)) {
+		SSDFS_ERR("invalid range (start %u, len %u); items count %zu\n",
+			  range->start, range->len,
+			  blk_bmap->items_count);
+		/* bool function: -EINVAL would be treated as true */
+		return false;
+	}
+
+	SSDFS_DBG("blk_bmap %p, range (start %u, len %u)\n",
+		  blk_bmap, range->start, range->len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	for (i = 0; i < SSDFS_SEARCH_TYPE_MAX; i++) {
+		struct ssdfs_last_bmap_search *last_search;
+		int cmp;
+
+		last_search = &blk_bmap->last_search[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("last_search.cache %lx\n", last_search->cache);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		cached_range.start = SSDFS_FIRST_CACHED_BLOCK(last_search);
+		cached_range.len = SSDFS_ITEMS_PER_LONG(SSDFS_BLK_STATE_BITS);
+
+		cmp = compare_block_bmap_ranges(&cached_range, range);
+
+		if (cmp >= 0)
+			return true;
+		else if (ranges_have_intersection(&cached_range, range))
+			return false;
+	}
+
+	return false;
+}
+
+/*
+ * ssdfs_set_range_in_cache() - set small range in cache
+ * @blk_bmap: pointer to block bitmap
+ * @range: requested range
+ * @blk_state: state to set
+ *
+ * This function sets a small range of blocks in the cache to @blk_state.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ */
+static
+int ssdfs_set_range_in_cache(struct ssdfs_block_bmap *blk_bmap,
+				struct ssdfs_block_bmap_range *range,
+				int blk_state)
+{
+	u32 blk, index;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap);
+
+	if (blk_state >= SSDFS_BLK_STATE_MAX) {
+		SSDFS_ERR("invalid block state %#x\n", blk_state);
+		return -EINVAL;
+	}
+
+	if (range_corrupted(blk_bmap, range)) {
+		SSDFS_ERR("invalid range (start %u, len %u); items count %zu\n",
+			  range->start, range->len,
+			  blk_bmap->items_count);
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("blk_bmap %p, range (start %u, len %u), state %#x\n",
+		  blk_bmap, range->start, range->len, blk_state);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	for (index = 0; index < range->len; index++) {
+		blk = range->start + index;
+		err = ssdfs_set_block_state_in_cache(blk_bmap, blk, blk_state);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to set block %u in cache: err %d\n",
+				  blk, err);
+			return err;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_set_uncached_tiny_range() - set tiny uncached range by state
+ * @blk_bmap: pointer to block bitmap
+ * @range: range to set
+ * @blk_state: state to set
+ *
+ * This function caches @range, sets it to @blk_state in the cache,
+ * and saves the cache back into the pagevec.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ */
+static
+int ssdfs_set_uncached_tiny_range(struct ssdfs_block_bmap *blk_bmap,
+				  struct ssdfs_block_bmap_range *range,
+				  int blk_state)
+{
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap || !range);
+
+	if (blk_state >= SSDFS_BLK_STATE_MAX) {
+		SSDFS_ERR("invalid block state %#x\n", blk_state);
+		return -EINVAL;
+	}
+
+	if (range_corrupted(blk_bmap, range)) {
+		SSDFS_ERR("invalid range (start %u, len %u); items count %zu\n",
+			  range->start, range->len,
+			  blk_bmap->items_count);
+		return -EINVAL;
+	}
+
+	if (range->len > SSDFS_ITEMS_PER_BYTE(SSDFS_BLK_STATE_BITS)) {
+		SSDFS_ERR("range (start %u, len %u) is not tiny\n",
+			  range->start, range->len);
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("blk_bmap %p, range (start %u, len %u), state %#x\n",
+		  blk_bmap, range->start, range->len, blk_state);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_cache_block_state(blk_bmap, range->start, blk_state);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to cache block %u: err %d\n",
+			  range->start, err);
+		return err;
+	}
+
+	err = ssdfs_set_range_in_cache(blk_bmap, range, blk_state);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to set (start %u, len %u): err %d\n",
+			  range->start, range->len, err);
+		return err;
+	}
+
+	err = ssdfs_save_cache_in_storage(blk_bmap);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to save cache in pagevec: err %d\n",
+			  err);
+		return err;
+	}
+
+	return 0;
+}
+
+/*
+ * __ssdfs_set_range_in_memory() - set range of bits in memory
+ * @blk_bmap: pointer to block bitmap
+ * @page_index: index of memory page
+ * @byte_offset: offset in bytes from the page's beginning
+ * @byte_value: byte value to set
+ * @init_size: size in bytes to set
+ *
+ * This function sets a range of bits in memory.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ * %-ERANGE     - internal error.
+ */
+static
+int __ssdfs_set_range_in_memory(struct ssdfs_block_bmap *blk_bmap,
+				int page_index, u32 byte_offset,
+				int byte_value, size_t init_size)
+{
+	struct ssdfs_page_vector *array;
+	void *kaddr;
+	int max_capacity = SSDFS_BLK_BMAP_FRAGMENTS_CHAIN_MAX;
+	int i;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap);
+
+	SSDFS_DBG("blk_bmap %p, page_index %d, byte_offset %u, "
+		  "byte_value %#x, init_size %zu\n",
+		  blk_bmap, page_index, byte_offset,
+		  byte_value, init_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (blk_bmap->storage.state) {
+	case SSDFS_BLOCK_BMAP_STORAGE_PAGE_VEC:
+		array = &blk_bmap->storage.array;
+
+		if (page_index >= ssdfs_page_vector_capacity(array)) {
+			SSDFS_ERR("invalid page index %d, pagevec capacity %d\n",
+				  page_index,
+				  ssdfs_page_vector_capacity(array));
+			return -EINVAL;
+		}
+
+		while (page_index >= ssdfs_page_vector_count(array)) {
+			struct page *page;
+
+			page = ssdfs_page_vector_allocate(array);
+			if (IS_ERR_OR_NULL(page)) {
+				err = (page == NULL ? -ENOMEM : PTR_ERR(page));
+				SSDFS_ERR("unable to allocate page\n");
+				return err;
+			}
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("page %p, count %d\n",
+				  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+
+#ifdef CONFIG_SSDFS_DEBUG
+		if ((byte_offset + init_size) > PAGE_SIZE) {
+			SSDFS_WARN("invalid offset: "
+				   "byte_offset %u, init_size %zu\n",
+				   byte_offset, init_size);
+			return -ERANGE;
+		}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		ssdfs_memset_page(array->pages[page_index],
+				  byte_offset, PAGE_SIZE,
+				  byte_value, init_size);
+		break;
+
+	case SSDFS_BLOCK_BMAP_STORAGE_BUFFER:
+		if (page_index != 0) {
+			SSDFS_ERR("invalid page index %d\n",
+				  page_index);
+			return -EINVAL;
+		}
+
+#ifdef CONFIG_SSDFS_DEBUG
+		if ((byte_offset + init_size) > blk_bmap->bytes_count) {
+			SSDFS_WARN("invalid offset: "
+				   "byte_offset %u, init_size %zu\n",
+				   byte_offset, init_size);
+			return -ERANGE;
+		}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		kaddr = blk_bmap->storage.buf;
+		memset((u8 *)kaddr + byte_offset, byte_value, init_size);
+		break;
+
+	default:
+		SSDFS_ERR("unexpected state %#x\n",
+			  blk_bmap->storage.state);
+		return -ERANGE;
+	}
+
+	for (i = 0; i < SSDFS_SEARCH_TYPE_MAX; i++) {
+		blk_bmap->last_search[i].page_index = max_capacity;
+		blk_bmap->last_search[i].offset = U16_MAX;
+		blk_bmap->last_search[i].cache = 0;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_set_range_in_storage() - set range in storage by state
+ * @blk_bmap: pointer on block bitmap
+ * @range: range for set
+ * @blk_state: state for set
+ *
+ * This function sets @range in storage to @blk_state.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ */
+static
+int ssdfs_set_range_in_storage(struct ssdfs_block_bmap *blk_bmap,
+				struct ssdfs_block_bmap_range *range,
+				int blk_state)
+{
+	u32 aligned_start, aligned_end;
+	size_t items_per_byte = SSDFS_ITEMS_PER_BYTE(SSDFS_BLK_STATE_BITS);
+	int byte_value;
+	size_t rest_items, items_per_page;
+	u32 blk;
+	int page_index;
+	u32 item_offset, byte_offset;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap || !range);
+
+	if (blk_state >= SSDFS_BLK_STATE_MAX) {
+		SSDFS_ERR("invalid block state %#x\n", blk_state);
+		return -EINVAL;
+	}
+
+	if (range_corrupted(blk_bmap, range)) {
+		SSDFS_ERR("invalid range (start %u, len %u); items count %zu\n",
+			  range->start, range->len,
+			  blk_bmap->items_count);
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("blk_bmap %p, range (start %u, len %u), state %#x\n",
+		  blk_bmap, range->start, range->len, blk_state);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	aligned_start = range->start + items_per_byte - 1;
+	aligned_start >>= SSDFS_BLK_STATE_BITS;
+	aligned_start <<= SSDFS_BLK_STATE_BITS;
+
+	aligned_end = range->start + range->len;
+	aligned_end >>= SSDFS_BLK_STATE_BITS;
+	aligned_end <<= SSDFS_BLK_STATE_BITS;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("aligned_start %u, aligned_end %u\n",
+		  aligned_start, aligned_end);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (range->start != aligned_start) {
+		struct ssdfs_block_bmap_range unaligned;
+
+		unaligned.start = range->start;
+		unaligned.len = aligned_start - range->start;
+
+		err = ssdfs_set_uncached_tiny_range(blk_bmap, &unaligned,
+						    blk_state);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to set (start %u, len %u): err %d\n",
+				  unaligned.start, unaligned.len, err);
+			return err;
+		}
+	}
+
+	byte_value = SSDFS_BLK_BMAP_BYTE(blk_state);
+	items_per_page = PAGE_SIZE * items_per_byte;
+	rest_items = aligned_end - aligned_start;
+	page_index = aligned_start / items_per_page;
+	item_offset = aligned_start % items_per_page;
+	byte_offset = item_offset / items_per_byte;
+
+	blk = aligned_start;
+	while (blk < aligned_end) {
+		size_t iter_items, init_size;
+
+		if (rest_items == 0) {
+			SSDFS_WARN("unexpected items absence: blk %u\n",
+				   blk);
+			break;
+		}
+
+		if (byte_offset >= PAGE_SIZE) {
+			SSDFS_ERR("invalid byte offset %u\n", byte_offset);
+			return -EINVAL;
+		}
+
+		iter_items = items_per_page - item_offset;
+		iter_items = min_t(size_t, iter_items, rest_items);
+		if (iter_items < items_per_page) {
+			init_size = iter_items + items_per_byte - 1;
+			init_size /= items_per_byte;
+		} else
+			init_size = PAGE_SIZE;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("items_per_page %zu, item_offset %u, "
+			  "rest_items %zu, iter_items %zu, "
+			  "init_size %zu\n",
+			  items_per_page, item_offset,
+			  rest_items, iter_items,
+			  init_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = __ssdfs_set_range_in_memory(blk_bmap, page_index,
+						  byte_offset, byte_value,
+						  init_size);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to set range in memory: "
+				  "page_index %d, byte_offset %u, "
+				  "byte_value %#x, init_size %zu, "
+				  "err %d\n",
+				  page_index, byte_offset,
+				  byte_value, init_size,
+				  err);
+			return err;
+		}
+
+		item_offset = 0;
+		byte_offset = 0;
+		page_index++;
+		blk += iter_items;
+		rest_items -= iter_items;
+	}
+
+	if (aligned_end != range->start + range->len) {
+		struct ssdfs_block_bmap_range unaligned;
+
+		unaligned.start = aligned_end;
+		unaligned.len = (range->start + range->len) - aligned_end;
+
+		err = ssdfs_set_uncached_tiny_range(blk_bmap, &unaligned,
+						    blk_state);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to set (start %u, len %u): err %d\n",
+				  unaligned.start, unaligned.len, err);
+			return err;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_block_bmap_find_range() - find range of block of requested state
+ * @blk_bmap: pointer on block bitmap
+ * @start: start block for search
+ * @len: requested length of range
+ * @max_blk: upper bound for search
+ * @blk_state: requested state of blocks in range
+ * @range: found range [out]
+ *
+ * This function searches for a @range of blocks with the requested
+ * @blk_state. If @blk_state is SSDFS_BLK_STATE_MAX then the
+ * function detects the continuous @range of blocks in the same
+ * state that begins from the @start block.
+ *
+ * RETURN:
+ * [success] - @range of found blocks.
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ */
+static
+int ssdfs_block_bmap_find_range(struct ssdfs_block_bmap *blk_bmap,
+				u32 start, u32 len, u32 max_blk,
+				int blk_state,
+				struct ssdfs_block_bmap_range *range)
+{
+	u32 found_start, found_end;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap || !range);
+
+	if (blk_state > SSDFS_BLK_STATE_MAX) {
+		SSDFS_ERR("invalid block state %#x\n", blk_state);
+		return -EINVAL;
+	}
+
+	if (start >= blk_bmap->items_count) {
+		SSDFS_ERR("invalid start block %u\n", start);
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("blk_bmap %p, start %u, len %u, max_blk %u, "
+		  "state %#x, range %p\n",
+		  blk_bmap, start, len, max_blk, blk_state, range);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	range->start = U32_MAX;
+	range->len = 0;
+
+	if (start >= max_blk) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("start %u >= max_blk %u\n", start, max_blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -ENODATA;
+	}
+
+	err = ssdfs_block_bmap_find_block(blk_bmap, start, max_blk,
+					  blk_state, &found_start);
+	if (err == -ENODATA) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unable to find block: "
+			  "start %u, max_blk %u, state %#x, err %d\n",
+			  start, max_blk, blk_state, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return err;
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to find block: "
+			  "start %u, max_blk %u, state %#x, err %d\n",
+			  start, max_blk, blk_state, err);
+		return err;
+	}
+
+	if (found_start >= blk_bmap->items_count) {
+		SSDFS_ERR("invalid found start %u, items count %zu\n",
+			  found_start, blk_bmap->items_count);
+		return -EINVAL;
+	}
+
+	err = ssdfs_block_bmap_find_state_area_end(blk_bmap, found_start,
+						   found_start + len,
+						   blk_state,
+						   &found_end);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to find block: "
+			  "start %u, max_blk %u, state %#x, err %d\n",
+			  start, max_blk, blk_state, err);
+		return err;
+	}
+
+	if (found_end <= found_start) {
+		SSDFS_ERR("invalid found (start %u, end %u), items count %zu\n",
+			  found_start, found_end, blk_bmap->items_count);
+		return -EINVAL;
+	}
+
+	if (found_end > blk_bmap->items_count)
+		found_end = blk_bmap->items_count;
+
+	range->start = found_start;
+	range->len = min_t(u32, len, found_end - found_start);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("found_start %u, found_end %u, len %u, "
+		  "range (start %u, len %u)\n",
+		  found_start, found_end, len,
+		  range->start, range->len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+/*
+ * ssdfs_block_bmap_set_block_state() - set state of block
+ * @blk_bmap: pointer on block bitmap
+ * @blk: segment's block
+ * @blk_state: state for set
+ *
+ * This function sets @blk to @blk_state.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ */
+static
+int ssdfs_block_bmap_set_block_state(struct ssdfs_block_bmap *blk_bmap,
+					u32 blk, int blk_state)
+{
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap);
+
+	if (blk_state >= SSDFS_BLK_STATE_MAX) {
+		SSDFS_ERR("invalid block state %#x\n", blk_state);
+		return -EINVAL;
+	}
+
+	if (blk >= blk_bmap->items_count) {
+		SSDFS_ERR("invalid block %u\n", blk);
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("blk_bmap %p, block %u, state %#x\n",
+		  blk_bmap, blk, blk_state);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!is_block_state_cached(blk_bmap, blk)) {
+		err = ssdfs_cache_block_state(blk_bmap, blk, blk_state);
+		if (unlikely(err)) {
+			SSDFS_ERR("unable to cache block %u state: err %d\n",
+				  blk, err);
+			return err;
+		}
+	}
+
+	err = ssdfs_set_block_state_in_cache(blk_bmap, blk, blk_state);
+	if (unlikely(err)) {
+		SSDFS_ERR("unable to set block %u state in cache: err %d\n",
+			  blk, err);
+		return err;
+	}
+
+	err = ssdfs_save_cache_in_storage(blk_bmap);
+	if (unlikely(err)) {
+		SSDFS_ERR("unable to save the cache in storage: err %d\n",
+			  err);
+		return err;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_block_bmap_set_range() - set state of blocks' range
+ * @blk_bmap: pointer on block bitmap
+ * @range: requested range
+ * @blk_state: state for set
+ *
+ * This function sets the blocks' @range to @blk_state.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ */
+static
+int ssdfs_block_bmap_set_range(struct ssdfs_block_bmap *blk_bmap,
+				struct ssdfs_block_bmap_range *range,
+				int blk_state)
+{
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap || !range);
+
+	if (blk_state >= SSDFS_BLK_STATE_MAX) {
+		SSDFS_ERR("invalid block state %#x\n", blk_state);
+		return -EINVAL;
+	}
+
+	if (range_corrupted(blk_bmap, range)) {
+		SSDFS_ERR("invalid range (start %u, len %u); items count %zu\n",
+			  range->start, range->len,
+			  blk_bmap->items_count);
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("blk_bmap %p, range (start %u, len %u), state %#x\n",
+		  blk_bmap, range->start, range->len, blk_state);
+
+	ssdfs_debug_block_bitmap(blk_bmap);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (range->len == 1) {
+		err = ssdfs_block_bmap_set_block_state(blk_bmap, range->start,
+							blk_state);
+		if (err) {
+			SSDFS_ERR("fail to set (start %u, len %u) state %#x: "
+				  "err %d\n",
+				  range->start, range->len, blk_state, err);
+			return err;
+		}
+	} else if (is_whole_range_cached(blk_bmap, range)) {
+		err = ssdfs_set_range_in_cache(blk_bmap, range, blk_state);
+		if (unlikely(err)) {
+			SSDFS_ERR("unable to set (start %u, len %u) state %#x "
+				  "in cache: err %d\n",
+				  range->start, range->len, blk_state, err);
+			return err;
+		}
+
+		err = ssdfs_save_cache_in_storage(blk_bmap);
+		if (unlikely(err)) {
+			SSDFS_ERR("unable to save the cache in storage: "
+				  "err %d\n", err);
+			return err;
+		}
+	} else {
+		u32 next_blk;
+
+		err = ssdfs_set_range_in_storage(blk_bmap, range, blk_state);
+		if (unlikely(err)) {
+			SSDFS_ERR("unable to set (start %u, len %u) state %#x "
+				  "in storage: err %d\n",
+				  range->start, range->len, blk_state, err);
+			return err;
+		}
+
+		next_blk = range->start + range->len;
+		if (next_blk == blk_bmap->items_count)
+			next_blk--;
+
+		err = ssdfs_cache_block_state(blk_bmap, next_blk, blk_state);
+		if (unlikely(err)) {
+			SSDFS_ERR("unable to cache block %u state: err %d\n",
+				  next_blk, err);
+			return err;
+		}
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	ssdfs_debug_block_bitmap(blk_bmap);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+/*
+ * ssdfs_block_bmap_test_block() - check state of block
+ * @blk_bmap: pointer on block bitmap
+ * @blk: segment's block
+ * @blk_state: checked state
+ *
+ * This function checks that requested @blk has @blk_state.
+ *
+ * RETURN:
+ * [true]  - requested @blk has @blk_state
+ * [false] - requested @blk doesn't have @blk_state or a failure
+ *           took place during the check.
+ */
+bool ssdfs_block_bmap_test_block(struct ssdfs_block_bmap *blk_bmap,
+				 u32 blk, int blk_state)
+{
+	u32 found;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap);
+
+	if (blk_state >= SSDFS_BLK_STATE_MAX) {
+		SSDFS_ERR("invalid block state %#x\n", blk_state);
+		return false;
+	}
+
+	if (blk >= blk_bmap->items_count) {
+		SSDFS_ERR("invalid block %u\n", blk);
+		return false;
+	}
+
+	BUG_ON(!mutex_is_locked(&blk_bmap->lock));
+
+	SSDFS_DBG("blk_bmap %p, block %u, state %#x\n",
+		  blk_bmap, blk, blk_state);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	BUG_ON(!is_block_bmap_initialized(blk_bmap));
+
+	err = ssdfs_block_bmap_find_block(blk_bmap, blk, blk + 1, blk_state,
+					  &found);
+	if (err) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unable to find block %u, state %#x, err %d\n",
+			  blk, blk_state, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	return found == blk;
+}
+
+/*
+ * ssdfs_block_bmap_test_range() - check state of blocks' range
+ * @blk_bmap: pointer on block bitmap
+ * @range: segment's blocks' range
+ * @blk_state: checked state
+ *
+ * This function checks that all blocks in requested @range have
+ * @blk_state.
+ *
+ * RETURN:
+ * [true]  - all blocks in requested @range have @blk_state
+ * [false] - requested @range contains blocks in various states or
+ *           a failure took place during the check.
+ */
+bool ssdfs_block_bmap_test_range(struct ssdfs_block_bmap *blk_bmap,
+				 struct ssdfs_block_bmap_range *range,
+				 int blk_state)
+{
+	struct ssdfs_block_bmap_range found;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap || !range);
+
+	if (blk_state >= SSDFS_BLK_STATE_MAX) {
+		SSDFS_ERR("invalid block state %#x\n", blk_state);
+		return false;
+	}
+
+	if (range_corrupted(blk_bmap, range)) {
+		SSDFS_ERR("invalid range (start %u, len %u); items count %zu\n",
+			  range->start, range->len,
+			  blk_bmap->items_count);
+		return false;
+	}
+
+	BUG_ON(!mutex_is_locked(&blk_bmap->lock));
+
+	SSDFS_DBG("blk_bmap %p, range (start %u, len %u), state %#x\n",
+		  blk_bmap, range->start, range->len, blk_state);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	BUG_ON(!is_block_bmap_initialized(blk_bmap));
+
+	err = ssdfs_block_bmap_find_range(blk_bmap, range->start, range->len,
+					  range->start + range->len,
+					  blk_state, &found);
+	if (unlikely(err)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unable to find range: err %d\n", err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	if (compare_block_bmap_ranges(&found, range) == 0)
+		return true;
+
+	return false;
+}
+
+/*
+ * ssdfs_get_block_state() - detect state of block
+ * @blk_bmap: pointer on block bitmap
+ * @blk: segment's block
+ *
+ * This function retrieves the state of @blk from the block bitmap.
+ *
+ * RETURN:
+ * [success] - state of block
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ * %-ENODATA    - requested @blk hasn't been found.
+ * %-ENOENT     - block bitmap isn't initialized.
+ */
+int ssdfs_get_block_state(struct ssdfs_block_bmap *blk_bmap, u32 blk)
+{
+	u32 found;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap);
+
+	if (blk >= blk_bmap->items_count) {
+		SSDFS_ERR("invalid block %u\n", blk);
+		return -EINVAL;
+	}
+
+	if (!mutex_is_locked(&blk_bmap->lock)) {
+		SSDFS_WARN("block bitmap mutex should be locked\n");
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("blk_bmap %p, block %u\n", blk_bmap, blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!is_block_bmap_initialized(blk_bmap)) {
+		SSDFS_WARN("block bitmap hasn't been initialized\n");
+		return -ENOENT;
+	}
+
+	err = ssdfs_block_bmap_find_block(blk_bmap, blk, blk + 1,
+					    SSDFS_BLK_STATE_MAX,
+					    &found);
+	if (err) {
+		SSDFS_ERR("fail to find block %u, err %d\n",
+			  blk, err);
+		return err;
+	}
+
+	if (found != blk) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("found (%u) != blk (%u)\n", found, blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -ENODATA;
+	}
+
+	return ssdfs_get_block_state_from_cache(blk_bmap, blk);
+}
+
+/*
+ * ssdfs_get_range_state() - detect state of blocks' range
+ * @blk_bmap: pointer on block bitmap
+ * @range: pointer on blocks' range
+ *
+ * This function retrieves the state of @range from the block bitmap.
+ *
+ * RETURN:
+ * [success] - state of blocks' range
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ * %-EOPNOTSUPP - requested @range contains blocks in various states.
+ * %-ENOENT     - block bitmap isn't initialized.
+ */
+int ssdfs_get_range_state(struct ssdfs_block_bmap *blk_bmap,
+			  struct ssdfs_block_bmap_range *range)
+{
+	struct ssdfs_block_bmap_range found;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap || !range);
+
+	if (range_corrupted(blk_bmap, range)) {
+		SSDFS_ERR("invalid range: start %u, len %u; items count %zu\n",
+			  range->start, range->len,
+			  blk_bmap->items_count);
+		return -EINVAL;
+	}
+
+	if (!mutex_is_locked(&blk_bmap->lock)) {
+		SSDFS_WARN("block bitmap mutex should be locked\n");
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("blk_bmap %p, range (start %u, len %u)\n",
+		  blk_bmap, range->start, range->len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!is_block_bmap_initialized(blk_bmap)) {
+		SSDFS_WARN("block bitmap hasn't been initialized\n");
+		return -ENOENT;
+	}
+
+	err = ssdfs_block_bmap_find_range(blk_bmap, range->start, range->len,
+					  range->start + range->len,
+					  SSDFS_BLK_STATE_MAX, &found);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to find range: err %d\n", err);
+		return err;
+	}
+
+	if (compare_block_bmap_ranges(&found, range) != 0) {
+		SSDFS_ERR("range contains various state of blocks\n");
+		return -EOPNOTSUPP;
+	}
+
+	err = ssdfs_cache_block_state(blk_bmap, range->start,
+					SSDFS_BLK_STATE_MAX);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to cache block %u: err %d\n",
+			  range->start, err);
+		return err;
+	}
+
+	return ssdfs_get_block_state_from_cache(blk_bmap, range->start);
+}
+
+/*
+ * ssdfs_block_bmap_reserve_metadata_pages() - reserve metadata pages
+ * @blk_bmap: pointer on block bitmap
+ * @count: count of reserved metadata pages
+ *
+ * This function tries to reserve @count of metadata pages in
+ * block bitmap's space.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ * %-ENOENT     - block bitmap isn't initialized.
+ */
+int ssdfs_block_bmap_reserve_metadata_pages(struct ssdfs_block_bmap *blk_bmap,
+					    u32 count)
+{
+	u32 reserved_items;
+	u32 calculated_items;
+	int free_pages = 0;
+	int used_pages = 0;
+	int invalid_pages = 0;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap);
+
+	if (!mutex_is_locked(&blk_bmap->lock)) {
+		SSDFS_WARN("block bitmap mutex should be locked\n");
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("blk_bmap %p, count %u\n",
+		  blk_bmap, count);
+	SSDFS_DBG("items_count %zu, used_blks %u, "
+		  "metadata_items %u\n",
+		  blk_bmap->items_count,
+		  blk_bmap->used_blks,
+		  blk_bmap->metadata_items);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!is_block_bmap_initialized(blk_bmap)) {
+		SSDFS_WARN("block bitmap hasn't been initialized\n");
+		return -ENOENT;
+	}
+
+	err = ssdfs_block_bmap_get_free_pages(blk_bmap);
+	if (unlikely(err < 0)) {
+		SSDFS_ERR("fail to get free pages: err %d\n", err);
+		return err;
+	} else {
+		free_pages = err;
+		err = 0;
+	}
+
+	if (free_pages < count) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unable to reserve metadata pages: "
+			  "free_pages %d, count %u\n",
+			  free_pages, count);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -ENOSPC;
+	}
+
+	err = ssdfs_block_bmap_get_used_pages(blk_bmap);
+	if (unlikely(err < 0)) {
+		SSDFS_ERR("fail to get used pages: err %d\n", err);
+		return err;
+	} else {
+		used_pages = err;
+		err = 0;
+	}
+
+	err = ssdfs_block_bmap_get_invalid_pages(blk_bmap);
+	if (unlikely(err < 0)) {
+		SSDFS_ERR("fail to get invalid pages: err %d\n", err);
+		return err;
+	} else {
+		invalid_pages = err;
+		err = 0;
+	}
+
+	reserved_items = blk_bmap->metadata_items + count;
+	calculated_items = used_pages + invalid_pages + reserved_items;
+	if (calculated_items > blk_bmap->items_count) {
+		SSDFS_ERR("fail to reserve metadata pages: "
+			  "used_pages %d, invalid_pages %d, "
+			  "metadata_items %u, "
+			  "count %u, items_count %zu\n",
+			  used_pages, invalid_pages,
+			  blk_bmap->metadata_items,
+			  count, blk_bmap->items_count);
+		return -EINVAL;
+	}
+
+	blk_bmap->metadata_items += count;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(blk_bmap->metadata_items == 0);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+/*
+ * ssdfs_block_bmap_free_metadata_pages() - free metadata pages
+ * @blk_bmap: pointer on block bitmap
+ * @count: count of metadata pages for freeing
+ *
+ * This function tries to free @count of metadata pages in
+ * block bitmap's space.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ * %-ENOENT     - block bitmap isn't initialized.
+ * %-ERANGE     - internal error.
+ */
+int ssdfs_block_bmap_free_metadata_pages(struct ssdfs_block_bmap *blk_bmap,
+					 u32 count)
+{
+	u32 metadata_items;
+	u32 freed_items;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap);
+
+	if (!mutex_is_locked(&blk_bmap->lock)) {
+		SSDFS_WARN("block bitmap mutex should be locked\n");
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("blk_bmap %p, count %u\n",
+		  blk_bmap, count);
+	SSDFS_DBG("items_count %zu, used_blks %u, "
+		  "metadata_items %u\n",
+		  blk_bmap->items_count,
+		  blk_bmap->used_blks,
+		  blk_bmap->metadata_items);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!is_block_bmap_initialized(blk_bmap)) {
+		SSDFS_WARN("block bitmap hasn't been initialized\n");
+		return -ENOENT;
+	}
+
+	metadata_items = blk_bmap->metadata_items;
+	freed_items = count;
+
+	if (blk_bmap->metadata_items < count) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("correct value: metadata_items %u < count %u\n",
+			  blk_bmap->metadata_items, count);
+#endif /* CONFIG_SSDFS_DEBUG */
+		freed_items = blk_bmap->metadata_items;
+	}
+
+	blk_bmap->metadata_items -= freed_items;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	if (blk_bmap->metadata_items == 0) {
+		SSDFS_ERR("BEFORE: metadata_items %u, count %u, "
+			  "items_count %zu, used_blks %u, "
+			  "invalid_blks %u\n",
+			  metadata_items, count,
+			  blk_bmap->items_count,
+			  blk_bmap->used_blks,
+			  blk_bmap->invalid_blks);
+		SSDFS_ERR("AFTER: metadata_items %u, freed_items %u\n",
+			  blk_bmap->metadata_items, freed_items);
+	}
+	BUG_ON(blk_bmap->metadata_items == 0);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+/*
+ * ssdfs_block_bmap_get_free_pages() - determine current free pages count
+ * @blk_bmap: pointer on block bitmap
+ *
+ * This function tries to detect current free pages count
+ * in block bitmap.
+ *
+ * RETURN:
+ * [success] - count of free pages.
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ * %-ERANGE     - invalid internal calculations.
+ * %-ENOENT     - block bitmap isn't initialized.
+ */
+int ssdfs_block_bmap_get_free_pages(struct ssdfs_block_bmap *blk_bmap)
+{
+	u32 found_blk;
+	u32 used_blks;
+	u32 metadata_items;
+	u32 invalid_blks;
+	int free_blks;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap);
+
+	if (!mutex_is_locked(&blk_bmap->lock)) {
+		SSDFS_WARN("block bitmap mutex should be locked\n");
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("blk_bmap %p\n", blk_bmap);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!is_block_bmap_initialized(blk_bmap)) {
+		SSDFS_WARN("block bitmap hasn't been initialized\n");
+		return -ENOENT;
+	}
+
+	if (is_cache_invalid(blk_bmap, SSDFS_BLK_FREE)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("cache for free states is invalid!!!\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = ssdfs_block_bmap_find_block(blk_bmap,
+						  0, blk_bmap->items_count,
+						  SSDFS_BLK_FREE, &found_blk);
+	} else
+		err = ssdfs_define_last_free_page(blk_bmap, &found_blk);
+
+	if (err == -ENODATA) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unable to find last free block: "
+			  "found_blk %u\n",
+			  found_blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to find last free block: err %d\n",
+			  err);
+		return err;
+	}
+
+	used_blks = blk_bmap->used_blks;
+	metadata_items = blk_bmap->metadata_items;
+	invalid_blks = blk_bmap->invalid_blks;
+
+	free_blks = blk_bmap->items_count;
+	free_blks -= used_blks + metadata_items + invalid_blks;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("items_count %zu, used_blks %u, "
+		  "invalid_blks %u, "
+		  "metadata_items %u, free_blks %d\n",
+		  blk_bmap->items_count,
+		  used_blks, invalid_blks, metadata_items,
+		  free_blks);
+
+	if (unlikely(found_blk > blk_bmap->items_count)) {
+		SSDFS_ERR("found_blk %u > items_count %zu\n",
+			  found_blk, blk_bmap->items_count);
+		return -ERANGE;
+	}
+
+	WARN_ON(INT_MAX < (blk_bmap->items_count - found_blk));
+
+	if (unlikely((used_blks + metadata_items) > blk_bmap->items_count)) {
+		SSDFS_ERR("used_blks %u, metadata_items %u, "
+			  "items_count %zu\n",
+			  used_blks, metadata_items,
+			  blk_bmap->items_count);
+		return -ERANGE;
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (free_blks < 0) {
+		SSDFS_ERR("items_count %zu, used_blks %u, "
+			  "invalid_blks %u, "
+			  "metadata_items %u, free_blks %d\n",
+			  blk_bmap->items_count,
+			  used_blks, invalid_blks, metadata_items,
+			  free_blks);
+		return -ERANGE;
+	}
+
+	return free_blks;
+}
+
+/*
+ * ssdfs_block_bmap_get_used_pages() - determine current used pages count
+ * @blk_bmap: pointer on block bitmap
+ *
+ * This function tries to detect current used pages count
+ * in block bitmap.
+ *
+ * RETURN:
+ * [success] - count of used pages.
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ * %-ERANGE     - invalid internal calculations.
+ * %-ENOENT     - block bitmap isn't initialized.
+ */
+int ssdfs_block_bmap_get_used_pages(struct ssdfs_block_bmap *blk_bmap)
+{
+	u32 found_blk;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap);
+
+	if (!mutex_is_locked(&blk_bmap->lock)) {
+		SSDFS_WARN("block bitmap mutex should be locked\n");
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("blk_bmap %p\n", blk_bmap);
+	SSDFS_DBG("items_count %zu, used_blks %u, "
+		  "metadata_items %u, invalid_blks %u\n",
+		  blk_bmap->items_count,
+		  blk_bmap->used_blks,
+		  blk_bmap->metadata_items,
+		  blk_bmap->invalid_blks);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!is_block_bmap_initialized(blk_bmap)) {
+		SSDFS_WARN("block bitmap hasn't been initialized\n");
+		return -ENOENT;
+	}
+
+	err = ssdfs_define_last_free_page(blk_bmap, &found_blk);
+	if (err == -ENODATA) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unable to find last free block: "
+			  "found_blk %u\n",
+			  found_blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to find last free block: err %d\n",
+			  err);
+		return err;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	if (unlikely(found_blk > blk_bmap->items_count)) {
+		SSDFS_ERR("found_blk %u > items_count %zu\n",
+			  found_blk, blk_bmap->items_count);
+		return -ERANGE;
+	}
+
+	if (unlikely(blk_bmap->used_blks > blk_bmap->items_count)) {
+		SSDFS_ERR("used_blks %u > items_count %zu\n",
+			  blk_bmap->used_blks,
+			  blk_bmap->items_count);
+		return -ERANGE;
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return blk_bmap->used_blks;
+}
+
+/*
+ * ssdfs_block_bmap_get_invalid_pages() - determine current invalid pages count
+ * @blk_bmap: pointer on block bitmap
+ *
+ * This function tries to detect current invalid pages count
+ * in block bitmap.
+ *
+ * RETURN:
+ * [success] - count of invalid pages.
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ * %-ENOENT     - block bitmap isn't initialized.
+ */
+int ssdfs_block_bmap_get_invalid_pages(struct ssdfs_block_bmap *blk_bmap)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap);
+
+	if (!mutex_is_locked(&blk_bmap->lock)) {
+		SSDFS_WARN("block bitmap mutex should be locked\n");
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("blk_bmap %p\n", blk_bmap);
+	SSDFS_DBG("items_count %zu, used_blks %u, "
+		  "metadata_items %u, invalid_blks %u\n",
+		  blk_bmap->items_count,
+		  blk_bmap->used_blks,
+		  blk_bmap->metadata_items,
+		  blk_bmap->invalid_blks);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!is_block_bmap_initialized(blk_bmap)) {
+		SSDFS_WARN("block bitmap hasn't been initialized\n");
+		return -ENOENT;
+	}
+
+	return blk_bmap->invalid_blks;
+}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [RFC PATCH 12/76] ssdfs: block bitmap modification operations implementation
  2023-02-25  1:08 [RFC PATCH 00/76] SSDFS: flash-friendly LFS file system for ZNS SSD Viacheslav Dubeyko
                   ` (10 preceding siblings ...)
  2023-02-25  1:08 ` [RFC PATCH 11/76] ssdfs: block bitmap search operations implementation Viacheslav Dubeyko
@ 2023-02-25  1:08 ` Viacheslav Dubeyko
  2023-02-25  1:08 ` [RFC PATCH 13/76] ssdfs: introduce PEB block bitmap Viacheslav Dubeyko
                   ` (64 subsequent siblings)
  76 siblings, 0 replies; 82+ messages in thread
From: Viacheslav Dubeyko @ 2023-02-25  1:08 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: viacheslav.dubeyko, luka.perkov, bruno.banelli, Viacheslav Dubeyko

This patch implements the block bitmap's modification operations:
pre_allocate - pre-allocate a logical block or range of blocks
allocate - allocate a logical block or range of blocks
invalidate - invalidate a logical block or range of blocks
collect_garbage - get a contiguous range of blocks in the requested state
clean - convert the whole block bitmap into the clean state

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
CC: Viacheslav Dubeyko <viacheslav.dubeyko@bytedance.com>
CC: Luka Perkov <luka.perkov@sartura.hr>
CC: Bruno Banelli <bruno.banelli@sartura.hr>
---
 fs/ssdfs/block_bitmap.c | 703 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 703 insertions(+)

diff --git a/fs/ssdfs/block_bitmap.c b/fs/ssdfs/block_bitmap.c
index 3e3ddb6ff745..258d3b3856e1 100644
--- a/fs/ssdfs/block_bitmap.c
+++ b/fs/ssdfs/block_bitmap.c
@@ -4608,3 +4608,706 @@ int ssdfs_block_bmap_get_invalid_pages(struct ssdfs_block_bmap *blk_bmap)
 
 	return blk_bmap->invalid_blks;
 }
+
+/*
+ * ssdfs_block_bmap_pre_allocate() - pre-allocate segment's range of blocks
+ * @blk_bmap: pointer on block bitmap
+ * @start: starting block for search
+ * @len: pointer on variable with requested length of range
+ * @range: pointer on blocks' range [in | out]
+ *
+ * This function tries to find contiguous range of free blocks and
+ * to set the found range in pre-allocated state.
+ *
+ * If the pointer @len is NULL then the function:
+ * (1) checks that the requested range contains free blocks only;
+ * (2) sets the requested range of blocks into the pre-allocated state.
+ *
+ * Otherwise, if @len != NULL, the function:
+ * (1) finds a range of free blocks of the requested length or shorter;
+ * (2) sets the found range of blocks into the pre-allocated state.
+ *
+ * RETURN:
+ * [success] - @range of pre-allocated blocks.
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ * %-ENOENT     - block bitmap isn't initialized.
+ * %-ENOSPC     - block bitmap has no free blocks.
+ */
+int ssdfs_block_bmap_pre_allocate(struct ssdfs_block_bmap *blk_bmap,
+				  u32 start, u32 *len,
+				  struct ssdfs_block_bmap_range *range)
+{
+	int free_pages;
+	u32 used_blks = 0;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap || !range);
+	if (!mutex_is_locked(&blk_bmap->lock)) {
+		SSDFS_WARN("block bitmap mutex should be locked\n");
+		return -EINVAL;
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("start %u, len %p\n",
+		  start, len);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	if (len) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("blk_bmap %p, start %u, len %u\n",
+			  blk_bmap, start, *len);
+#endif /* CONFIG_SSDFS_DEBUG */
+	} else {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("blk_bmap %p, range (start %u, len %u)\n",
+			  blk_bmap, range->start, range->len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (range_corrupted(blk_bmap, range)) {
+			SSDFS_ERR("invalid range: start %u, len %u; "
+				  "items count %zu\n",
+				  range->start, range->len,
+				  blk_bmap->items_count);
+			return -EINVAL;
+		}
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("items_count %zu, used_blks %u, "
+		  "metadata_items %u, invalid_blks %u\n",
+		  blk_bmap->items_count,
+		  blk_bmap->used_blks,
+		  blk_bmap->metadata_items,
+		  blk_bmap->invalid_blks);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!is_block_bmap_initialized(blk_bmap)) {
+		SSDFS_WARN("block bitmap hasn't been initialized\n");
+		return -ENOENT;
+	}
+
+	err = ssdfs_block_bmap_get_free_pages(blk_bmap);
+	if (unlikely(err < 0)) {
+		SSDFS_ERR("fail to get free pages: err %d\n", err);
+		return err;
+	} else {
+		free_pages = err;
+		err = 0;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("free_pages %d\n", free_pages);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (len) {
+		u32 max_blk = blk_bmap->items_count - blk_bmap->metadata_items;
+		u32 start_blk = 0;
+
+		if (free_pages < *len) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to pre_allocate: "
+				  "free_pages %d, count %u\n",
+				  free_pages, *len);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return -ENOSPC;
+		}
+
+		if (!is_cache_invalid(blk_bmap, SSDFS_BLK_FREE)) {
+			err = ssdfs_define_last_free_page(blk_bmap, &start_blk);
+			if (err) {
+				SSDFS_ERR("fail to define start block: "
+					  "err %d\n",
+					  err);
+				return err;
+			}
+		}
+
+		start_blk = max_t(u32, start_blk, start);
+
+		err = ssdfs_block_bmap_find_range(blk_bmap, start_blk, *len,
+						  max_blk,
+						  SSDFS_BLK_FREE, range);
+		if (err == -ENODATA) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to find free blocks: "
+				  "start_blk %u, max_blk %u, len %u\n",
+				  start_blk, max_blk, *len);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return -ENOSPC;
+		} else if (err) {
+			SSDFS_ERR("fail to find free blocks: err %d\n", err);
+			return err;
+		}
+	} else {
+		if (free_pages < range->len) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to pre_allocate: "
+				  "free_pages %d, count %u\n",
+				  free_pages, range->len);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return -ENOSPC;
+		}
+
+		if (!is_range_free(blk_bmap, range)) {
+			SSDFS_ERR("range (start %u, len %u) is not free\n",
+				  range->start, range->len);
+			return -EINVAL;
+		}
+	}
+
+	used_blks = (u32)blk_bmap->used_blks + range->len;
+
+	if (used_blks > blk_bmap->items_count) {
+		SSDFS_ERR("invalid used blocks count: "
+			  "used_blks %u, items_count %zu\n",
+			  used_blks,
+			  blk_bmap->items_count);
+		return -ERANGE;
+	}
+
+	err = ssdfs_block_bmap_set_range(blk_bmap, range,
+					 SSDFS_BLK_PRE_ALLOCATED);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to set range (start %u, len %u): err %d\n",
+			  range->start, range->len, err);
+		return err;
+	}
+
+	blk_bmap->used_blks += range->len;
+
+	set_block_bmap_dirty(blk_bmap);
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("range (start %u, len %u) has been pre-allocated\n",
+		  range->start, range->len);
+#else
+	SSDFS_DBG("range (start %u, len %u) has been pre-allocated\n",
+		  range->start, range->len);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	return 0;
+}
+
+/*
+ * ssdfs_block_bmap_allocate() - allocate segment's range of blocks
+ * @blk_bmap: pointer on block bitmap
+ * @start: starting block for search
+ * @len: pointer on variable with requested length of range
+ * @range: pointer on blocks' range [in | out]
+ *
+ * This function tries to find a contiguous range of free
+ * (or pre-allocated) blocks and to set the found range into the
+ * valid state.
+ *
+ * If @len is NULL, the function:
+ * (1) checks that the requested range contains free or pre-allocated
+ *     blocks only;
+ * (2) sets the requested range of blocks into the valid state.
+ *
+ * Otherwise, if @len != NULL, the function:
+ * (1) finds a range of free blocks of the requested length or shorter;
+ * (2) sets the found range of blocks into the valid state.
+ *
+ * RETURN:
+ * [success] - @range of valid blocks.
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ * %-ENOENT     - block bitmap isn't initialized.
+ * %-ENOSPC     - block bitmap has no free blocks.
+ */
+int ssdfs_block_bmap_allocate(struct ssdfs_block_bmap *blk_bmap,
+				u32 start, u32 *len,
+				struct ssdfs_block_bmap_range *range)
+{
+	int state = SSDFS_BLK_FREE;
+	int free_pages;
+	u32 used_blks = 0;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap || !range);
+	if (!mutex_is_locked(&blk_bmap->lock)) {
+		SSDFS_WARN("block bitmap mutex should be locked\n");
+		return -EINVAL;
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("start %u, len %p\n",
+		  start, len);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	if (len) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("blk_bmap %p, start %u, len %u\n",
+			  blk_bmap, start, *len);
+#endif /* CONFIG_SSDFS_DEBUG */
+	} else {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("blk_bmap %p, range (start %u, len %u)\n",
+			  blk_bmap, range->start, range->len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (range_corrupted(blk_bmap, range)) {
+			SSDFS_ERR("invalid range: start %u, len %u; "
+				  "items count %zu\n",
+				  range->start, range->len,
+				  blk_bmap->items_count);
+			return -EINVAL;
+		}
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("items_count %zu, used_blks %u, "
+		  "metadata_items %u\n",
+		  blk_bmap->items_count,
+		  blk_bmap->used_blks,
+		  blk_bmap->metadata_items);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!is_block_bmap_initialized(blk_bmap)) {
+		SSDFS_WARN("block bitmap hasn't been initialized\n");
+		return -ENOENT;
+	}
+
+	err = ssdfs_block_bmap_get_free_pages(blk_bmap);
+	if (unlikely(err < 0)) {
+		SSDFS_ERR("fail to get free pages: err %d\n", err);
+		return err;
+	} else {
+		free_pages = err;
+		err = 0;
+	}
+
+	if (len) {
+		u32 max_blk = blk_bmap->items_count - blk_bmap->metadata_items;
+		u32 start_blk = 0;
+
+		if (free_pages < *len) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to allocate: "
+				  "free_pages %d, count %u\n",
+				  free_pages, *len);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return -ENOSPC;
+		}
+
+		if (!is_cache_invalid(blk_bmap, SSDFS_BLK_FREE)) {
+			err = ssdfs_define_last_free_page(blk_bmap, &start_blk);
+			if (err) {
+				SSDFS_ERR("fail to define start block: "
+					  "err %d\n",
+					  err);
+				return err;
+			}
+		}
+
+		start_blk = max_t(u32, start_blk, start);
+
+		err = ssdfs_block_bmap_find_range(blk_bmap, start_blk, *len,
+						  max_blk, SSDFS_BLK_FREE,
+						  range);
+		if (err == -ENODATA) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to find free blocks: "
+				  "start_blk %u, max_blk %u, len %u\n",
+				  start_blk, max_blk, *len);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return -ENOSPC;
+		} else if (err) {
+			SSDFS_ERR("fail to find free blocks: err %d\n", err);
+			return err;
+		}
+	} else {
+		state = ssdfs_get_range_state(blk_bmap, range);
+
+		if (state < 0) {
+			SSDFS_ERR("fail to get range "
+				  "(start %u, len %u) state: err %d\n",
+				  range->start, range->len, state);
+			return state;
+		}
+
+		if (state != SSDFS_BLK_FREE &&
+		    state != SSDFS_BLK_PRE_ALLOCATED) {
+			SSDFS_ERR("range (start %u, len %u), state %#x, "
+				  "can't be allocated\n",
+				  range->start, range->len, state);
+			return -EINVAL;
+		}
+	}
+
+	err = ssdfs_block_bmap_set_range(blk_bmap, range,
+					 SSDFS_BLK_VALID);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to set range (start %u, len %u): "
+			  "err %d\n",
+			  range->start, range->len, err);
+		return err;
+	}
+
+	if (state == SSDFS_BLK_FREE) {
+		used_blks = (u32)blk_bmap->used_blks + range->len;
+
+		if (used_blks > blk_bmap->items_count) {
+			SSDFS_ERR("invalid used blocks count: "
+				  "used_blks %u, items_count %zu\n",
+				  used_blks,
+				  blk_bmap->items_count);
+			return -ERANGE;
+		}
+
+		blk_bmap->used_blks += range->len;
+	}
+
+	set_block_bmap_dirty(blk_bmap);
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("range (start %u, len %u) has been allocated\n",
+		  range->start, range->len);
+#else
+	SSDFS_DBG("range (start %u, len %u) has been allocated\n",
+		  range->start, range->len);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	return 0;
+}
+
+/*
+ * ssdfs_block_bmap_invalidate() - invalidate segment's range of blocks
+ * @blk_bmap: pointer on block bitmap
+ * @range: pointer on blocks' range [in | out]
+ *
+ * This function tries to set the requested range of blocks into the
+ * invalid state. At first, it checks that the requested range contains
+ * valid (or pre-allocated) blocks only. Then it sets the requested
+ * range of blocks into the invalid state.
+ *
+ * RETURN:
+ * [success] - @range of invalid blocks.
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ * %-ENOENT     - block bitmap isn't initialized.
+ */
+int ssdfs_block_bmap_invalidate(struct ssdfs_block_bmap *blk_bmap,
+				struct ssdfs_block_bmap_range *range)
+{
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap || !range);
+	if (!mutex_is_locked(&blk_bmap->lock)) {
+		SSDFS_WARN("block bitmap mutex should be locked\n");
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("blk_bmap %p, range (start %u, len %u)\n",
+		  blk_bmap, range->start, range->len);
+	SSDFS_DBG("items_count %zu, used_blks %u, "
+		  "metadata_items %u, invalid_blks %u\n",
+		  blk_bmap->items_count,
+		  blk_bmap->used_blks,
+		  blk_bmap->metadata_items,
+		  blk_bmap->invalid_blks);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("range (start %u, len %u)\n",
+		  range->start, range->len);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	if (!is_block_bmap_initialized(blk_bmap)) {
+		SSDFS_WARN("block bitmap hasn't been initialized\n");
+		return -ENOENT;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	ssdfs_debug_block_bitmap(blk_bmap);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (range_corrupted(blk_bmap, range)) {
+		SSDFS_ERR("invalid range (start %u, len %u); items count %zu\n",
+			  range->start, range->len, blk_bmap->items_count);
+		return -EINVAL;
+	}
+
+	if (!is_range_valid(blk_bmap, range) &&
+	    !is_range_pre_allocated(blk_bmap, range)) {
+		SSDFS_ERR("range (start %u, len %u) is not fully valid "
+			  "or pre-allocated\n",
+			  range->start, range->len);
+		return -EINVAL;
+	}
+
+	err = ssdfs_block_bmap_set_range(blk_bmap, range,
+					 SSDFS_BLK_INVALID);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to set range (start %u, len %u): err %d\n",
+			  range->start, range->len, err);
+		return err;
+	}
+
+	blk_bmap->invalid_blks += range->len;
+
+	if (range->len > blk_bmap->used_blks) {
+		SSDFS_ERR("invalid range len: "
+			  "range_len %u, used_blks %u, items_count %zu\n",
+			  range->len,
+			  blk_bmap->used_blks,
+			  blk_bmap->items_count);
+		return -ERANGE;
+	}
+
+	blk_bmap->used_blks -= range->len;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("items_count %zu, used_blks %u, "
+		  "metadata_items %u, invalid_blks %u\n",
+		  blk_bmap->items_count,
+		  blk_bmap->used_blks,
+		  blk_bmap->metadata_items,
+		  blk_bmap->invalid_blks);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	set_block_bmap_dirty(blk_bmap);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	ssdfs_debug_block_bitmap(blk_bmap);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("range (start %u, len %u) has been invalidated\n",
+		  range->start, range->len);
+#else
+	SSDFS_DBG("range (start %u, len %u) has been invalidated\n",
+		  range->start, range->len);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	return 0;
+}
+
+/*
+ * ssdfs_block_bmap_collect_garbage() - find range of valid blocks for GC
+ * @blk_bmap: pointer on block bitmap
+ * @start: starting position for search
+ * @max_len: maximum requested length of valid blocks' range
+ * @blk_state: requested block state (pre-allocated or valid)
+ * @range: pointer on blocks' range [out]
+ *
+ * This function tries to find a range of valid (or pre-allocated)
+ * blocks for GC. The length of the requested range is limited
+ * by @max_len.
+ *
+ * RETURN:
+ * [success] - @range of valid blocks.
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ * %-ENOENT     - block bitmap isn't initialized.
+ * %-ENODATA    - requested range contains no valid blocks.
+ */
+int ssdfs_block_bmap_collect_garbage(struct ssdfs_block_bmap *blk_bmap,
+				     u32 start, u32 max_len,
+				     int blk_state,
+				     struct ssdfs_block_bmap_range *range)
+{
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap || !range);
+	if (!mutex_is_locked(&blk_bmap->lock)) {
+		SSDFS_WARN("block bitmap mutex should be locked\n");
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("blk_bmap %p, start %u, max_len %u\n",
+		  blk_bmap, start, max_len);
+	SSDFS_DBG("items_count %zu, used_blks %u, "
+		  "metadata_items %u, invalid_blks %u\n",
+		  blk_bmap->items_count,
+		  blk_bmap->used_blks,
+		  blk_bmap->metadata_items,
+		  blk_bmap->invalid_blks);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("start %u, max_len %u, blk_state %#x\n",
+		  start, max_len, blk_state);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	if (!is_block_bmap_initialized(blk_bmap)) {
+		SSDFS_WARN("block bitmap hasn't been initialized\n");
+		return -ENOENT;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	ssdfs_debug_block_bitmap(blk_bmap);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (blk_state) {
+	case SSDFS_BLK_PRE_ALLOCATED:
+	case SSDFS_BLK_VALID:
+		/* valid block state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid block state: %#x\n",
+			  blk_state);
+		return -EINVAL;
+	}
+
+	err = ssdfs_block_bmap_find_range(blk_bmap, start, max_len, max_len,
+					  blk_state, range);
+	if (err == -ENODATA) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("range (start %u, len %u) hasn't valid blocks\n",
+			  start, max_len);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return err;
+	} else if (err) {
+		SSDFS_ERR("fail to find valid blocks: err %d\n", err);
+		return err;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("range (start %u, len %u) has been collected as garbage\n",
+		  range->start, range->len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("range (start %u, len %u) has been collected as garbage\n",
+		  range->start, range->len);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	return 0;
+}
+
+/*
+ * ssdfs_block_bmap_clean() - set all blocks as free/clean
+ * @blk_bmap: pointer on block bitmap
+ *
+ * This function tries to clean the whole bitmap.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ * %-ENOENT     - block bitmap isn't initialized.
+ */
+int ssdfs_block_bmap_clean(struct ssdfs_block_bmap *blk_bmap)
+{
+	struct ssdfs_block_bmap_range range;
+	int max_capacity = SSDFS_BLK_BMAP_FRAGMENTS_CHAIN_MAX;
+	int i;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap);
+	if (!mutex_is_locked(&blk_bmap->lock)) {
+		SSDFS_WARN("block bitmap mutex should be locked\n");
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("blk_bmap %p\n", blk_bmap);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!is_block_bmap_initialized(blk_bmap)) {
+		SSDFS_WARN("block bitmap hasn't been initialized\n");
+		return -ENOENT;
+	}
+
+	blk_bmap->metadata_items = 0;
+	blk_bmap->used_blks = 0;
+	blk_bmap->invalid_blks = 0;
+
+	for (i = 0; i < SSDFS_SEARCH_TYPE_MAX; i++) {
+		blk_bmap->last_search[i].page_index = max_capacity;
+		blk_bmap->last_search[i].offset = U16_MAX;
+		blk_bmap->last_search[i].cache = 0;
+	}
+
+	range.start = 0;
+	range.len = blk_bmap->items_count;
+
+	err = ssdfs_set_range_in_storage(blk_bmap, &range, SSDFS_BLK_FREE);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to clean block bmap: "
+			  "range (start %u, len %u), "
+			  "err %d\n",
+			  range.start, range.len, err);
+		return err;
+	}
+
+	err = ssdfs_cache_block_state(blk_bmap, 0, SSDFS_BLK_FREE);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to cache last free page: err %d\n",
+			  err);
+		return err;
+	}
+
+	return 0;
+}
+
+#ifdef CONFIG_SSDFS_DEBUG
+static
+void ssdfs_debug_block_bitmap(struct ssdfs_block_bmap *bmap)
+{
+	struct ssdfs_page_vector *array;
+	struct page *page;
+	void *kaddr;
+	int i;
+
+	BUG_ON(!bmap);
+
+	SSDFS_DBG("BLOCK BITMAP: bytes_count %zu, items_count %zu, "
+		  "metadata_items %u, used_blks %u, invalid_blks %u, "
+		  "flags %#x\n",
+		  bmap->bytes_count,
+		  bmap->items_count,
+		  bmap->metadata_items,
+		  bmap->used_blks,
+		  bmap->invalid_blks,
+		  atomic_read(&bmap->flags));
+
+	SSDFS_DBG("LAST SEARCH:\n");
+	for (i = 0; i < SSDFS_SEARCH_TYPE_MAX; i++) {
+		SSDFS_DBG("TYPE %d: page_index %d, offset %u, cache %lx\n",
+			  i,
+			  bmap->last_search[i].page_index,
+			  bmap->last_search[i].offset,
+			  bmap->last_search[i].cache);
+	}
+
+	switch (bmap->storage.state) {
+	case SSDFS_BLOCK_BMAP_STORAGE_PAGE_VEC:
+		array = &bmap->storage.array;
+
+		for (i = 0; i < ssdfs_page_vector_count(array); i++) {
+			page = array->pages[i];
+
+			if (!page) {
+				SSDFS_WARN("page %d is NULL\n", i);
+				continue;
+			}
+
+			kaddr = kmap_local_page(page);
+			SSDFS_DBG("BMAP CONTENT: page %d\n", i);
+			print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+					     kaddr, PAGE_SIZE);
+			kunmap_local(kaddr);
+		}
+		break;
+
+	case SSDFS_BLOCK_BMAP_STORAGE_BUFFER:
+		SSDFS_DBG("BMAP CONTENT:\n");
+		print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+				     bmap->storage.buf,
+				     bmap->bytes_count);
+		break;
+	}
+}
+#endif /* CONFIG_SSDFS_DEBUG */
-- 
2.34.1



* [RFC PATCH 13/76] ssdfs: introduce PEB block bitmap
  2023-02-25  1:08 [RFC PATCH 00/76] SSDFS: flash-friendly LFS file system for ZNS SSD Viacheslav Dubeyko
                   ` (11 preceding siblings ...)
  2023-02-25  1:08 ` [RFC PATCH 12/76] ssdfs: block bitmap modification " Viacheslav Dubeyko
@ 2023-02-25  1:08 ` Viacheslav Dubeyko
  2023-02-25  1:08 ` [RFC PATCH 14/76] ssdfs: PEB block bitmap modification operations Viacheslav Dubeyko
                   ` (63 subsequent siblings)
  76 siblings, 0 replies; 82+ messages in thread
From: Viacheslav Dubeyko @ 2023-02-25  1:08 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: viacheslav.dubeyko, luka.perkov, bruno.banelli, Viacheslav Dubeyko

SSDFS implements a migration scheme. The migration scheme is
a fundamental technique of GC overhead management. The key
responsibility of the migration scheme is to guarantee that
data stays in the same segment across any update operations.
Generally speaking, the migration scheme's model is implemented
on the basis of associating an exhausted "Physical" Erase Block
(PEB) with a clean one. The goal of such an association of two
PEBs is to implement gradual migration of data by means of the
update operations in the initial (exhausted) PEB. As a result,
the old, exhausted PEB becomes invalidated after complete data
migration, and the erase operation can be applied to convert it
into the clean state. Moreover, the destination PEB of the
association replaces the initial PEB at its index in the segment
and, finally, becomes the only PEB for this position. Namely this
technique implements the concept of a logical extent, with the
goal of mitigating the write amplification issue and managing
the GC overhead, because the logical extent concept excludes
the necessity to update the metadata that tracks the position
of user data on the file system's volume. Generally speaking,
the migration scheme can decrease GC activity significantly by
excluding metadata updates and by letting regular update
operations trigger self-migration of data between PEBs.

To implement the migration scheme concept, SSDFS introduces
a PEB container that includes source and destination erase blocks.
As a result, the PEB block bitmap object represents the aggregation
of the source PEB's block bitmap and the destination PEB's block
bitmap. The PEB block bitmap implements the following API:
(1) create - create PEB block bitmap
(2) destroy - destroy PEB block bitmap
(3) init - initialize PEB block bitmap by metadata from a log
(4) get_free_pages - get free pages in aggregation of block bitmaps
(5) get_used_pages - get used pages in aggregation of block bitmaps
(6) get_invalid_pages - get invalid pages in aggregation of block bitmaps
(7) pre_allocate - pre_allocate page/range in aggregation of block bitmaps
(8) allocate - allocate page/range in aggregation of block bitmaps
(9) invalidate - invalidate page/range in aggregation of block bitmaps
(10) update_range - change the state of range in aggregation of block bitmaps
(11) collect_garbage - find contiguous range for requested state
(12) start_migration - prepare PEB's environment for migration
(13) migrate - move range from source block bitmap into destination one
(14) finish_migration - clean source block bitmap and swap block bitmaps

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
CC: Viacheslav Dubeyko <viacheslav.dubeyko@bytedance.com>
CC: Luka Perkov <luka.perkov@sartura.hr>
CC: Bruno Banelli <bruno.banelli@sartura.hr>
---
 fs/ssdfs/peb_block_bitmap.c | 1540 +++++++++++++++++++++++++++++++++++
 fs/ssdfs/peb_block_bitmap.h |  165 ++++
 2 files changed, 1705 insertions(+)
 create mode 100644 fs/ssdfs/peb_block_bitmap.c
 create mode 100644 fs/ssdfs/peb_block_bitmap.h

diff --git a/fs/ssdfs/peb_block_bitmap.c b/fs/ssdfs/peb_block_bitmap.c
new file mode 100644
index 000000000000..0011ed7dc306
--- /dev/null
+++ b/fs/ssdfs/peb_block_bitmap.c
@@ -0,0 +1,1540 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/peb_block_bitmap.c - PEB's block bitmap implementation.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include <linux/pagemap.h>
+#include <linux/slab.h>
+#include <linux/pagevec.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+#include "page_vector.h"
+#include "peb_block_bitmap.h"
+#include "segment_block_bitmap.h"
+#include "offset_translation_table.h"
+#include "page_array.h"
+#include "peb_container.h"
+#include "segment_bitmap.h"
+#include "segment.h"
+
+#define SSDFS_PEB_BLK_BMAP_STATE_FNS(value, name)			\
+static inline								\
+bool is_peb_block_bmap_##name(struct ssdfs_peb_blk_bmap *bmap)		\
+{									\
+	return atomic_read(&bmap->state) == SSDFS_PEB_BLK_BMAP_##value;	\
+}									\
+static inline								\
+void set_peb_block_bmap_##name(struct ssdfs_peb_blk_bmap *bmap)		\
+{									\
+	atomic_set(&bmap->state, SSDFS_PEB_BLK_BMAP_##value);		\
+}
+
+/*
+ * is_peb_block_bmap_created()
+ * set_peb_block_bmap_created()
+ */
+SSDFS_PEB_BLK_BMAP_STATE_FNS(CREATED, created)
+
+/*
+ * is_peb_block_bmap_initialized()
+ * set_peb_block_bmap_initialized()
+ */
+SSDFS_PEB_BLK_BMAP_STATE_FNS(INITIALIZED, initialized)
+
+bool ssdfs_peb_blk_bmap_initialized(struct ssdfs_peb_blk_bmap *ptr)
+{
+	return is_peb_block_bmap_initialized(ptr);
+}
+
+/*
+ * ssdfs_peb_blk_bmap_create() - construct PEB's block bitmap
+ * @parent: parent segment's block bitmap
+ * @peb_index: PEB's index in segment's array
+ * @items_count: count of described items
+ * @init_flag: definition of block bitmap's creation state
+ * @init_state: block state used during initialization
+ *
+ * This function tries to create the source and destination block
+ * bitmap objects.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE     - internal error.
+ */
+int ssdfs_peb_blk_bmap_create(struct ssdfs_segment_blk_bmap *parent,
+			      u16 peb_index, u32 items_count,
+			      int init_flag, int init_state)
+{
+	struct ssdfs_fs_info *fsi;
+	struct ssdfs_segment_info *si;
+	struct ssdfs_peb_blk_bmap *bmap;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!parent || !parent->peb);
+	BUG_ON(peb_index >= parent->pebs_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("parent %p, peb_index %u, "
+		  "items_count %u, init_flag %#x, init_state %#x\n",
+		  parent, peb_index,
+		  items_count, init_flag, init_state);
+#else
+	SSDFS_DBG("parent %p, peb_index %u, "
+		  "items_count %u, init_flag %#x, init_state %#x\n",
+		  parent, peb_index,
+		  items_count, init_flag, init_state);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	fsi = parent->parent_si->fsi;
+	si = parent->parent_si;
+	bmap = &parent->peb[peb_index];
+	atomic_set(&bmap->state, SSDFS_PEB_BLK_BMAP_STATE_UNKNOWN);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("seg_id %llu, peb_index %u\n",
+		  si->seg_id, bmap->peb_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (items_count > parent->pages_per_peb) {
+		SSDFS_ERR("items_count %u > pages_per_peb %u\n",
+			  items_count, parent->pages_per_peb);
+		return -ERANGE;
+	}
+
+	bmap->parent = parent;
+	bmap->peb_index = peb_index;
+	bmap->pages_per_peb = parent->pages_per_peb;
+
+	init_rwsem(&bmap->modification_lock);
+	atomic_set(&bmap->peb_valid_blks, 0);
+	atomic_set(&bmap->peb_invalid_blks, 0);
+	atomic_set(&bmap->peb_free_blks, 0);
+
+	atomic_set(&bmap->buffers_state, SSDFS_PEB_BMAP_BUFFERS_EMPTY);
+	init_rwsem(&bmap->lock);
+	bmap->init_cno = U64_MAX;
+
+	err = ssdfs_block_bmap_create(fsi,
+				      &bmap->buffer[SSDFS_PEB_BLK_BMAP1],
+				      items_count, init_flag, init_state);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to create source block bitmap: "
+			  "peb_index %u, items_count %u, "
+			  "init_flag %#x, init_state %#x\n",
+			  peb_index, items_count,
+			  init_flag, init_state);
+		goto fail_create_peb_bmap;
+	}
+
+	err = ssdfs_block_bmap_create(fsi,
+				      &bmap->buffer[SSDFS_PEB_BLK_BMAP2],
+				      items_count,
+				      SSDFS_BLK_BMAP_CREATE,
+				      SSDFS_BLK_FREE);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to create destination block bitmap: "
+			  "peb_index %u, items_count %u\n",
+			  peb_index, items_count);
+		goto fail_create_peb_bmap;
+	}
+
+	if (init_flag == SSDFS_BLK_BMAP_CREATE) {
+		atomic_set(&bmap->peb_free_blks, items_count);
+		atomic_add(items_count, &parent->seg_free_blks);
+	}
+
+	bmap->src = &bmap->buffer[SSDFS_PEB_BLK_BMAP1];
+	bmap->dst = NULL;
+
+	init_completion(&bmap->init_end);
+
+	atomic_set(&bmap->buffers_state, SSDFS_PEB_BMAP1_SRC);
+	atomic_set(&bmap->state, SSDFS_PEB_BLK_BMAP_CREATED);
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("finished\n");
+#else
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	return 0;
+
+fail_create_peb_bmap:
+	ssdfs_peb_blk_bmap_destroy(bmap);
+	return err;
+}
+
+/*
+ * ssdfs_peb_blk_bmap_destroy() - destroy PEB's block bitmap
+ * @ptr: PEB's block bitmap object
+ *
+ * This function tries to destroy PEB's block bitmap object.
+ */
+void ssdfs_peb_blk_bmap_destroy(struct ssdfs_peb_blk_bmap *ptr)
+{
+	if (!ptr)
+		return;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(rwsem_is_locked(&ptr->lock));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("ptr %p, peb_index %u, "
+		  "state %#x, valid_logical_blks %d, "
+		  "invalid_logical_blks %d, "
+		  "free_logical_blks %d\n",
+		  ptr, ptr->peb_index,
+		  atomic_read(&ptr->state),
+		  atomic_read(&ptr->peb_valid_blks),
+		  atomic_read(&ptr->peb_invalid_blks),
+		  atomic_read(&ptr->peb_free_blks));
+#else
+	SSDFS_DBG("ptr %p, peb_index %u, "
+		  "state %#x, valid_logical_blks %d, "
+		  "invalid_logical_blks %d, "
+		  "free_logical_blks %d\n",
+		  ptr, ptr->peb_index,
+		  atomic_read(&ptr->state),
+		  atomic_read(&ptr->peb_valid_blks),
+		  atomic_read(&ptr->peb_invalid_blks),
+		  atomic_read(&ptr->peb_free_blks));
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	if (!is_peb_block_bmap_initialized(ptr))
+		SSDFS_WARN("PEB's block bitmap hasn't been initialized\n");
+
+	atomic_set(&ptr->peb_valid_blks, 0);
+	atomic_set(&ptr->peb_invalid_blks, 0);
+	atomic_set(&ptr->peb_free_blks, 0);
+
+	ptr->src = NULL;
+	ptr->dst = NULL;
+	atomic_set(&ptr->buffers_state, SSDFS_PEB_BMAP_BUFFERS_EMPTY);
+
+	ssdfs_block_bmap_destroy(&ptr->buffer[SSDFS_PEB_BLK_BMAP1]);
+	ssdfs_block_bmap_destroy(&ptr->buffer[SSDFS_PEB_BLK_BMAP2]);
+
+	atomic_set(&ptr->state, SSDFS_PEB_BLK_BMAP_STATE_UNKNOWN);
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("finished\n");
+#else
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+}
+
+/*
+ * ssdfs_peb_blk_bmap_init() - init PEB's block bitmap
+ * @bmap: pointer on PEB's block bitmap object
+ * @source: pointer on pagevec with bitmap state
+ * @hdr: header of block bitmap fragment
+ * @cno: log's checkpoint
+ *
+ * This function tries to init PEB's block bitmap.
+ *
+ * RETURN:
+ * [success] - count of free pages.
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ * %-ERANGE     - invalid internal calculations.
+ */
+int ssdfs_peb_blk_bmap_init(struct ssdfs_peb_blk_bmap *bmap,
+			    struct ssdfs_page_vector *source,
+			    struct ssdfs_block_bitmap_fragment *hdr,
+			    u64 cno)
+{
+	struct ssdfs_fs_info *fsi;
+	struct ssdfs_segment_info *si;
+	struct ssdfs_peb_container *pebc;
+	struct ssdfs_block_bmap *blk_bmap = NULL;
+	int bmap_state = SSDFS_PEB_BLK_BMAP_STATE_UNKNOWN;
+	bool is_dst_peb_clean = false;
+	u8 flags;
+	u8 type;
+	bool under_migration = false;
+	bool has_ext_ptr = false;
+	bool has_relation = false;
+	u64 old_cno = U64_MAX;
+	u32 last_free_blk;
+	u32 metadata_blks;
+	u32 free_blks;
+	u32 used_blks;
+	u32 invalid_blks;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!bmap || !bmap->parent || !bmap->parent->parent_si);
+	BUG_ON(!bmap->parent->parent_si->peb_array);
+	BUG_ON(!source || !hdr);
+	BUG_ON(ssdfs_page_vector_count(source) == 0);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = bmap->parent->parent_si->fsi;
+	si = bmap->parent->parent_si;
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("seg_id %llu, peb_index %u, cno %llu\n",
+		  si->seg_id, bmap->peb_index, cno);
+#else
+	SSDFS_DBG("seg_id %llu, peb_index %u, cno %llu\n",
+		  si->seg_id, bmap->peb_index, cno);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	bmap_state = atomic_read(&bmap->state);
+	switch (bmap_state) {
+	case SSDFS_PEB_BLK_BMAP_CREATED:
+		/* regular init */
+		break;
+
+	case SSDFS_PEB_BLK_BMAP_HAS_CLEAN_DST:
+		/*
+		 * PEB container is under migration.
+		 * But the destination PEB is clean.
+		 * It means that the destination PEB doesn't
+		 * need an init operation.
+		 */
+		is_dst_peb_clean = true;
+		break;
+
+	default:
+		SSDFS_ERR("invalid PEB block bitmap state %#x\n",
+			  atomic_read(&bmap->state));
+		return -ERANGE;
+	}
+
+	if (bmap->peb_index >= si->pebs_count) {
+		SSDFS_ERR("peb_index %u >= pebs_count %u\n",
+			  bmap->peb_index, si->pebs_count);
+		return -ERANGE;
+	}
+
+	pebc = &si->peb_array[bmap->peb_index];
+
+	flags = hdr->flags;
+	type = hdr->type;
+
+	if (flags & ~SSDFS_FRAG_BLK_BMAP_FLAG_MASK) {
+		SSDFS_ERR("invalid flags set: %#x\n", flags);
+		return -EIO;
+	}
+
+	if (type >= SSDFS_FRAG_BLK_BMAP_TYPE_MAX) {
+		SSDFS_ERR("invalid type: %#x\n", type);
+		return -EIO;
+	}
+
+	if (is_dst_peb_clean) {
+		under_migration = true;
+		has_relation = true;
+	} else {
+		under_migration = flags & SSDFS_MIGRATING_BLK_BMAP;
+		has_ext_ptr = flags & SSDFS_PEB_HAS_EXT_PTR;
+		has_relation = flags & SSDFS_PEB_HAS_RELATION;
+	}
+
+	if (type == SSDFS_SRC_BLK_BMAP && (has_ext_ptr && has_relation)) {
+		SSDFS_ERR("invalid flags set: %#x\n", flags);
+		return -EIO;
+	}
+
+	down_write(&bmap->lock);
+
+	old_cno = bmap->init_cno;
+	if (bmap->init_cno == U64_MAX)
+		bmap->init_cno = cno;
+	else if (bmap->init_cno != cno) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid bmap state: "
+			  "bmap->init_cno %llu, cno %llu\n",
+			  bmap->init_cno, cno);
+		goto fail_init_blk_bmap;
+	}
+
+	switch (type) {
+	case SSDFS_SRC_BLK_BMAP:
+		if (under_migration && has_relation) {
+			if (is_dst_peb_clean)
+				bmap->dst = &bmap->buffer[SSDFS_PEB_BLK_BMAP2];
+			bmap->src = &bmap->buffer[SSDFS_PEB_BLK_BMAP1];
+			blk_bmap = bmap->src;
+			atomic_set(&bmap->buffers_state,
+				    SSDFS_PEB_BMAP1_SRC_PEB_BMAP2_DST);
+		} else if (under_migration && has_ext_ptr) {
+			bmap->src = &bmap->buffer[SSDFS_PEB_BLK_BMAP1];
+			blk_bmap = bmap->src;
+			atomic_set(&bmap->buffers_state,
+				    SSDFS_PEB_BMAP1_SRC);
+		} else if (under_migration) {
+			err = -EIO;
+			SSDFS_ERR("invalid flags set: %#x\n", flags);
+			goto fail_init_blk_bmap;
+		} else {
+			bmap->src = &bmap->buffer[SSDFS_PEB_BLK_BMAP1];
+			blk_bmap = bmap->src;
+			atomic_set(&bmap->buffers_state,
+				    SSDFS_PEB_BMAP1_SRC);
+		}
+		break;
+
+	case SSDFS_DST_BLK_BMAP:
+		if (under_migration && has_relation) {
+			bmap->dst = &bmap->buffer[SSDFS_PEB_BLK_BMAP2];
+			blk_bmap = bmap->dst;
+			atomic_set(&bmap->buffers_state,
+				    SSDFS_PEB_BMAP1_SRC_PEB_BMAP2_DST);
+		} else if (under_migration && has_ext_ptr) {
+			bmap->src = &bmap->buffer[SSDFS_PEB_BLK_BMAP1];
+			blk_bmap = bmap->src;
+			atomic_set(&bmap->buffers_state,
+				    SSDFS_PEB_BMAP1_SRC);
+		} else {
+			err = -EIO;
+			SSDFS_ERR("invalid flags set: %#x\n", flags);
+			goto fail_init_blk_bmap;
+		}
+		break;
+
+	default:
+		BUG();
+	}
+
+	last_free_blk = le32_to_cpu(hdr->last_free_blk);
+	metadata_blks = le32_to_cpu(hdr->metadata_blks);
+	invalid_blks = le32_to_cpu(hdr->invalid_blks);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("seg_id %llu, peb_index %u, cno %llu, "
+		  "last_free_blk %u, metadata_blks %u, invalid_blks %u\n",
+		  si->seg_id, bmap->peb_index, cno,
+		  last_free_blk, metadata_blks, invalid_blks);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_block_bmap_lock(blk_bmap);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to lock bitmap: err %d\n", err);
+		goto fail_init_blk_bmap;
+	}
+
+	err = ssdfs_block_bmap_init(blk_bmap, source, last_free_blk,
+				    metadata_blks, invalid_blks);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to initialize block bitmap: "
+			  "err %d\n", err);
+		goto fail_define_pages_count;
+	}
+
+	err = ssdfs_block_bmap_get_free_pages(blk_bmap);
+	if (unlikely(err < 0)) {
+		SSDFS_ERR("fail to get free pages: err %d\n", err);
+		goto fail_define_pages_count;
+	} else {
+		free_blks = err;
+		err = 0;
+	}
+
+	err = ssdfs_block_bmap_get_used_pages(blk_bmap);
+	if (unlikely(err < 0)) {
+		SSDFS_ERR("fail to get used pages: err %d\n", err);
+		goto fail_define_pages_count;
+	} else {
+		used_blks = err;
+		err = 0;
+	}
+
+	err = ssdfs_block_bmap_get_invalid_pages(blk_bmap);
+	if (unlikely(err < 0)) {
+		SSDFS_ERR("fail to get invalid pages: err %d\n", err);
+		goto fail_define_pages_count;
+	} else {
+		invalid_blks = err;
+		err = 0;
+	}
+
+fail_define_pages_count:
+	ssdfs_block_bmap_unlock(blk_bmap);
+
+	if (unlikely(err))
+		goto fail_init_blk_bmap;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("seg_id %llu, peb_index %u, cno %llu, "
+		  "type %#x, under_migration %#x, has_relation %#x, "
+		  "last_free_blk %u, metadata_blks %u, "
+		  "free_blks %u, used_blks %u, "
+		  "invalid_blks %u, shared_free_dst_blks %d\n",
+		  si->seg_id, bmap->peb_index, cno,
+		  type, under_migration, has_relation,
+		  last_free_blk, metadata_blks,
+		  free_blks, used_blks, invalid_blks,
+		  atomic_read(&pebc->shared_free_dst_blks));
+	SSDFS_DBG("seg_id %llu, peb_index %u, cno %llu, "
+		  "free_blks %d, valid_blks %d, invalid_blks %d\n",
+		  si->seg_id, bmap->peb_index, cno,
+		  atomic_read(&bmap->peb_free_blks),
+		  atomic_read(&bmap->peb_valid_blks),
+		  atomic_read(&bmap->peb_invalid_blks));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (type) {
+	case SSDFS_SRC_BLK_BMAP:
+		if (is_dst_peb_clean && !(flags & SSDFS_MIGRATING_BLK_BMAP)) {
+			down_write(&bmap->modification_lock);
+			atomic_set(&bmap->peb_valid_blks, used_blks);
+			atomic_add(fsi->pages_per_peb - used_blks,
+					&bmap->peb_free_blks);
+			up_write(&bmap->modification_lock);
+
+			atomic_set(&pebc->shared_free_dst_blks,
+					fsi->pages_per_peb - used_blks);
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("SRC: seg_id %llu, peb_index %u, cno %llu, "
+				  "pages_per_peb %u, used_blks %u, "
+				  "shared_free_dst_blks %d\n",
+				  si->seg_id, bmap->peb_index, cno,
+				  fsi->pages_per_peb, used_blks,
+				  atomic_read(&pebc->shared_free_dst_blks));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			down_write(&bmap->parent->modification_lock);
+			atomic_add(atomic_read(&bmap->peb_valid_blks),
+				   &bmap->parent->seg_valid_blks);
+			atomic_add(atomic_read(&bmap->peb_free_blks),
+				   &bmap->parent->seg_free_blks);
+			up_write(&bmap->parent->modification_lock);
+		} else if (under_migration && has_relation) {
+			int current_free_blks =
+				atomic_read(&bmap->peb_free_blks);
+
+			if (used_blks > current_free_blks) {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("used_blks %u > free_blks %d\n",
+					  used_blks, current_free_blks);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+				down_write(&bmap->modification_lock);
+				atomic_set(&bmap->peb_free_blks, 0);
+				atomic_add(used_blks, &bmap->peb_valid_blks);
+				up_write(&bmap->modification_lock);
+
+				atomic_set(&pebc->shared_free_dst_blks, 0);
+
+				down_write(&bmap->parent->modification_lock);
+				atomic_sub(current_free_blks,
+					   &bmap->parent->seg_free_blks);
+				atomic_add(used_blks,
+					   &bmap->parent->seg_valid_blks);
+				up_write(&bmap->parent->modification_lock);
+			} else {
+				down_write(&bmap->modification_lock);
+				atomic_sub(used_blks, &bmap->peb_free_blks);
+				atomic_add(used_blks, &bmap->peb_valid_blks);
+				up_write(&bmap->modification_lock);
+
+				atomic_sub(used_blks,
+					   &pebc->shared_free_dst_blks);
+
+				down_write(&bmap->parent->modification_lock);
+				atomic_sub(used_blks,
+					   &bmap->parent->seg_free_blks);
+				atomic_add(used_blks,
+					   &bmap->parent->seg_valid_blks);
+				up_write(&bmap->parent->modification_lock);
+			}
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("shared_free_dst_blks %d\n",
+				  atomic_read(&pebc->shared_free_dst_blks));
+#endif /* CONFIG_SSDFS_DEBUG */
+		} else if (under_migration && has_ext_ptr) {
+			down_write(&bmap->modification_lock);
+			atomic_add(used_blks, &bmap->peb_valid_blks);
+			atomic_add(invalid_blks, &bmap->peb_invalid_blks);
+			atomic_add(free_blks, &bmap->peb_free_blks);
+			up_write(&bmap->modification_lock);
+		} else if (under_migration) {
+			err = -EIO;
+			SSDFS_ERR("invalid flags set: %#x\n", flags);
+			goto fail_init_blk_bmap;
+		} else {
+			down_write(&bmap->modification_lock);
+			atomic_set(&bmap->peb_valid_blks, used_blks);
+			atomic_set(&bmap->peb_invalid_blks, invalid_blks);
+			atomic_set(&bmap->peb_free_blks, free_blks);
+			up_write(&bmap->modification_lock);
+
+			down_write(&bmap->parent->modification_lock);
+			atomic_add(atomic_read(&bmap->peb_valid_blks),
+				   &bmap->parent->seg_valid_blks);
+			atomic_add(atomic_read(&bmap->peb_invalid_blks),
+				   &bmap->parent->seg_invalid_blks);
+			atomic_add(atomic_read(&bmap->peb_free_blks),
+				   &bmap->parent->seg_free_blks);
+			up_write(&bmap->parent->modification_lock);
+		}
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("SRC: seg_id %llu, peb_index %u, cno %llu, "
+			  "free_blks %d, valid_blks %d, invalid_blks %d, "
+			  "parent (used_blks %d, free_blks %d, invalid_blks %d)\n",
+			  si->seg_id, bmap->peb_index, cno,
+			  atomic_read(&bmap->peb_free_blks),
+			  atomic_read(&bmap->peb_valid_blks),
+			  atomic_read(&bmap->peb_invalid_blks),
+			  atomic_read(&bmap->parent->seg_valid_blks),
+			  atomic_read(&bmap->parent->seg_free_blks),
+			  atomic_read(&bmap->parent->seg_invalid_blks));
+#endif /* CONFIG_SSDFS_DEBUG */
+		break;
+
+	case SSDFS_DST_BLK_BMAP:
+		if (under_migration) {
+			down_write(&bmap->modification_lock);
+			atomic_add(used_blks, &bmap->peb_valid_blks);
+			atomic_add(invalid_blks, &bmap->peb_invalid_blks);
+			atomic_add(free_blks, &bmap->peb_free_blks);
+			up_write(&bmap->modification_lock);
+
+			atomic_add(free_blks, &pebc->shared_free_dst_blks);
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("DST: seg_id %llu, peb_index %u, cno %llu, "
+				  "free_blks %u, "
+				  "shared_free_dst_blks %d\n",
+				  si->seg_id, bmap->peb_index, cno,
+				  free_blks,
+				  atomic_read(&pebc->shared_free_dst_blks));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			down_write(&bmap->parent->modification_lock);
+			atomic_add(used_blks,
+				   &bmap->parent->seg_valid_blks);
+			atomic_add(invalid_blks,
+				   &bmap->parent->seg_invalid_blks);
+			atomic_add(free_blks,
+				   &bmap->parent->seg_free_blks);
+			up_write(&bmap->parent->modification_lock);
+		} else {
+			err = -EIO;
+			SSDFS_ERR("invalid flags set: %#x\n", flags);
+			goto fail_init_blk_bmap;
+		}
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("DST: seg_id %llu, peb_index %u, cno %llu, "
+			  "free_blks %d, valid_blks %d, invalid_blks %d, "
+			  "parent (used_blks %d, free_blks %d, invalid_blks %d)\n",
+			  si->seg_id, bmap->peb_index, cno,
+			  atomic_read(&bmap->peb_free_blks),
+			  atomic_read(&bmap->peb_valid_blks),
+			  atomic_read(&bmap->peb_invalid_blks),
+			  atomic_read(&bmap->parent->seg_valid_blks),
+			  atomic_read(&bmap->parent->seg_free_blks),
+			  atomic_read(&bmap->parent->seg_invalid_blks));
+#endif /* CONFIG_SSDFS_DEBUG */
+		break;
+
+	default:
+		BUG();
+	}
+
+	switch (type) {
+	case SSDFS_SRC_BLK_BMAP:
+		if (under_migration && has_relation) {
+			if (!bmap->dst)
+				goto finish_init_blk_bmap;
+			else if (!ssdfs_block_bmap_initialized(bmap->dst))
+				goto finish_init_blk_bmap;
+		}
+		break;
+
+	case SSDFS_DST_BLK_BMAP:
+		if (under_migration && has_relation) {
+			if (!bmap->src)
+				goto finish_init_blk_bmap;
+			else if (!ssdfs_block_bmap_initialized(bmap->src))
+				goto finish_init_blk_bmap;
+		}
+		break;
+
+	default:
+		BUG();
+	}
+
+	if (atomic_read(&pebc->shared_free_dst_blks) < 0) {
+		SSDFS_WARN("type %#x, under_migration %#x, has_relation %#x, "
+			   "last_free_blk %u, metadata_blks %u, "
+			   "free_blks %u, used_blks %u, "
+			   "invalid_blks %u, shared_free_dst_blks %d\n",
+			   type, under_migration, has_relation,
+			   last_free_blk, metadata_blks,
+			   free_blks, used_blks, invalid_blks,
+			   atomic_read(&pebc->shared_free_dst_blks));
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("seg_id %llu, peb_index %u, cno %llu, "
+		  "free_blks %d, used_blks %d, invalid_blks %d, "
+		  "shared_free_dst_blks %d\n",
+		  si->seg_id, bmap->peb_index, cno,
+		  atomic_read(&bmap->peb_free_blks),
+		  atomic_read(&bmap->peb_valid_blks),
+		  atomic_read(&bmap->peb_invalid_blks),
+		  atomic_read(&pebc->shared_free_dst_blks));
+	SSDFS_DBG("seg_id %llu, peb_index %u, cno %llu, "
+		  "parent (used_blks %d, free_blks %d, invalid_blks %d)\n",
+		  si->seg_id, bmap->peb_index, cno,
+		  atomic_read(&bmap->parent->seg_valid_blks),
+		  atomic_read(&bmap->parent->seg_free_blks),
+		  atomic_read(&bmap->parent->seg_invalid_blks));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	atomic_set(&bmap->state, SSDFS_PEB_BLK_BMAP_INITIALIZED);
+	complete_all(&bmap->init_end);
+
+fail_init_blk_bmap:
+	if (unlikely(err)) {
+		bmap->init_cno = old_cno;
+		complete_all(&bmap->init_end);
+	}
+
+finish_init_blk_bmap:
+	up_write(&bmap->lock);
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("finished: err %d\n", err);
+#else
+	SSDFS_DBG("finished: err %d\n", err);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	return err;
+}
+
+/*
+ * ssdfs_peb_blk_bmap_init_failed() - process failure of block bitmap init
+ * @bmap: pointer to PEB's block bitmap object
+ */
+void ssdfs_peb_blk_bmap_init_failed(struct ssdfs_peb_blk_bmap *bmap)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!bmap);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	complete_all(&bmap->init_end);
+}
+
+/*
+ * is_ssdfs_peb_blk_bmap_dirty() - check that PEB block bitmap is dirty
+ * @bmap: pointer to PEB's block bitmap object
+ */
+bool is_ssdfs_peb_blk_bmap_dirty(struct ssdfs_peb_blk_bmap *bmap)
+{
+	bool is_src_dirty = false;
+	bool is_dst_dirty = false;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!bmap);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!ssdfs_peb_blk_bmap_initialized(bmap))
+		return false;
+
+	down_read(&bmap->lock);
+	if (bmap->src != NULL)
+		is_src_dirty = ssdfs_block_bmap_dirtied(bmap->src);
+	if (bmap->dst != NULL)
+		is_dst_dirty = ssdfs_block_bmap_dirtied(bmap->dst);
+	up_read(&bmap->lock);
+
+	return is_src_dirty || is_dst_dirty;
+}
+
+/*
+ * ssdfs_peb_define_reserved_pages_per_log() - estimate reserved pages per log
+ * @bmap: pointer to PEB's block bitmap object
+ */
+int ssdfs_peb_define_reserved_pages_per_log(struct ssdfs_peb_blk_bmap *bmap)
+{
+	struct ssdfs_segment_blk_bmap *parent = bmap->parent;
+	struct ssdfs_segment_info *si = parent->parent_si;
+	struct ssdfs_fs_info *fsi = si->fsi;
+	u32 page_size = fsi->pagesize;
+	u32 pages_per_peb = parent->pages_per_peb;
+	u32 pebs_per_seg = fsi->pebs_per_seg;
+	u16 log_pages = si->log_pages;
+	bool is_migrating = false;
+
+	switch (atomic_read(&bmap->buffers_state)) {
+	case SSDFS_PEB_BMAP1_SRC_PEB_BMAP2_DST:
+	case SSDFS_PEB_BMAP2_SRC_PEB_BMAP1_DST:
+		is_migrating = true;
+		break;
+
+	default:
+		is_migrating = false;
+		break;
+	}
+
+	return ssdfs_peb_estimate_reserved_metapages(page_size,
+						     pages_per_peb,
+						     log_pages,
+						     pebs_per_seg,
+						     is_migrating);
+}
+
+bool has_ssdfs_peb_blk_bmap_initialized(struct ssdfs_peb_blk_bmap *bmap)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!bmap || !bmap->parent || !bmap->parent->parent_si);
+
+	SSDFS_DBG("seg_id %llu, peb_index %u\n",
+		  bmap->parent->parent_si->seg_id,
+		  bmap->peb_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return ssdfs_peb_blk_bmap_initialized(bmap);
+}
+
+int ssdfs_peb_blk_bmap_wait_init_end(struct ssdfs_peb_blk_bmap *bmap)
+{
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!bmap || !bmap->parent || !bmap->parent->parent_si);
+
+	SSDFS_DBG("seg_id %llu, peb_index %u\n",
+		  bmap->parent->parent_si->seg_id,
+		  bmap->peb_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (ssdfs_peb_blk_bmap_initialized(bmap))
+		return 0;
+	else {
+		err = SSDFS_WAIT_COMPLETION(&bmap->init_end);
+		if (unlikely(err)) {
+			SSDFS_ERR("PEB block bitmap init failed: "
+				  "seg_id %llu, peb_index %u, "
+				  "err %d\n",
+				  bmap->parent->parent_si->seg_id,
+				  bmap->peb_index, err);
+			return err;
+		}
+
+		if (!ssdfs_peb_blk_bmap_initialized(bmap)) {
+			err = -ERANGE;
+			SSDFS_ERR("PEB block bitmap init failed: "
+				  "seg_id %llu, peb_index %u, "
+				  "err %d\n",
+				  bmap->parent->parent_si->seg_id,
+				  bmap->peb_index, err);
+			return err;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_peb_blk_bmap_get_free_pages() - determine PEB's free pages count
+ * @bmap: pointer to PEB's block bitmap object
+ *
+ * This function determines the PEB's free pages count.
+ *
+ * RETURN:
+ * [success] - count of free pages.
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ * %-ERANGE     - invalid internal calculations.
+ */
+int ssdfs_peb_blk_bmap_get_free_pages(struct ssdfs_peb_blk_bmap *bmap)
+{
+	int free_pages;
+	int log_pages;
+	int created_logs;
+	int reserved_pages_per_log;
+	int used_pages;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!bmap || !bmap->parent || !bmap->parent->parent_si);
+
+	SSDFS_DBG("seg_id %llu, peb_index %u\n",
+		  bmap->parent->parent_si->seg_id,
+		  bmap->peb_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!ssdfs_peb_blk_bmap_initialized(bmap)) {
+		err = SSDFS_WAIT_COMPLETION(&bmap->init_end);
+		if (unlikely(err)) {
+init_failed:
+			SSDFS_ERR("PEB block bitmap init failed: "
+				  "seg_id %llu, peb_index %u, "
+				  "err %d\n",
+				  bmap->parent->parent_si->seg_id,
+				  bmap->peb_index, err);
+			SSDFS_ERR("seg_id %llu, free_logical_blks %u, "
+				  "valid_logical_blks %u, "
+				  "invalid_logical_blks %u, pages_per_peb %u\n",
+				  bmap->parent->parent_si->seg_id,
+				  atomic_read(&bmap->peb_free_blks),
+				  atomic_read(&bmap->peb_valid_blks),
+				  atomic_read(&bmap->peb_invalid_blks),
+				  bmap->pages_per_peb);
+
+			if (bmap->src) {
+				SSDFS_ERR("SRC BLOCK BITMAP: bytes_count %zu, items_count %zu, "
+					  "metadata_items %u, used_blks %u, invalid_blks %u, "
+					  "flags %#x\n",
+					  bmap->src->bytes_count,
+					  bmap->src->items_count,
+					  bmap->src->metadata_items,
+					  bmap->src->used_blks,
+					  bmap->src->invalid_blks,
+					  atomic_read(&bmap->src->flags));
+			}
+
+			if (bmap->dst) {
+				SSDFS_ERR("DST BLOCK BITMAP: bytes_count %zu, items_count %zu, "
+					  "metadata_items %u, used_blks %u, invalid_blks %u, "
+					  "flags %#x\n",
+					  bmap->dst->bytes_count,
+					  bmap->dst->items_count,
+					  bmap->dst->metadata_items,
+					  bmap->dst->used_blks,
+					  bmap->dst->invalid_blks,
+					  atomic_read(&bmap->dst->flags));
+			}
+
+			return err;
+		}
+
+		if (!ssdfs_peb_blk_bmap_initialized(bmap)) {
+			err = -ERANGE;
+			goto init_failed;
+		}
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("seg_id %llu, free_logical_blks %u, "
+		  "valid_logical_blks %u, "
+		  "invalid_logical_blks %u, pages_per_peb %u\n",
+		  bmap->parent->parent_si->seg_id,
+		  atomic_read(&bmap->peb_free_blks),
+		  atomic_read(&bmap->peb_valid_blks),
+		  atomic_read(&bmap->peb_invalid_blks),
+		  bmap->pages_per_peb);
+
+	if ((atomic_read(&bmap->peb_free_blks) +
+	    atomic_read(&bmap->peb_valid_blks) +
+	    atomic_read(&bmap->peb_invalid_blks)) > bmap->pages_per_peb) {
+		SSDFS_WARN("seg_id %llu, peb_index %u, "
+			   "free_logical_blks %u, valid_logical_blks %u, "
+			   "invalid_logical_blks %u, pages_per_peb %u\n",
+			   bmap->parent->parent_si->seg_id,
+			   bmap->peb_index,
+			   atomic_read(&bmap->peb_free_blks),
+			   atomic_read(&bmap->peb_valid_blks),
+			   atomic_read(&bmap->peb_invalid_blks),
+			   bmap->pages_per_peb);
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	log_pages = bmap->parent->parent_si->log_pages;
+	reserved_pages_per_log = ssdfs_peb_define_reserved_pages_per_log(bmap);
+	free_pages = atomic_read(&bmap->peb_free_blks);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("log_pages %d, reserved_pages_per_log %d, "
+		  "free_pages %d\n",
+		  log_pages, reserved_pages_per_log, free_pages);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (free_pages > 0) {
+		int upper_threshold, lower_threshold;
+
+		created_logs = (bmap->pages_per_peb - free_pages) / log_pages;
+		used_pages = bmap->pages_per_peb - free_pages;
+
+		if (created_logs == 0) {
+			upper_threshold = log_pages;
+			lower_threshold = reserved_pages_per_log;
+		} else {
+			upper_threshold = (created_logs + 1) * log_pages;
+			lower_threshold = ((created_logs - 1) * log_pages) +
+					    reserved_pages_per_log;
+		}
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("created_logs %d, used_pages %d, "
+			  "upper_threshold %d, lower_threshold %d\n",
+			  created_logs, used_pages,
+			  upper_threshold, lower_threshold);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		BUG_ON(used_pages > upper_threshold);
+
+		if (used_pages == upper_threshold)
+			free_pages -= reserved_pages_per_log;
+		else if (used_pages < lower_threshold)
+			free_pages -= (lower_threshold - used_pages);
+
+		if (free_pages < 0)
+			free_pages = 0;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("free_pages %d\n", free_pages);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return free_pages;
+}
+
+/*
+ * ssdfs_peb_blk_bmap_get_used_pages() - determine PEB's used data pages count
+ * @bmap: pointer to PEB's block bitmap object
+ *
+ * This function determines the PEB's used data pages count.
+ *
+ * RETURN:
+ * [success] - count of used data pages.
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ * %-ERANGE     - invalid internal calculations.
+ */
+int ssdfs_peb_blk_bmap_get_used_pages(struct ssdfs_peb_blk_bmap *bmap)
+{
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!bmap);
+
+	SSDFS_DBG("peb_index %u\n", bmap->peb_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!ssdfs_peb_blk_bmap_initialized(bmap)) {
+		err = SSDFS_WAIT_COMPLETION(&bmap->init_end);
+		if (unlikely(err)) {
+init_failed:
+			SSDFS_ERR("PEB block bitmap init failed: "
+				  "seg_id %llu, peb_index %u, "
+				  "err %d\n",
+				  bmap->parent->parent_si->seg_id,
+				  bmap->peb_index, err);
+			return err;
+		}
+
+		if (!ssdfs_peb_blk_bmap_initialized(bmap)) {
+			err = -ERANGE;
+			goto init_failed;
+		}
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("free_logical_blks %u, valid_logical_blks %u, "
+		  "invalid_logical_blks %u, pages_per_peb %u\n",
+		  atomic_read(&bmap->peb_free_blks),
+		  atomic_read(&bmap->peb_valid_blks),
+		  atomic_read(&bmap->peb_invalid_blks),
+		  bmap->pages_per_peb);
+
+	if ((atomic_read(&bmap->peb_free_blks) +
+	    atomic_read(&bmap->peb_valid_blks) +
+	    atomic_read(&bmap->peb_invalid_blks)) > bmap->pages_per_peb) {
+		SSDFS_WARN("seg_id %llu, peb_index %u, "
+			   "free_logical_blks %u, valid_logical_blks %u, "
+			   "invalid_logical_blks %u, pages_per_peb %u\n",
+			   bmap->parent->parent_si->seg_id,
+			   bmap->peb_index,
+			   atomic_read(&bmap->peb_free_blks),
+			   atomic_read(&bmap->peb_valid_blks),
+			   atomic_read(&bmap->peb_invalid_blks),
+			   bmap->pages_per_peb);
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return atomic_read(&bmap->peb_valid_blks);
+}
+
+/*
+ * ssdfs_peb_blk_bmap_get_invalid_pages() - determine PEB's invalid pages count
+ * @bmap: pointer to PEB's block bitmap object
+ *
+ * This function determines the PEB's invalid pages count.
+ *
+ * RETURN:
+ * [success] - count of invalid pages.
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ * %-ERANGE     - invalid internal calculations.
+ */
+int ssdfs_peb_blk_bmap_get_invalid_pages(struct ssdfs_peb_blk_bmap *bmap)
+{
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!bmap);
+
+	SSDFS_DBG("peb_index %u\n", bmap->peb_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!ssdfs_peb_blk_bmap_initialized(bmap)) {
+		err = SSDFS_WAIT_COMPLETION(&bmap->init_end);
+		if (unlikely(err)) {
+init_failed:
+			SSDFS_ERR("PEB block bitmap init failed: "
+				  "seg_id %llu, peb_index %u, "
+				  "err %d\n",
+				  bmap->parent->parent_si->seg_id,
+				  bmap->peb_index, err);
+			return err;
+		}
+
+		if (!ssdfs_peb_blk_bmap_initialized(bmap)) {
+			err = -ERANGE;
+			goto init_failed;
+		}
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("free_logical_blks %u, valid_logical_blks %u, "
+		  "invalid_logical_blks %u, pages_per_peb %u\n",
+		  atomic_read(&bmap->peb_free_blks),
+		  atomic_read(&bmap->peb_valid_blks),
+		  atomic_read(&bmap->peb_invalid_blks),
+		  bmap->pages_per_peb);
+
+	if ((atomic_read(&bmap->peb_free_blks) +
+	    atomic_read(&bmap->peb_valid_blks) +
+	    atomic_read(&bmap->peb_invalid_blks)) > bmap->pages_per_peb) {
+		SSDFS_WARN("seg_id %llu, peb_index %u, "
+			   "free_logical_blks %u, valid_logical_blks %u, "
+			   "invalid_logical_blks %u, pages_per_peb %u\n",
+			   bmap->parent->parent_si->seg_id,
+			   bmap->peb_index,
+			   atomic_read(&bmap->peb_free_blks),
+			   atomic_read(&bmap->peb_valid_blks),
+			   atomic_read(&bmap->peb_invalid_blks),
+			   bmap->pages_per_peb);
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return atomic_read(&bmap->peb_invalid_blks);
+}
+
+/*
+ * ssdfs_src_blk_bmap_get_free_pages() - determine free pages count
+ * @bmap: pointer to PEB's block bitmap object
+ *
+ * This function determines the free pages count
+ * in the source bitmap.
+ *
+ * RETURN:
+ * [success] - count of free pages.
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ * %-ERANGE     - invalid internal calculations.
+ */
+int ssdfs_src_blk_bmap_get_free_pages(struct ssdfs_peb_blk_bmap *bmap)
+{
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!bmap);
+
+	SSDFS_DBG("peb_index %u\n", bmap->peb_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!ssdfs_peb_blk_bmap_initialized(bmap)) {
+		err = SSDFS_WAIT_COMPLETION(&bmap->init_end);
+		if (unlikely(err)) {
+init_failed:
+			SSDFS_ERR("PEB block bitmap init failed: "
+				  "seg_id %llu, peb_index %u, "
+				  "err %d\n",
+				  bmap->parent->parent_si->seg_id,
+				  bmap->peb_index, err);
+			return err;
+		}
+
+		if (!ssdfs_peb_blk_bmap_initialized(bmap)) {
+			err = -ERANGE;
+			goto init_failed;
+		}
+	}
+
+	down_read(&bmap->lock);
+
+	if (bmap->src == NULL) {
+		err = -ERANGE;
+		SSDFS_WARN("source block bitmap pointer is empty\n");
+		goto finish_get_src_free_pages;
+	}
+
+	err = ssdfs_block_bmap_lock(bmap->src);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to lock block bitmap: err %d\n", err);
+		goto finish_get_src_free_pages;
+	}
+
+	err = ssdfs_block_bmap_get_free_pages(bmap->src);
+	ssdfs_block_bmap_unlock(bmap->src);
+
+finish_get_src_free_pages:
+	up_read(&bmap->lock);
+
+	return err;
+}
+
+/*
+ * ssdfs_src_blk_bmap_get_used_pages() - determine used pages count
+ * @bmap: pointer to PEB's block bitmap object
+ *
+ * This function determines the used pages count
+ * in the source bitmap.
+ *
+ * RETURN:
+ * [success] - count of used pages.
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ * %-ERANGE     - invalid internal calculations.
+ */
+int ssdfs_src_blk_bmap_get_used_pages(struct ssdfs_peb_blk_bmap *bmap)
+{
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!bmap);
+
+	SSDFS_DBG("peb_index %u\n", bmap->peb_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!ssdfs_peb_blk_bmap_initialized(bmap)) {
+		err = SSDFS_WAIT_COMPLETION(&bmap->init_end);
+		if (unlikely(err)) {
+init_failed:
+			SSDFS_ERR("PEB block bitmap init failed: "
+				  "seg_id %llu, peb_index %u, "
+				  "err %d\n",
+				  bmap->parent->parent_si->seg_id,
+				  bmap->peb_index, err);
+			return err;
+		}
+
+		if (!ssdfs_peb_blk_bmap_initialized(bmap)) {
+			err = -ERANGE;
+			goto init_failed;
+		}
+	}
+
+	down_read(&bmap->lock);
+
+	if (bmap->src == NULL) {
+		err = -ERANGE;
+		SSDFS_WARN("source block bitmap pointer is empty\n");
+		goto finish_get_src_used_pages;
+	}
+
+	err = ssdfs_block_bmap_lock(bmap->src);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to lock block bitmap: err %d\n", err);
+		goto finish_get_src_used_pages;
+	}
+
+	err = ssdfs_block_bmap_get_used_pages(bmap->src);
+	ssdfs_block_bmap_unlock(bmap->src);
+
+finish_get_src_used_pages:
+	up_read(&bmap->lock);
+
+	return err;
+}
+
+/*
+ * ssdfs_src_blk_bmap_get_invalid_pages() - determine invalid pages count
+ * @bmap: pointer to PEB's block bitmap object
+ *
+ * This function determines the invalid pages count
+ * in the source bitmap.
+ *
+ * RETURN:
+ * [success] - count of invalid pages.
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ * %-ERANGE     - invalid internal calculations.
+ */
+int ssdfs_src_blk_bmap_get_invalid_pages(struct ssdfs_peb_blk_bmap *bmap)
+{
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!bmap);
+
+	SSDFS_DBG("peb_index %u\n", bmap->peb_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!ssdfs_peb_blk_bmap_initialized(bmap)) {
+		err = SSDFS_WAIT_COMPLETION(&bmap->init_end);
+		if (unlikely(err)) {
+init_failed:
+			SSDFS_ERR("PEB block bitmap init failed: "
+				  "seg_id %llu, peb_index %u, "
+				  "err %d\n",
+				  bmap->parent->parent_si->seg_id,
+				  bmap->peb_index, err);
+			return err;
+		}
+
+		if (!ssdfs_peb_blk_bmap_initialized(bmap)) {
+			err = -ERANGE;
+			goto init_failed;
+		}
+	}
+
+	down_read(&bmap->lock);
+
+	if (bmap->src == NULL) {
+		err = -ERANGE;
+		SSDFS_WARN("source block bitmap pointer is empty\n");
+		goto finish_get_src_invalid_pages;
+	}
+
+	err = ssdfs_block_bmap_lock(bmap->src);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to lock block bitmap: err %d\n", err);
+		goto finish_get_src_invalid_pages;
+	}
+
+	err = ssdfs_block_bmap_get_invalid_pages(bmap->src);
+	ssdfs_block_bmap_unlock(bmap->src);
+
+finish_get_src_invalid_pages:
+	up_read(&bmap->lock);
+
+	return err;
+}
+
+/*
+ * ssdfs_dst_blk_bmap_get_free_pages() - determine free pages count
+ * @bmap: pointer to PEB's block bitmap object
+ *
+ * This function determines the free pages count
+ * in the destination bitmap.
+ *
+ * RETURN:
+ * [success] - count of free pages.
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ * %-ERANGE     - invalid internal calculations.
+ */
+int ssdfs_dst_blk_bmap_get_free_pages(struct ssdfs_peb_blk_bmap *bmap)
+{
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!bmap);
+
+	SSDFS_DBG("peb_index %u\n", bmap->peb_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!ssdfs_peb_blk_bmap_initialized(bmap)) {
+		err = SSDFS_WAIT_COMPLETION(&bmap->init_end);
+		if (unlikely(err)) {
+init_failed:
+			SSDFS_ERR("PEB block bitmap init failed: "
+				  "seg_id %llu, peb_index %u, "
+				  "err %d\n",
+				  bmap->parent->parent_si->seg_id,
+				  bmap->peb_index, err);
+			return err;
+		}
+
+		if (!ssdfs_peb_blk_bmap_initialized(bmap)) {
+			err = -ERANGE;
+			goto init_failed;
+		}
+	}
+
+	down_read(&bmap->lock);
+
+	if (bmap->dst == NULL) {
+		err = -ERANGE;
+		SSDFS_WARN("destination block bitmap pointer is empty\n");
+		goto finish_get_dst_free_pages;
+	}
+
+	err = ssdfs_block_bmap_lock(bmap->dst);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to lock block bitmap: err %d\n", err);
+		goto finish_get_dst_free_pages;
+	}
+
+	err = ssdfs_block_bmap_get_free_pages(bmap->dst);
+	ssdfs_block_bmap_unlock(bmap->dst);
+
+finish_get_dst_free_pages:
+	up_read(&bmap->lock);
+
+	return err;
+}
+
+/*
+ * ssdfs_dst_blk_bmap_get_used_pages() - determine used pages count
+ * @bmap: pointer to PEB's block bitmap object
+ *
+ * This function determines the used pages count
+ * in the destination bitmap.
+ *
+ * RETURN:
+ * [success] - count of used pages.
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ * %-ERANGE     - invalid internal calculations.
+ */
+int ssdfs_dst_blk_bmap_get_used_pages(struct ssdfs_peb_blk_bmap *bmap)
+{
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!bmap);
+
+	SSDFS_DBG("peb_index %u\n", bmap->peb_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!ssdfs_peb_blk_bmap_initialized(bmap)) {
+		err = SSDFS_WAIT_COMPLETION(&bmap->init_end);
+		if (unlikely(err)) {
+init_failed:
+			SSDFS_ERR("PEB block bitmap init failed: "
+				  "seg_id %llu, peb_index %u, "
+				  "err %d\n",
+				  bmap->parent->parent_si->seg_id,
+				  bmap->peb_index, err);
+			return err;
+		}
+
+		if (!ssdfs_peb_blk_bmap_initialized(bmap)) {
+			err = -ERANGE;
+			goto init_failed;
+		}
+	}
+
+	down_read(&bmap->lock);
+
+	if (bmap->dst == NULL) {
+		err = -ERANGE;
+		SSDFS_WARN("destination block bitmap pointer is empty\n");
+		goto finish_get_dst_used_pages;
+	}
+
+	err = ssdfs_block_bmap_lock(bmap->dst);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to lock block bitmap: err %d\n", err);
+		goto finish_get_dst_used_pages;
+	}
+
+	err = ssdfs_block_bmap_get_used_pages(bmap->dst);
+	ssdfs_block_bmap_unlock(bmap->dst);
+
+finish_get_dst_used_pages:
+	up_read(&bmap->lock);
+
+	return err;
+}
+
+/*
+ * ssdfs_dst_blk_bmap_get_invalid_pages() - determine invalid pages count
+ * @bmap: pointer to PEB's block bitmap object
+ *
+ * This function determines the invalid pages count
+ * in the destination bitmap.
+ *
+ * RETURN:
+ * [success] - count of invalid pages.
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input value.
+ * %-ERANGE     - invalid internal calculations.
+ */
+int ssdfs_dst_blk_bmap_get_invalid_pages(struct ssdfs_peb_blk_bmap *bmap)
+{
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!bmap);
+
+	SSDFS_DBG("peb_index %u\n", bmap->peb_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!ssdfs_peb_blk_bmap_initialized(bmap)) {
+		err = SSDFS_WAIT_COMPLETION(&bmap->init_end);
+		if (unlikely(err)) {
+init_failed:
+			SSDFS_ERR("PEB block bitmap init failed: "
+				  "seg_id %llu, peb_index %u, "
+				  "err %d\n",
+				  bmap->parent->parent_si->seg_id,
+				  bmap->peb_index, err);
+			return err;
+		}
+
+		if (!ssdfs_peb_blk_bmap_initialized(bmap)) {
+			err = -ERANGE;
+			goto init_failed;
+		}
+	}
+
+	down_read(&bmap->lock);
+
+	if (bmap->dst == NULL) {
+		err = -ERANGE;
+		SSDFS_WARN("destination bmap pointer is empty\n");
+		goto finish_get_dst_invalid_pages;
+	}
+
+	err = ssdfs_block_bmap_lock(bmap->dst);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to lock block bitmap: err %d\n", err);
+		goto finish_get_dst_invalid_pages;
+	}
+
+	err = ssdfs_block_bmap_get_invalid_pages(bmap->dst);
+	ssdfs_block_bmap_unlock(bmap->dst);
+
+finish_get_dst_invalid_pages:
+	up_read(&bmap->lock);
+
+	return err;
+}
diff --git a/fs/ssdfs/peb_block_bitmap.h b/fs/ssdfs/peb_block_bitmap.h
new file mode 100644
index 000000000000..7cbeebe1a59e
--- /dev/null
+++ b/fs/ssdfs/peb_block_bitmap.h
@@ -0,0 +1,165 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/peb_block_bitmap.h - PEB's block bitmap declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_PEB_BLOCK_BITMAP_H
+#define _SSDFS_PEB_BLOCK_BITMAP_H
+
+#include "block_bitmap.h"
+
+/* PEB's block bitmap indexes */
+enum {
+	SSDFS_PEB_BLK_BMAP1,
+	SSDFS_PEB_BLK_BMAP2,
+	SSDFS_PEB_BLK_BMAP_ITEMS_MAX
+};
+
+/*
+ * struct ssdfs_peb_blk_bmap - PEB container's block bitmap object
+ * @state: PEB container's block bitmap's state
+ * @peb_index: PEB index in array
+ * @pages_per_peb: pages per physical erase block
+ * @modification_lock: lock for modification operations
+ * @peb_valid_blks: PEB container's valid logical blocks count
+ * @peb_invalid_blks: PEB container's invalid logical blocks count
+ * @peb_free_blks: PEB container's free logical blocks count
+ * @buffers_state: buffers state
+ * @lock: buffers lock
+ * @init_cno: initialization checkpoint
+ * @src: pointer to source PEB's block bitmap object
+ * @dst: pointer to destination PEB's block bitmap object
+ * @buffer: block bitmap buffers
+ * @init_end: wait for the completion of init
+ * @parent: pointer to parent segment block bitmap
+ */
+struct ssdfs_peb_blk_bmap {
+	atomic_t state;
+
+	u16 peb_index;
+	u32 pages_per_peb;
+
+	struct rw_semaphore modification_lock;
+	atomic_t peb_valid_blks;
+	atomic_t peb_invalid_blks;
+	atomic_t peb_free_blks;
+
+	atomic_t buffers_state;
+	struct rw_semaphore lock;
+	u64 init_cno;
+	struct ssdfs_block_bmap *src;
+	struct ssdfs_block_bmap *dst;
+	struct ssdfs_block_bmap buffer[SSDFS_PEB_BLK_BMAP_ITEMS_MAX];
+	struct completion init_end;
+
+	struct ssdfs_segment_blk_bmap *parent;
+};
+
+/* PEB container's block bitmap's possible states */
+enum {
+	SSDFS_PEB_BLK_BMAP_STATE_UNKNOWN,
+	SSDFS_PEB_BLK_BMAP_CREATED,
+	SSDFS_PEB_BLK_BMAP_HAS_CLEAN_DST,
+	SSDFS_PEB_BLK_BMAP_INITIALIZED,
+	SSDFS_PEB_BLK_BMAP_STATE_MAX,
+};
+
+/* PEB's buffer array possible states */
+enum {
+	SSDFS_PEB_BMAP_BUFFERS_EMPTY,
+	SSDFS_PEB_BMAP1_SRC,
+	SSDFS_PEB_BMAP1_SRC_PEB_BMAP2_DST,
+	SSDFS_PEB_BMAP2_SRC,
+	SSDFS_PEB_BMAP2_SRC_PEB_BMAP1_DST,
+	SSDFS_PEB_BMAP_BUFFERS_STATE_MAX
+};
+
+/* PEB's block bitmap operation destination */
+enum {
+	SSDFS_PEB_BLK_BMAP_SOURCE,
+	SSDFS_PEB_BLK_BMAP_DESTINATION,
+	SSDFS_PEB_BLK_BMAP_INDEX_MAX
+};
+
+/*
+ * PEB block bitmap API
+ */
+int ssdfs_peb_blk_bmap_create(struct ssdfs_segment_blk_bmap *parent,
+			      u16 peb_index, u32 items_count,
+			      int init_flag, int init_state);
+void ssdfs_peb_blk_bmap_destroy(struct ssdfs_peb_blk_bmap *ptr);
+int ssdfs_peb_blk_bmap_init(struct ssdfs_peb_blk_bmap *bmap,
+			    struct ssdfs_page_vector *source,
+			    struct ssdfs_block_bitmap_fragment *hdr,
+			    u64 cno);
+void ssdfs_peb_blk_bmap_init_failed(struct ssdfs_peb_blk_bmap *bmap);
+
+bool has_ssdfs_peb_blk_bmap_initialized(struct ssdfs_peb_blk_bmap *bmap);
+int ssdfs_peb_blk_bmap_wait_init_end(struct ssdfs_peb_blk_bmap *bmap);
+
+bool ssdfs_peb_blk_bmap_initialized(struct ssdfs_peb_blk_bmap *ptr);
+bool is_ssdfs_peb_blk_bmap_dirty(struct ssdfs_peb_blk_bmap *ptr);
+
+int ssdfs_peb_blk_bmap_get_free_pages(struct ssdfs_peb_blk_bmap *ptr);
+int ssdfs_peb_blk_bmap_get_used_pages(struct ssdfs_peb_blk_bmap *ptr);
+int ssdfs_peb_blk_bmap_get_invalid_pages(struct ssdfs_peb_blk_bmap *ptr);
+
+int ssdfs_peb_define_reserved_pages_per_log(struct ssdfs_peb_blk_bmap *bmap);
+int ssdfs_peb_blk_bmap_reserve_metapages(struct ssdfs_peb_blk_bmap *bmap,
+					 int bmap_index,
+					 u32 count);
+int ssdfs_peb_blk_bmap_free_metapages(struct ssdfs_peb_blk_bmap *bmap,
+				      int bmap_index,
+				      u32 count);
+int ssdfs_peb_blk_bmap_pre_allocate(struct ssdfs_peb_blk_bmap *bmap,
+				    int bmap_index,
+				    u32 *len,
+				    struct ssdfs_block_bmap_range *range);
+int ssdfs_peb_blk_bmap_allocate(struct ssdfs_peb_blk_bmap *bmap,
+				int bmap_index,
+				u32 *len,
+				struct ssdfs_block_bmap_range *range);
+int ssdfs_peb_blk_bmap_invalidate(struct ssdfs_peb_blk_bmap *bmap,
+				  int bmap_index,
+				  struct ssdfs_block_bmap_range *range);
+int ssdfs_peb_blk_bmap_update_range(struct ssdfs_peb_blk_bmap *bmap,
+				    int bmap_index,
+				    int new_range_state,
+				    struct ssdfs_block_bmap_range *range);
+int ssdfs_peb_blk_bmap_collect_garbage(struct ssdfs_peb_blk_bmap *bmap,
+