From: Zhi Yong Wu <zwu.kernel@gmail.com>
To: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, dave@linux.vnet.ibm.com,
	viro@zeniv.linux.org.uk, hch@lst.de, chris.mason@fusionio.com,
	cmm@us.ibm.com, linuxram@us.ibm.com,
	aneesh.kumar@linux.vnet.ibm.com, tytso@mit.edu,
	Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Subject: Re: [RFC 00/11] VFS: hot data tracking
Date: Wed, 12 Sep 2012 22:31:27 +0800
Message-ID: <CAEH94LgarUjzAY7iKeuztaQ3ZpPS-=v0wqBHZ=vqS4QvMg93Tg@mail.gmail.com>
In-Reply-To: <1347373645-2119-1-git-send-email-zwu.kernel@gmail.com>

Sorry, I forgot to CC Ted.

On Tue, Sep 11, 2012 at 10:27 PM,  <zwu.kernel@gmail.com> wrote:
> From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>
> Hi, folks,
>   I have pushed the patchset to my kernel dev git tree:
> git@github.com:wuzhy/kernel.git
>
>   Also, you can review it via
> https://github.com/wuzhy/kernel/commits/hottrack
>
> NOTE:
>
> The patchset still needs a lot of bug fixes and cleanup. It is posted
> mainly to make sure it is going in the right direction, and in the
> hope of getting some helpful comments from others.
>
> TODO List:
>
>  1.) Need to do scalability or performance tests.
>  2.) Fix up bugs.
>  3.) Strictly split this patchset and keep the patches in order
>         This patchset is in RFC state, so I haven't strictly split it.
>      When it is in PATCH state, I will split it strictly and put
>      the patches in order.
>  4.) Turn some macros into tunables
>         TIME_TO_KICK and HEAT_UPDATE_DELAY
>  5.) Refactor hot_hash_is_aging()
>         If you just made the timeout value a timespec and compared
>      the _timespecs_, you would be doing a lot fewer conversions.
>  6.) Clean up some unnecessary lock protection
>  7.) Add more comments to explain how the temperature is calculated
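
For TODO item 5, here is a minimal user-space sketch of what comparing
timespecs directly could look like. It is not the patchset's
hot_hash_is_aging(); the helper names (ts_compare, ts_sub,
is_aging_sketch) are made up for illustration, and it uses the C
library's struct timespec rather than the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper: returns <0, 0 or >0 when a is before, equal to or after b. */
static int ts_compare(const struct timespec *a, const struct timespec *b)
{
	if (a->tv_sec != b->tv_sec)
		return a->tv_sec < b->tv_sec ? -1 : 1;
	if (a->tv_nsec != b->tv_nsec)
		return a->tv_nsec < b->tv_nsec ? -1 : 1;
	return 0;
}

/* Hypothetical helper: a - b, assuming a >= b and both values are normalized. */
static struct timespec ts_sub(const struct timespec *a, const struct timespec *b)
{
	struct timespec d;

	d.tv_sec = a->tv_sec - b->tv_sec;
	d.tv_nsec = a->tv_nsec - b->tv_nsec;
	if (d.tv_nsec < 0) {
		/* Borrow one second to keep tv_nsec in [0, 1e9). */
		d.tv_sec -= 1;
		d.tv_nsec += 1000000000L;
	}
	return d;
}

/*
 * Sketch of an aging check: true when the item has been idle for at
 * least `timeout`. Everything stays a timespec, so no conversion to
 * nanoseconds or jiffies is needed.
 */
static bool is_aging_sketch(const struct timespec *last_access,
			    const struct timespec *now,
			    const struct timespec *timeout)
{
	struct timespec idle;

	assert(ts_compare(now, last_access) >= 0);
	idle = ts_sub(now, last_access);
	return ts_compare(&idle, timeout) >= 0;
}
```

With the timeout stored as a timespec, a caller would only pass the
item's last access time, the current time and the timeout.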
>
> Ben Chociej, Matt Lupfer and Conor Scott originally wrote this code to
> be very btrfs-specific.  I've taken their code and attempted to
> make it more generic and integrate it at the VFS level.
>
> INTRODUCTION:
>
> Essentially, this means maintaining some key stats
> (like number of reads/writes, last read/write time, frequency of
> reads/writes), then distilling those numbers down to a single
> "temperature" value that reflects what data is "hot," and using that
> temperature to move data to SSDs.
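
To make the "distilling" step concrete, here is a purely illustrative
sketch of folding access counts and recency into one small number.
The formula, the weights, the 0..255 range and all names
(freq_stats_sketch, temperature_sketch) are assumptions for this
example only; the cover letter does not describe the actual
calculation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-item stats, modelled on the stats listed above. */
struct freq_stats_sketch {
	uint32_t nr_reads;
	uint32_t nr_writes;
	uint64_t secs_since_last_read;	/* UINT64_MAX: never read */
	uint64_t secs_since_last_write;	/* UINT64_MAX: never written */
};

/* floor(log2(v + 1)), so 0 maps to 0 and UINT64_MAX maps to 64. */
static uint32_t log2_plus1(uint64_t v)
{
	uint32_t bits = 0;

	if (v == UINT64_MAX)
		return 64;
	v += 1;
	while (v > 1) {
		v >>= 1;
		bits++;
	}
	return bits;
}

/*
 * Illustrative temperature: more accesses and more recent accesses
 * both raise the value. The weights are arbitrary.
 */
static uint8_t temperature_sketch(const struct freq_stats_sketch *s)
{
	uint32_t access, recency, heat;

	assert(s != NULL);
	/* Access counts contribute up to 2 * (32 + 32) = 128. */
	access = 2 * (log2_plus1(s->nr_reads) + log2_plus1(s->nr_writes));
	/* Recency contributes up to 64 + 64 = 128; old accesses add less. */
	recency = (64 - log2_plus1(s->secs_since_last_read)) +
		  (64 - log2_plus1(s->secs_since_last_write));
	heat = access + recency;
	return heat > 255 ? 255 : (uint8_t)heat;
}
```

The log scaling is a design choice of this sketch: it keeps a single
burst of accesses from dominating the result.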
>
> The long-term goal of these patches is to allow some FSs,
> e.g. Btrfs, to intelligently utilize SSDs in a heterogeneous volume.
> Incidentally, this project has been motivated by
> the Project Ideas page on the Btrfs wiki.
>
> Of course, users are warned not to run this code outside of development
> environments. These patches are EXPERIMENTAL, and as such they might eat
> your data and/or memory. That said, the code should be relatively safe
> when the hottrack mount option is disabled.
>
> MOTIVATION:
>
> The overall goal of enabling hot data relocation to SSD has been
> motivated by the Project Ideas page on the Btrfs wiki at
> <https://btrfs.wiki.kernel.org/index.php/Project_ideas>.
> The work is divided into two steps: the VFS provides the hot data
> tracking function, while each specific FS provides the hot data
> relocation function. As the first step toward this goal, it is hoped
> that the patchset for hot data tracking will eventually mature into the VFS.
>
> This is essentially the traditional cache argument: SSD is fast and
> expensive; HDD is cheap but slow. ZFS, for example, can already take
> advantage of SSD caching. Btrfs should also be able to take advantage of
> hybrid storage without many broad, sweeping changes to existing code.
>
> SUMMARY:
>
> - Hooks in existing vfs functions to track data access frequency
>
> - New rbtrees for tracking access frequency of inodes and sub-file
> ranges (hot_rb.c)
>     The relationship between super_block and the rbtree is as below:
>   super_block->s_hotinfo.hot_inode_tree
>     In include/linux/fs.h, a struct hot_info s_hotinfo is added to
>   struct super_block. Each FS instance can find its hot tracking info,
>   s_hotinfo, via its super_block. This hot_info stores the hot tracking
>   state, such as hot_inode_tree and the inode and range hash lists.
>
> - A hash list for indexing data by its temperature (hot_hash.c)
>
> - A debugfs interface for dumping data from the rbtrees (hot_debugfs.c)
>
> - A background kthread for updating inode heat info
>
> - Mount option for enabling temperature tracking (-o hottrack, disabled by default)
>   (hot_track.c)
>
> - An ioctl to retrieve the frequency information collected for a certain
> file
>
> - Ioctls to enable/disable frequency tracking per inode.
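
The super_block->s_hotinfo.hot_inode_tree relationship and the
temperature-indexed hash lists from the summary can be sketched with
user-space stand-ins. Only the names super_block, s_hotinfo, hot_info
and hot_inode_tree come from the cover letter; every *_sketch type,
the bucket count of 256 and both accessor functions are assumptions
made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: one bucket per possible 8-bit temperature value. */
#define HEAT_BUCKETS_SKETCH 256

/* Stand-ins for the kernel's rb_root and hlist_head. */
struct rb_root_sketch { void *rb_node; };
struct hlist_head_sketch { void *first; };

/* Per-filesystem hot tracking state, embedded in the super block. */
struct hot_info_sketch {
	struct rb_root_sketch hot_inode_tree;	/* per-inode frequency items */
	struct hlist_head_sketch inode_heat[HEAT_BUCKETS_SKETCH];
	struct hlist_head_sketch range_heat[HEAT_BUCKETS_SKETCH];
};

/* Stand-in for struct super_block with the new s_hotinfo member. */
struct super_block_sketch {
	struct hot_info_sketch s_hotinfo;
};

/* Each FS instance reaches its inode tree through its super block. */
static struct rb_root_sketch *sb_hot_inode_tree(struct super_block_sketch *sb)
{
	assert(sb != NULL);
	return &sb->s_hotinfo.hot_inode_tree;
}

/* Items are indexed by temperature: the temperature selects the bucket. */
static struct hlist_head_sketch *
sb_inode_heat_bucket(struct super_block_sketch *sb, uint8_t temp)
{
	assert(sb != NULL);
	return &sb->s_hotinfo.inode_heat[temp];
}
```

Embedding the state in the super block means no separate lookup table
is needed to go from a mounted filesystem to its tracking data.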
>
> Usage syntax:
>
> root@debian-i386:~# mount -o hottrack /dev/sdb /mnt
> [ 1505.894078] device label test devid 1 transid 29 /dev/sdb
> [ 1505.952977] btrfs: disk space caching is enabled
> [ 1506.069678] vfs: turning on hot data tracking
> root@debian-i386:~# mount -t debugfs none /sys/kernel/debug
> root@debian-i386:~# ls -l /sys/kernel/debug/vfs_hotdata/
> total 0
> drwxr-xr-x 2 root root 0 Aug  8 04:40 sdb
> root@debian-i386:~# ls -l /sys/kernel/debug/vfs_hotdata/sdb
> total 0
> -rw-r--r-- 1 root root 0 Aug  8 04:40 inode_data
> -rw-r--r-- 1 root root 0 Aug  8 04:40 range_data
> root@debian-i386:~# vi /mnt/file
> root@debian-i386:~# cat /sys/kernel/debug/hot_track/sdb/inode_data
> inode #279, reads 0, writes 1, avg read time 18446744073709551615,
> avg write time 5251566408153596, temp 109
> root@debian-i386:~# cat /sys/kernel/debug/hot_track/sdb/range_data
> inode #279, range start 0 (range len 1048576) reads 0, writes 1,
> avg read time 18446744073709551615, avg write time 1128690176623144209, temp 64
> root@debian-i386:~# echo "hot data tracking test" >> /mnt/file
> root@debian-i386:~# cat /sys/kernel/debug/hot_track/sdb/inode_data
> inode #279, reads 0, writes 2, avg read time 18446744073709551615,
> avg write time 4923343766042451, temp 109
> root@debian-i386:~# cat /sys/kernel/debug/hot_track/sdb/range_data
> inode #279, range start 0 (range len 1048576) reads 0, writes 2,
> avg read time 18446744073709551615, avg write time 1058147040842596150, temp 64
> root@debian-i386:~#
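
For scripting against the debugfs output, here is a small user-space
sketch that parses one inode_data record in the format shown in the
session above. The struct and function names are made up, and the
format string is inferred only from this sample output, so it may not
match other versions of the patchset. Note that the "avg read time"
value 18446744073709551615 is 2^64 - 1; it presumably marks "no reads
recorded yet", but the cover letter does not say so.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical container for one inode_data record. */
struct inode_heat_line {
	uint64_t ino;
	unsigned int reads;
	unsigned int writes;
	uint64_t avg_read_time;
	uint64_t avg_write_time;
	unsigned int temp;
};

/*
 * Parse one record. Whitespace in the scanf format matches any run of
 * whitespace, so the record may be on one line or wrapped as above.
 */
static bool parse_inode_data_line(const char *line, struct inode_heat_line *out)
{
	int n;

	assert(line != NULL && out != NULL);
	n = sscanf(line,
		   "inode #%" SCNu64 ", reads %u, writes %u,"
		   " avg read time %" SCNu64 ","
		   " avg write time %" SCNu64 ", temp %u",
		   &out->ino, &out->reads, &out->writes,
		   &out->avg_read_time, &out->avg_write_time, &out->temp);
	return n == 6;	/* all six fields must be present */
}
```

A caller would read the debugfs file with fgets() or read() and hand
each record to parse_inode_data_line().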
>
> Zhi Yong Wu (11):
>   vfs: introduce one structure hot_info
>   vfs: introduce one rb tree - hot_inode_tree
>   vfs: introduce 2 rb tree items - inode and range
>   vfs: add support for updating access frequency
>   vfs: add one new mount option -o hottrack
>   vfs: add init and exit support
>   vfs: introduce one hash table
>   vfs: enable hot data tracking
>   vfs: fork one private kthread to update temperature info
>   vfs: add 3 new ioctl interfaces
>   vfs: add debugfs support
>
>  fs/Makefile               |    3 +-
>  fs/compat_ioctl.c         |    8 +
>  fs/dcache.c               |    2 +
>  fs/direct-io.c            |   10 +
>  fs/hot_debugfs.c          |  488 ++++++++++++++++++++++++++++++++++
>  fs/hot_debugfs.h          |   60 +++++
>  fs/hot_hash.c             |  382 ++++++++++++++++++++++++++
>  fs/hot_hash.h             |  112 ++++++++
>  fs/hot_rb.c               |  648 +++++++++++++++++++++++++++++++++++++++++++++
>  fs/hot_rb.h               |   81 ++++++
>  fs/hot_track.c            |   85 ++++++
>  fs/hot_track.h            |   23 ++
>  fs/ioctl.c                |  132 +++++++++
>  fs/namespace.c            |   10 +
>  fs/super.c                |   11 +
>  include/linux/fs.h        |   15 +
>  include/linux/hot_track.h |  169 ++++++++++++
>  mm/filemap.c              |    8 +
>  mm/page-writeback.c       |   21 ++
>  mm/readahead.c            |    9 +
>  20 files changed, 2276 insertions(+), 1 deletions(-)
>  create mode 100644 fs/hot_debugfs.c
>  create mode 100644 fs/hot_debugfs.h
>  create mode 100644 fs/hot_hash.c
>  create mode 100644 fs/hot_hash.h
>  create mode 100644 fs/hot_rb.c
>  create mode 100644 fs/hot_rb.h
>  create mode 100644 fs/hot_track.c
>  create mode 100644 fs/hot_track.h
>  create mode 100644 include/linux/hot_track.h
>
> --
> 1.7.6.5
>



-- 
Regards,

Zhi Yong Wu
