Linux-Fsdevel Archive on lore.kernel.org
From: Andiry Xu <jix024@eng.ucsd.edu>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	linux-kernel@vger.kernel.org,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	Dan Williams <dan.j.williams@intel.com>,
	"Rudoff, Andy" <andy.rudoff@intel.com>,
	coughlan@redhat.com, Steven Swanson <swanson@cs.ucsd.edu>,
	Dave Chinner <david@fromorbit.com>,
	jack@suse.com, swhiteho@redhat.com, miklos@szeredi.hu,
	Jian Xu <andiry.xu@gmail.com>, Andiry Xu <jix024@cs.ucsd.edu>
Subject: Re: [RFC v2 04/83] NOVA inode definition.
Date: Wed, 14 Mar 2018 23:16:15 -0700
Message-ID: <CAD4SzjvseYcy0n7xcHtpQQM_+zHMPYXzn_UXBfk81dfV76CQug@mail.gmail.com>
In-Reply-To: <20180315050653.GC4860@magnolia>

On Wed, Mar 14, 2018 at 10:06 PM, Darrick J. Wong
<darrick.wong@oracle.com> wrote:
> On Sat, Mar 10, 2018 at 10:17:45AM -0800, Andiry Xu wrote:
>> From: Andiry Xu <jix024@cs.ucsd.edu>
>>
>> inode.h defines the non-volatile and volatile NOVA inode data structures.
>>
>> The non-volatile NOVA inode (nova_inode) is aligned to 128 bytes and contains
>> file/directory metadata information. The most important fields
>> are log_head and log_tail. log_head points to the start of
>> the log, and log_tail points to the end of the latest committed
>> log entry. NOVA makes updates to the inode by appending
>> to the log tail and updating the log_tail pointer atomically.
>>
>> The volatile NOVA inode (nova_inode_info) contains necessary
>> information to limit access to the non-volatile NOVA inode during runtime.
>> It has a radix tree to map file offset or filenames to the corresponding
>> log entries.
>>
>> Signed-off-by: Andiry Xu <jix024@cs.ucsd.edu>
>> ---
>>  fs/nova/inode.h | 187 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 187 insertions(+)
>>  create mode 100644 fs/nova/inode.h
>>
>> diff --git a/fs/nova/inode.h b/fs/nova/inode.h
>> new file mode 100644
>> index 0000000..f9187e3
>> --- /dev/null
>> +++ b/fs/nova/inode.h
>> @@ -0,0 +1,187 @@
>> +#ifndef __INODE_H
>> +#define __INODE_H
>> +
>> +struct nova_inode_info_header;
>> +struct nova_inode;
>> +
>> +#include "super.h"
>> +
>> +enum nova_new_inode_type {
>> +     TYPE_CREATE = 0,
>> +     TYPE_MKNOD,
>> +     TYPE_SYMLINK,
>> +     TYPE_MKDIR
>> +};
>> +
>> +
>> +/*
>> + * Structure of an inode in PMEM
>> + * Keep the inode size to within 120 bytes: We use the last eight bytes
>> + * as inode table tail pointer.
>
> I would've expected a
> BUILD_BUG_ON(sizeof(struct nova_inode) > NOVA_INODE_SIZE - 8);
> or something to enforce this.
>

Thanks, will do.
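
For reference, a minimal userspace sketch of such a check. BUILD_BUG_ON() breaks the build when its condition is true, so the condition has to describe the failure case. Here C11 _Static_assert stands in for it, with the condition inverted accordingly. Assumptions: NOVA_INODE_SIZE is 128 (matching the 128-byte alignment in the commit message), NOVA_INODE_TAIL_PTR_SIZE and nova_inode_mirror are hypothetical names, and fixed-width integers replace the __le* types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed: one inode slot is 128 bytes, as stated in the commit message. */
#define NOVA_INODE_SIZE 128
/* Hypothetical name: bytes kept free for the inode table tail pointer. */
#define NOVA_INODE_TAIL_PTR_SIZE 8

/* Userspace mirror of struct nova_inode; uintN_t stands in for __leN. */
struct nova_inode_mirror {
	uint8_t  i_rsvd;
	uint8_t  valid;
	uint8_t  deleted;
	uint8_t  i_blk_type;
	uint32_t i_flags;
	uint64_t i_size;
	uint32_t i_ctime;
	uint32_t i_mtime;
	uint32_t i_atime;
	uint16_t i_mode;
	uint16_t i_links_count;
	uint64_t i_xattr;
	uint32_t i_uid;
	uint32_t i_gid;
	uint32_t i_generation;
	uint32_t i_create_time;
	uint64_t nova_ino;
	uint64_t log_head;
	uint64_t log_tail;
	uint64_t create_epoch_id;
	uint64_t delete_epoch_id;
	struct {
		uint32_t rdev;
	} dev;
	uint32_t csum;
} __attribute__((__packed__));

/* Fails to compile if the structure grows into the tail pointer area. */
_Static_assert(sizeof(struct nova_inode_mirror) <=
	       NOVA_INODE_SIZE - NOVA_INODE_TAIL_PTR_SIZE,
	       "nova_inode overlaps the inode table tail pointer");

/* Bytes still unused between the structure and the tail pointer. */
static size_t nova_inode_slack_bytes(void)
{
	return NOVA_INODE_SIZE - NOVA_INODE_TAIL_PTR_SIZE -
	       sizeof(struct nova_inode_mirror);
}
```

In the kernel the equivalent would presumably be a BUILD_BUG_ON(sizeof(struct nova_inode) > NOVA_INODE_SIZE - 8) placed in an init function.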

> (Or just equate inode number with byte offset?  I looked ahead at the
> directory entries and they seem to be 64-bit...)
>
> I guess I'm being lazy and doing an on-disk-format-only review. :)
>
>> + */
>> +struct nova_inode {
>> +
>> +     /* first 40 bytes */
>> +     u8      i_rsvd;          /* reserved. used to be checksum */
>
> Magic number?
>

OK.

>> +     u8      valid;           /* Is this inode valid? */
>> +     u8      deleted;         /* Is this inode deleted? */
>
> Would i_mode == 0 cover these?
>

The deleted flag comes from the NOVA-Fortis code. I will check whether i_mode can cover it.

>> +     u8      i_blk_type;      /* data block size this inode uses */
>
> I would've thought these would just be bits of i_flags?
>
> Also, if I have a 1G blocksize file and free space fragments to the
> point that there's > 1G of free space but none of it contiguous, I guess
> I can expect ENOSPC?
>

Yes, but 1G blocksize has not been tested.
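
To make the fragmentation case concrete, here is a toy first-fit allocator over a list of free extents. It is only a sketch and not the NOVA allocator: struct free_extent, total_free_blocks() and alloc_contiguous() are hypothetical names, extents are counted in 4K blocks, and alignment is ignored.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Number of 4K blocks needed for one 1G data block. */
#define BLOCKS_PER_1G ((1ULL << 30) / 4096)

/* Hypothetical free extent: a run of contiguous free 4K blocks. */
struct free_extent {
	uint64_t start;	/* first free block */
	uint64_t len;	/* number of free blocks in the run */
};

/* Sum of all free blocks, contiguous or not. */
static uint64_t total_free_blocks(const struct free_extent *ext, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += ext[i].len;
	return total;
}

/*
 * First-fit allocation of 'want' contiguous blocks.
 * Returns 0 and stores the first block in *start on success,
 * or -ENOSPC when no single extent is large enough.
 */
static int alloc_contiguous(struct free_extent *ext, size_t n,
			    uint64_t want, uint64_t *start)
{
	for (size_t i = 0; i < n; i++) {
		if (ext[i].len >= want) {
			*start = ext[i].start;
			ext[i].start += want;
			ext[i].len -= want;
			return 0;
		}
	}
	return -ENOSPC;
}
```

With three free runs of 100000 blocks each, total_free_blocks() reports more than BLOCKS_PER_1G, yet a request for BLOCKS_PER_1G contiguous blocks still returns -ENOSPC.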

>> +     __le32  i_flags;         /* Inode flags */
>> +     __le64  i_size;          /* Size of data in bytes */
>> +     __le32  i_ctime;         /* Inode change time */
>> +     __le32  i_mtime;         /* Data modification time */
>> +     __le32  i_atime;         /* Access time */
>
> Same y2038 grumble from the previous patch.
>

Will fix.
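
A small sketch of why the 32-bit fields are a problem. The helper names are hypothetical and this is plain userspace C, not the planned fix. A value read back as signed 32-bit seconds stops at 2038-01-19 03:14:07 UTC (2^31 - 1). Reading the field as unsigned, as done below, only moves the wrap out to 2106.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Store 64-bit seconds in a 32-bit field; the upper 32 bits are dropped. */
static uint32_t store_seconds_32(int64_t secs)
{
	return (uint32_t)secs;
}

/* Read the field back, treating it as unsigned seconds since the epoch. */
static int64_t load_seconds_32(uint32_t raw)
{
	return (int64_t)raw;
}

/* True if the timestamp survives a round trip through the 32-bit field. */
static bool survives_32bit_field(int64_t secs)
{
	return load_seconds_32(store_seconds_32(secs)) == secs;
}
```

Any timestamp at or beyond 2^32 seconds silently aliases to an earlier time, which is why a wider on-media field is needed.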

>> +     __le16  i_mode;          /* File mode */
>> +     __le16  i_links_count;   /* Links count */
>> +
>> +     __le64  i_xattr;         /* Extended attribute block */
>> +
>> +     /* second 40 bytes */
>> +     __le32  i_uid;           /* Owner Uid */
>> +     __le32  i_gid;           /* Group Id */
>> +     __le32  i_generation;    /* File version (for NFS) */
>> +     __le32  i_create_time;   /* Create time */
>> +     __le64  nova_ino;        /* nova inode number */
>> +
>> +     __le64  log_head;        /* Log head pointer */
>> +     __le64  log_tail;        /* Log tail pointer */
>> +
>> +     /* last 24 bytes */
>> +     __le64  create_epoch_id; /* Transaction ID when create */
>> +     __le64  delete_epoch_id; /* Transaction ID when deleted */
>> +
>> +     struct {
>> +             __le32 rdev;     /* major/minor # */
>> +     } dev;                   /* device inode */
>> +
>> +     __le32  csum;            /* CRC32 checksum */
>> +     /* Leave 8 bytes for inode table tail pointer */
>> +} __attribute((__packed__));
>> +
>> +/*
>> + * NOVA-specific inode state kept in DRAM
>> + */
>> +struct nova_inode_info_header {
>> +     /* For files, tree holds a map from file offsets to
>> +      * write log entries.
>> +      *
>> +      * For directories, tree holds a map from a hash of the file name to
>> +      * dentry log entry.
>> +      */
>> +     struct radix_tree_root tree;
>> +     struct rw_semaphore i_sem;      /* Protect log and tree */
>> +     unsigned short i_mode;          /* Dir or file? */
>> +     unsigned int i_flags;
>> +     unsigned long log_pages;        /* Num of log pages */
>> +     unsigned long i_size;
>> +     unsigned long i_blocks;
>> +     unsigned long ino;
>> +     unsigned long pi_addr;
>> +     unsigned long valid_entries;    /* For thorough GC */
>> +     unsigned long num_entries;      /* For thorough GC */
>> +     u64 last_setattr;               /* Last setattr entry */
>> +     u64 last_link_change;           /* Last link change entry */
>> +     u64 last_dentry;                /* Last updated dentry */
>> +     u64 trans_id;                   /* Transaction ID */
>> +     u64 log_head;                   /* Log head pointer */
>> +     u64 log_tail;                   /* Log tail pointer */
>> +     u8  i_blk_type;
>> +};
>> +
>> +/*
>> + * DRAM state for inodes
>> + */
>> +struct nova_inode_info {
>> +     struct nova_inode_info_header header;
>> +     struct inode vfs_inode;
>> +};
>> +
>> +
>> +static inline struct nova_inode_info *NOVA_I(struct inode *inode)
>> +{
>> +     return container_of(inode, struct nova_inode_info, vfs_inode);
>> +}
>> +
>> +static inline void sih_lock(struct nova_inode_info_header *header)
>
> "sih"?  What happened to the "nova" prefix?
>

This structure was born before the name NOVA was decided on.

Thanks,
Andiry

> --D
>
>> +{
>> +     down_write(&header->i_sem);
>> +}
>> +
>> +static inline void sih_unlock(struct nova_inode_info_header *header)
>> +{
>> +     up_write(&header->i_sem);
>> +}
>> +
>> +static inline void sih_lock_shared(struct nova_inode_info_header *header)
>> +{
>> +     down_read(&header->i_sem);
>> +}
>> +
>> +static inline void sih_unlock_shared(struct nova_inode_info_header *header)
>> +{
>> +     up_read(&header->i_sem);
>> +}
>> +
>> +static inline unsigned int
>> +nova_inode_blk_shift(struct nova_inode_info_header *sih)
>> +{
>> +     return blk_type_to_shift[sih->i_blk_type];
>> +}
>> +
>> +static inline uint32_t nova_inode_blk_size(struct nova_inode_info_header *sih)
>> +{
>> +     return blk_type_to_size[sih->i_blk_type];
>> +}
>> +
>> +static inline u64 nova_get_reserved_inode_addr(struct super_block *sb,
>> +     u64 inode_number)
>> +{
>> +     return (NOVA_DEF_BLOCK_SIZE_4K * RESERVE_INODE_START) +
>> +                     inode_number * NOVA_INODE_SIZE;
>> +}
>> +
>> +static inline struct nova_inode *nova_get_reserved_inode(struct super_block *sb,
>> +     u64 inode_number)
>> +{
>> +     struct nova_sb_info *sbi = NOVA_SB(sb);
>> +     u64 addr;
>> +
>> +     addr = nova_get_reserved_inode_addr(sb, inode_number);
>> +
>> +     return (struct nova_inode *)(sbi->virt_addr + addr);
>> +}
>> +
>> +static inline struct nova_inode *nova_get_inode_by_ino(struct super_block *sb,
>> +                                               u64 ino)
>> +{
>> +     if (ino == 0 || ino >= NOVA_NORMAL_INODE_START)
>> +             return NULL;
>> +
>> +     return nova_get_reserved_inode(sb, ino);
>> +}
>> +
>> +static inline struct nova_inode *nova_get_inode(struct super_block *sb,
>> +     struct inode *inode)
>> +{
>> +     struct nova_inode_info *si = NOVA_I(inode);
>> +     struct nova_inode_info_header *sih = &si->header;
>> +     struct nova_inode fake_pi;
>> +     void *addr;
>> +     int rc;
>> +
>> +     addr = nova_get_block(sb, sih->pi_addr);
>> +     rc = memcpy_mcsafe(&fake_pi, addr, sizeof(struct nova_inode));
>> +     if (rc)
>> +             return NULL;
>> +
>> +     return (struct nova_inode *)addr;
>> +}
>> +
>> +static inline int nova_persist_inode(struct nova_inode *pi)
>> +{
>> +     nova_flush_buffer(pi, sizeof(struct nova_inode), 1);
>> +     return 0;
>> +}
>> +
>> +#endif
>> --
>> 2.7.4
>>
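
The commit message at the top of the patch describes the commit protocol: append the entry past the current tail, then publish it by updating log_tail atomically. Below is a toy userspace sketch of that ordering using C11 atomics. It assumes a single writer (the patch serializes writers with i_sem) and a single fixed-size buffer. struct toy_log, LOG_CAPACITY and the toy_log_* functions are hypothetical, and the pmem flush and fence that would precede the tail update are reduced to a comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define LOG_CAPACITY 4096	/* hypothetical: a single log page */

struct toy_log {
	uint64_t log_head;		/* offset of the first entry */
	_Atomic uint64_t log_tail;	/* offset past the last committed entry */
	unsigned char buf[LOG_CAPACITY];
};

static void toy_log_init(struct toy_log *log)
{
	log->log_head = 0;
	atomic_init(&log->log_tail, 0);
	memset(log->buf, 0, sizeof(log->buf));
}

/* Append one entry; returns 0 on success, -1 if the log is full. */
static int toy_log_append(struct toy_log *log, const void *entry, uint64_t len)
{
	uint64_t tail = atomic_load_explicit(&log->log_tail,
					     memory_order_relaxed);

	if (len > LOG_CAPACITY - tail)
		return -1;

	/* Step 1: write the entry beyond the committed region. */
	memcpy(log->buf + tail, entry, len);

	/* On real pmem the entry would be flushed and fenced here. */

	/* Step 2: commit by publishing the new tail in one atomic store. */
	atomic_store_explicit(&log->log_tail, tail + len,
			      memory_order_release);
	return 0;
}

/* Bytes between head and the last committed tail. */
static uint64_t toy_log_committed_bytes(struct toy_log *log)
{
	return atomic_load_explicit(&log->log_tail, memory_order_acquire) -
	       log->log_head;
}
```

A reader that loads log_tail sees either the old tail or the new one, never a partially written entry, because the entry bytes are written before the tail is published.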

