Linux-Fsdevel Archive on lore.kernel.org
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Andiry Xu <jix024@eng.ucsd.edu>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-nvdimm@lists.01.org, dan.j.williams@intel.com,
	andy.rudoff@intel.com, coughlan@redhat.com, swanson@cs.ucsd.edu,
	david@fromorbit.com, jack@suse.com, swhiteho@redhat.com,
	miklos@szeredi.hu, andiry.xu@gmail.com,
	Andiry Xu <jix024@cs.ucsd.edu>
Subject: Re: [RFC v2 03/83] Add super.h.
Date: Wed, 14 Mar 2018 21:54:01 -0700
Message-ID: <20180315045401.GB4860@magnolia>
In-Reply-To: <1520705944-6723-4-git-send-email-jix024@eng.ucsd.edu>

On Sat, Mar 10, 2018 at 10:17:44AM -0800, Andiry Xu wrote:
> From: Andiry Xu <jix024@cs.ucsd.edu>
> 
> This header file defines NOVA persistent and volatile superblock
> data structures.
> 
> It also defines NOVA block layout:
> 
> Page 0: Superblock
> Page 1: Reserved inodes
> Page 2 - 15: Reserved
> Page 16 - 31: Inode table pointers
> Page 32 - 47: Journal address pointers
> Page 48 - 63: Reserved
> Page n-2: Replica of reserved inodes
> Page n-1: Replica of superblock
> 
> Other pages are for normal inodes, logs and data.
> 
> Signed-off-by: Andiry Xu <jix024@cs.ucsd.edu>
> ---
>  fs/nova/super.h | 149 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 149 insertions(+)
>  create mode 100644 fs/nova/super.h
> 
> diff --git a/fs/nova/super.h b/fs/nova/super.h
> new file mode 100644
> index 0000000..cb53908
> --- /dev/null
> +++ b/fs/nova/super.h
> @@ -0,0 +1,149 @@
> +#ifndef __SUPER_H
> +#define __SUPER_H
> +/*
> + * Structure of the NOVA super block in PMEM
> + *
> + * The fields are partitioned into static and dynamic fields. The static fields
> + * never change after file system creation. This was primarily done because
> + * nova_get_block() returns NULL if the block offset is 0 (helps in catching
> + * bugs). So if we modify any field using journaling (for consistency), we
> + * will have to modify s_sum which is at offset 0. So journaling code fails.
> + * This (static+dynamic fields) is a temporary solution and can be avoided
> + * once the file system becomes stable and nova_get_block() returns correct
> + * pointers even for offset 0.
> + */
> +struct nova_super_block {
> +	/* static fields. they never change after file system creation.
> +	 * checksum only validates up to s_start_dynamic field below
> +	 */
> +	__le32		s_sum;			/* checksum of this sb */
> +	__le32		s_magic;		/* magic signature */
> +	__le32		s_padding32;
> +	__le32		s_blocksize;		/* blocksize in bytes */
> +	__le64		s_size;			/* total size of fs in bytes */
> +	char		s_volume_name[16];	/* volume name */
> +
> +	/* all the dynamic fields should go here */
> +	__le64		s_epoch_id;		/* Epoch ID */
> +
> +	/* s_mtime and s_wtime should be together and their order should not be
> +	 * changed. we use an 8 byte write to update both of them atomically
> +	 */
> +	__le32		s_mtime;		/* mount time */
> +	__le32		s_wtime;		/* write time */

Hmmm, 32-bit timestamps?  2038 isn't that far away...

> +} __attribute((__packed__));
> +
> +#define NOVA_SB_SIZE 512       /* must be power of two */
> +
> +/* ======================= Reserved blocks ========================= */
> +
> +/*
> + * Page 0 contains super blocks;
> + * Page 1 contains reserved inodes;
> + * Page 2 - 15 are reserved.
> + * Page 16 - 31 contain pointers to inode tables.
> + * Page 32 - 47 contain pointers to journal pages.
> + */
> +#define	HEAD_RESERVED_BLOCKS	64
> +#define	NUM_JOURNAL_PAGES	16
> +
> +#define	SUPER_BLOCK_START       0 // Superblock
> +#define	RESERVE_INODE_START	1 // Reserved inodes
> +#define	INODE_TABLE_START	16 // inode table pointers
> +#define	JOURNAL_START		32 // journal pointer table
> +
> +/* For replica super block and replica reserved inodes */
> +#define	TAIL_RESERVED_BLOCKS	2
> +
> +/* ======================= Reserved inodes ========================= */
> +
> +/* We have space for 31 reserved inodes */
> +#define NOVA_ROOT_INO		(1)
> +#define NOVA_INODETABLE_INO	(2)	/* Fake inode associated with inode
> +					 * storage.  We need this because our
> +					 * allocator requires inode to be
> +					 * associated with each allocation.
> +					 * The data actually lives in linked
> +					 * lists in INODE_TABLE_START. */
> +#define NOVA_BLOCKNODE_INO	(3)     /* Storage for allocator state */
> +#define NOVA_LITEJOURNAL_INO	(4)     /* Storage for lightweight journals */
> +#define NOVA_INODELIST_INO	(5)     /* Storage for Inode free list */
> +
> +
> +/* Normal inode starts at 32 */
> +#define NOVA_NORMAL_INODE_START      (32)

I've been wondering this whole time, why not make the inode number the
byte offset into the pmem?  Then you don't have to lose the last 8 bytes
of each inode block to point to the next one.

--D

> +
> +
> +
> +/*
> + * NOVA super-block data in DRAM
> + */
> +struct nova_sb_info {
> +	struct super_block *sb;			/* VFS super block */
> +	struct nova_super_block *nova_sb;	/* DRAM copy of SB */
> +	struct block_device *s_bdev;
> +	struct dax_device *s_dax_dev;
> +
> +	/*
> +	 * base physical and virtual address of NOVA (which is also
> +	 * the pointer to the super block)
> +	 */
> +	phys_addr_t	phys_addr;
> +	void		*virt_addr;
> +	void		*replica_reserved_inodes_addr;
> +	void		*replica_sb_addr;
> +
> +	unsigned long	num_blocks;
> +
> +	/* Mount options */
> +	unsigned long	bpi;
> +	unsigned long	blocksize;
> +	unsigned long	initsize;
> +	unsigned long	s_mount_opt;
> +	kuid_t		uid;    /* Mount uid for root directory */
> +	kgid_t		gid;    /* Mount gid for root directory */
> +	umode_t		mode;   /* Mount mode for root directory */
> +	atomic_t	next_generation;
> +	/* inode tracking */
> +	unsigned long	s_inodes_used_count;
> +	unsigned long	head_reserved_blocks;
> +	unsigned long	tail_reserved_blocks;
> +
> +	struct mutex	s_lock;	/* protects the SB's buffer-head */
> +
> +	int cpus;
> +
> +	/* Current epoch. volatile guarantees visibility */
> +	volatile u64 s_epoch_id;
> +
> +	/* ZEROED page for cache page initialized */
> +	void *zeroed_page;
> +};
> +
> +static inline struct nova_sb_info *NOVA_SB(struct super_block *sb)
> +{
> +	return sb->s_fs_info;
> +}
> +
> +static inline struct nova_super_block
> +*nova_get_redund_super(struct super_block *sb)
> +{
> +	struct nova_sb_info *sbi = NOVA_SB(sb);
> +
> +	return (struct nova_super_block *)(sbi->replica_sb_addr);
> +}
> +
> +
> +/* If this is part of a read-modify-write of the super block,
> + * call nova_memunlock_super() first!
> + */
> +static inline struct nova_super_block *nova_get_super(struct super_block *sb)
> +{
> +	struct nova_sb_info *sbi = NOVA_SB(sb);
> +
> +	return (struct nova_super_block *)sbi->virt_addr;
> +}
> +
> +extern void nova_error_mng(struct super_block *sb, const char *fmt, ...);
> +
> +#endif
> -- 
> 2.7.4
> 
