linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
@ 2004-10-19  9:37 Anton Altaparmakov
  2004-10-19  9:38 ` [PATCH 1/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:37 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

Hi Linus, please do a

	bk pull bk://linux-ntfs.bkbits.net/ntfs-2.6

This is a rather big NTFS update that has been in the working whilst 
waiting for 2.6.9 to be released.  Large chunks of that have been in 
various -mm kernels allready.  It fixes some race conditions and other 
bugs as well as rewriting the inode writeout code and adding inode 
allocation/deallocation functionality (but not file creation/deletion!).

Please apply.  Thanks!

For the benefit of non-BK users and to make code review easier, I am
sending each ChangeSet in a separate email as a diff -u style patch.

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

This will update the following files:

 Documentation/filesystems/ntfs.txt |  177 ++
 fs/ntfs/ChangeLog                  |  130 +
 fs/ntfs/Makefile                   |    8 
 fs/ntfs/aops.c                     |  433 ++++--
 fs/ntfs/aops.h                     |  108 +
 fs/ntfs/attrib.c                   | 1180 ++--------------
 fs/ntfs/attrib.h                   |   26 
 fs/ntfs/bitmap.c                   |    1 
 fs/ntfs/collate.c                  |    3 
 fs/ntfs/compress.c                 |    9 
 fs/ntfs/debug.h                    |    6 
 fs/ntfs/dir.c                      |   14 
 fs/ntfs/dir.h                      |    2 
 fs/ntfs/file.c                     |    5 
 fs/ntfs/index.c                    |   59 
 fs/ntfs/index.h                    |   10 
 fs/ntfs/inode.c                    |  152 +-
 fs/ntfs/inode.h                    |   19 
 fs/ntfs/layout.h                   |   85 +
 fs/ntfs/lcnalloc.c                 |    9 
 fs/ntfs/lcnalloc.h                 |   29 
 fs/ntfs/logfile.c                  |   67 
 fs/ntfs/malloc.h                   |    1 
 fs/ntfs/mft.c                      | 2637 +++++++++++++++++++++++++++++--------
 fs/ntfs/mft.h                      |   17 
 fs/ntfs/namei.c                    |    7 
 fs/ntfs/ntfs.h                     |   78 -
 fs/ntfs/quota.c                    |    3 
 fs/ntfs/runlist.c                  | 1512 ++++++++++++++++++++-
 fs/ntfs/runlist.h                  |   95 +
 fs/ntfs/super.c                    |   26 
 fs/ntfs/types.h                    |   28 
 fs/ntfs/unistr.c                   |    2 
 fs/ntfs/upcase.c                   |    1 
 fs/ntfs/volume.h                   |    4 
 35 files changed, 4917 insertions(+), 2026 deletions(-)

through these ChangeSets:

<aia21@cantab.net> (04/09/29 1.1995.1.2)
   NTFS: Implement extent mft record deallocation.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/09/30 1.2000.1.1)
   NTFS: Splitt runlist related functions off from attrib.[hc] to runlist.[hc].
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/09/30 1.2000.1.2)
   NTFS: Add vol->mft_data_pos and initialize it at mount time.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/09/30 1.2000.1.3)
   NTFS: Rename init_runlist() to ntfs_init_runlist(), ntfs_vcn_to_lcn() to
         ntfs_rl_vcn_to_lcn(), decompress_mapping_pairs() to
         ntfs_mapping_pairs_decompress() and adapt all callers.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/09/30 1.2004)
   NTFS: Forgot to lock the mft bitmap when clearing the bit in
         ntfs_extent_mft_record_free().
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/09/30 1.2005)
   NTFS: Add fs/ntfs/runlist.[hc]::ntfs_get_nr_significant_bytes(),
         ntfs_get_size_for_mapping_pairs(), ntfs_write_significant_bytes(),
         and ntfs_mapping_pairs_build(), adapted from libntfs.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/09/30 1.2005.1.1)
   NTFS: Rename ntfs_merge_runlists() to ntfs_runlists_merge().
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/09/30 1.2005.1.2)
   NTFS: - Add fs/ntfs/lcnalloc.h::ntfs_cluster_free_from_rl() which is a static
           inline wrapper for ntfs_cluster_free_from_rl_nolock() which takes the
           cluster bitmap lock for the duration of the call.
         - Make fs/ntfs/lcnalloc.c::ntfs_cluster_free_from_rl_nolock() not
           static and add a declaration for it to lcnalloc.h.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/09/30 1.2005.1.3)
   NTFS: Add fs/ntfs/attrib.[hc]::ntfs_attr_record_resize().
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/10/01 1.2011)
   NTFS: Implement the equivalent of memset() for an ntfs attribute in
         fs/ntfs/attrib.[hc]::ntfs_attr_set() and switch
         fs/ntfs/logfile.c::ntfs_empty_logfile() to using it.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/10/01 1.2011.1.1)
   NTFS: Remove unnecessary casts from LCN_* constants.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/10/02 1.2011.1.2)
   NTFS: Implement fs/ntfs/runlist.c::ntfs_rl_truncate_nolock().
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/10/03 1.2017)
   NTFS: Add MFT_RECORD_OLD as a copy of MFT_RECORD in fs/ntfs/layout.h
         and change MFT_RECORD to contain the NTFS 3.1+ specific fields.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/10/03 1.2018)
   NTFS: Add some debugging checks to fs/ntfs/inode.c::ntfs_truncate() and fix
         a typo in fs/ntfs/layout.h.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/10/05 1.2022.1.1)
   NTFS: Add a helper function fs/ntfs/aops.c::mark_ntfs_record_dirty() which
         marks all buffers belonging to an ntfs record dirty, followed by
         marking the page the ntfs record is in dirty and also marking the vfs
         inode containing the ntfs record dirty (I_DIRTY_PAGES).
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/10/05 1.2022.1.2)
   NTFS: Switch fs/ntfs/index.h::ntfs_index_entry_mark_dirty() to using the
         new helper fs/ntfs/aops.c::mark_ntfs_record_dirty() and remove the no
         longer needed fs/ntfs/index.[hc]::__ntfs_index_entry_mark_dirty().
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/10/07 1.2029)
   NTFS: - Move ntfs_{un,}map_page() from ntfs.h to aops.h and fix resulting
           include errors.
         - Move typedefs for runlist_element and runlist from ntfs.h to
           runlist.h and fix resulting include errors.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/10/07 1.2030)
   NTFS: Remove unused {__,}format_mft_record() from fs/ntfs/mft.c.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/10/07 1.2031)
   NTFS: - Modify fs/ntfs/mft.c::__mark_mft_record_dirty() to use the helper
           mark_ntfs_record_dirty() which also changes the behaviour in that we
           now set the buffers belonging to the mft record dirty as well as the
           page itself.
         - Update fs/ntfs/mft.c::write_mft_record_nolock() and sync_mft_mirror()
           to cope with the fact that there now are dirty buffers in mft pages.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/10/10 1.2032.1.1)
   NTFS: Update fs/ntfs/inode.c::ntfs_write_inode() to also use the helper
         mark_ntfs_record_dirty() and thus to set the buffers belonging to the
         mft record dirty as well as the page itself.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/10/10 1.2032.1.2)
   NTFS: Fix warnings on x86-64.  (Randy Dunlap with slight modification from me)
   
   Fix printk arg type warnings on x86-64 (and OK on x86-32) (gcc 3.3.3):
   fs/ntfs/dir.c:1272: warning: long long unsigned int format, long unsigned int arg (arg 6)
   fs/ntfs/dir.c:1388: warning: long long unsigned int format, long unsigned int arg (arg 5
   
   Signed-off-by: Randy Dunlap <rddunlap@osdl.org>
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/10/11 1.2041)
   NTFS: Add fs/ntfs/mft.c::try_map_mft_record() which fails with -EALREADY if
         the mft record is already locked and otherwise behaves the same way
         as fs/ntfs/mft.c::map_mft_record().
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/10/11 1.2041.1.1)
   NTFS: Modify fs/ntfs/mft.c::write_mft_record_nolock() so that it only
         writes the mft record if the buffers belonging to it are dirty.
         Otherwise we assume that it was written out by other means already.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/10/12 1.2041.1.2)
   NTFS: Attempting to write outside initialized size is _not_ a bug so remove
         the bug check from fs/ntfs/aops.c::ntfs_write_mst_block().  It is in
         fact required to write outside initialized size when preparing to
         extend the initialized size.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/10/12 1.2041.1.3)
   NTFS: Map the page instead of using page_address() before writing to it in
         fs/ntfs/aops.c::ntfs_mft_writepage().
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/10/12 1.2041.1.4)
   NTFS: Provide exclusion between opening an inode / mapping an mft record
         and accessing the mft record in fs/ntfs/mft.c::ntfs_mft_writepage()
         by setting the page not uptodate throughout ntfs_mft_writepage().
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/10/14 1.2046)
   NTFS: Big cleanup of mft record writing code.
   
   - Clear the page uptodate flag in fs/ntfs/aops.c::ntfs_write_mst_block()
     to ensure noone can see the page whilst the mst fixups are applied.
   - Add the helper fs/ntfs/mft.c::ntfs_may_write_mft_record() which
     checks if an mft record may be written out safely obtaining any
     necessary locks in the process.  This is used by
     fs/ntfs/aops.c::ntfs_write_mst_block().
   - Modify fs/ntfs/aops.c::ntfs_write_mst_block() to also work for
     writing mft records and improve its error handling in the process.
     Now if any of the records in the page fail to be written out, all
     other records will be written out instead of aborting completely.
   - Remove ntfs_mft_aops and update all users to use ntfs_mst_aops.
   - Modify fs/ntfs/inode.c::ntfs_read_locked_inode() to set the
     ntfs_mst_aops for all inodes which are NInoMstProtected() and
     ntfs_aops for all other inodes.
   - Rename fs/ntfs/mft.c::sync_mft_mirror{,_umount}() to
     ntfs_sync_mft_mirror{,_umount}() and change their parameters so they
     no longer require an ntfs inode to be present.  Update all callers.
   - Cleanup the error handling in fs/ntfs/mft.c::ntfs_sync_mft_mirror().
   - Clear the page uptodate flag in fs/ntfs/mft.c::ntfs_sync_mft_mirror()
     to ensure noone can see the page whilst the mst fixups are applied.
   - Remove the no longer needed fs/ntfs/mft.c::ntfs_mft_writepage() and
     fs/ntfs/mft.c::try_map_mft_record().
   - Fix callers of fs/ntfs/aops.c::mark_ntfs_record_dirty() to call it
     with the ntfs inode which contains the page rather than the ntfs
     inode the mft record of which is in the page.
   
   Ooops.  Yes, I know, I should have split this up into smaller changes...
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/10/14 1.2047)
   NTFS: - Fix two race conditions in fs/ntfs/inode.c::ntfs_put_inode().
   
   - Fix race condition in fs/ntfs/inode.c::ntfs_put_inode() by moving the
     index inode bitmap inode release code from there to
     fs/ntfs/inode.c::ntfs_clear_big_inode().  (Thanks to Christoph
     Hellwig for spotting this.)
   - Fix race condition in fs/ntfs/inode.c::ntfs_put_inode() by taking the
     inode semaphore around the code thst sets ni->itype.index.bmp_ino to
     NULL and reorganize the code to optimize it a bit.  (Thanks to
     Christoph Hellwig for spotting this.)
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/10/15 1.2047.1.1)
   NTFS: Modify fs/ntfs/aops.c::mark_ntfs_record_dirty() to no longer take the
         ntfs inode as a parameter as this is confusing and misleading and the
         ntfs inode is available via NTFS_I(page->mapping->host).
         Adapt all callers to this change.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/10/15 1.2047.1.2)
   NTFS: Modify fs/ntfs/mft.c::write_mft_record_nolock() and
         fs/ntfs/aops.c::ntfs_write_mst_block() to only check the dirty state
         of the first buffer in a record and to take this as the ntfs record
         dirty state.  We cannot look at the dirty state for subsequent
         buffers because we might be racing with
         fs/ntfs/aops.c::mark_ntfs_record_dirty().
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/10/15 1.2051)
   NTFS: Move the static inline ntfs_init_big_inode() from fs/ntfs/inode.c to
         inode.h and make fs/ntfs/inode.c::__ntfs_init_inode() non-static and
         add a declaration for it to inode.h.  Fix some compilation issues
         that resulted due to #includes and header file interdependencies.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/10/15 1.2052)
   NTFS: Simplify setup of i_mode in fs/ntfs/inode.c::ntfs_read_locked_inode().
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/10/15 1.2053)
   NTFS: Add helpers fs/ntfs/layout.h::MK_MREF() and MK_LE_MREF().
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/10/15 1.2054)
   NTFS: Modify fs/ntfs/mft.c::map_extent_mft_record() to only verify the mft
         record sequence number if it is specified (i.e. not zero).
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/10/16 1.2055)
   NTFS: Add fs/ntfs/mft.[hc]::ntfs_mft_record_alloc() and various helper
         functions used by it.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/10/17 1.2059)
   NTFS: 2.1.21 release
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

<aia21@cantab.net> (04/10/18 1.2061)
   NTFS: Update Documentation/filesystems/ntfs.txt with instructions on how to
         use the Device-Mapper driver with NTFS ftdisk/LDM raid.  This removes
         the linear raid problem with the Software RAID / MD driver when one
         or more of the devices has an odd number of sectors.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 1/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:37 [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes Anton Altaparmakov
@ 2004-10-19  9:38 ` Anton Altaparmakov
  2004-10-19  9:39   ` [PATCH 2/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:38 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 1/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/09/29 1.1995.1.2)
   NTFS: Implement extent mft record deallocation.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:13:07 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:13:07 +01:00
@@ -21,6 +21,11 @@
 	- Enable the code for setting the NT4 compatibility flag when we start
 	  making NTFS 1.2 specific modifications.
 
+2.1.20-WIP
+
+	- Implement extent mft record deallocation
+	  fs/ntfs/mft.c::ntfs_extent_mft_record_free().
+
 2.1.19 - Many cleanups, improvements, and a minor bug fix.
 
 	- Update ->setattr (fs/ntfs/inode.c::ntfs_setattr()) to refuse to
diff -Nru a/fs/ntfs/Makefile b/fs/ntfs/Makefile
--- a/fs/ntfs/Makefile	2004-10-19 10:13:07 +01:00
+++ b/fs/ntfs/Makefile	2004-10-19 10:13:07 +01:00
@@ -6,7 +6,7 @@
 	     index.o inode.o mft.o mst.o namei.o super.o sysctl.o unistr.o \
 	     upcase.o
 
-EXTRA_CFLAGS = -DNTFS_VERSION=\"2.1.19\"
+EXTRA_CFLAGS = -DNTFS_VERSION=\"2.1.20-WIP\"
 
 ifeq ($(CONFIG_NTFS_DEBUG),y)
 EXTRA_CFLAGS += -DDEBUG
diff -Nru a/fs/ntfs/mft.c b/fs/ntfs/mft.c
--- a/fs/ntfs/mft.c	2004-10-19 10:13:07 +01:00
+++ b/fs/ntfs/mft.c	2004-10-19 10:13:07 +01:00
@@ -23,6 +23,7 @@
 #include <linux/swap.h>
 
 #include "ntfs.h"
+#include "bitmap.h"
 
 /**
  * __format_mft_record - initialize an empty mft record
@@ -1095,4 +1096,159 @@
 	return 0;
 }
 
+static const char *es = "  Leaving inconsistent metadata.  Unmount and run "
+		"chkdsk.";
+
+/**
+ * ntfs_extent_mft_record_free - free an extent mft record on an ntfs volume
+ * @ni:		ntfs inode of the mapped extent mft record to free
+ * @m:		mapped extent mft record of the ntfs inode @ni
+ *
+ * Free the mapped extent mft record @m of the extent ntfs inode @ni.
+ *
+ * Note that this function unmaps the mft record and closes and destroys @ni
+ * internally and hence you cannot use either @ni nor @m any more after this
+ * function returns success.
+ *
+ * On success return 0 and on error return -errno.  @ni and @m are still valid
+ * in this case and have not been freed.
+ *
+ * For some errors an error message is displayed and the success code 0 is
+ * returned and the volume is then left dirty on umount.  This makes sense in
+ * case we could not rollback the changes that were already done since the
+ * caller no longer wants to reference this mft record so it does not matter to
+ * the caller if something is wrong with it as long as it is properly detached
+ * from the base inode.
+ */
+int ntfs_extent_mft_record_free(ntfs_inode *ni, MFT_RECORD *m)
+{
+	unsigned long mft_no = ni->mft_no;
+	ntfs_volume *vol = ni->vol;
+	ntfs_inode *base_ni;
+	ntfs_inode **extent_nis;
+	int i, err;
+	le16 old_seq_no;
+	u16 seq_no;
+	
+	BUG_ON(NInoAttr(ni));
+	BUG_ON(ni->nr_extents != -1);
+
+	down(&ni->extent_lock);
+	base_ni = ni->ext.base_ntfs_ino;
+	up(&ni->extent_lock);
+
+	BUG_ON(base_ni->nr_extents <= 0);
+
+	ntfs_debug("Entering for extent inode 0x%lx, base inode 0x%lx.\n",
+			mft_no, base_ni->mft_no);
+
+	down(&base_ni->extent_lock);
+
+	/* Make sure we are holding the only reference to the extent inode. */
+	if (atomic_read(&ni->count) > 2) {
+		ntfs_error(vol->sb, "Tried to free busy extent inode 0x%lx, "
+				"not freeing.", base_ni->mft_no);
+		up(&base_ni->extent_lock);
+		return -EBUSY;
+	}
+
+	/* Dissociate the ntfs inode from the base inode. */
+	extent_nis = base_ni->ext.extent_ntfs_inos;
+	err = -ENOENT;
+	for (i = 0; i < base_ni->nr_extents; i++) {
+		if (ni != extent_nis[i])
+			continue;
+		extent_nis += i;
+		base_ni->nr_extents--;
+		memmove(extent_nis, extent_nis + 1, (base_ni->nr_extents - i) *
+				sizeof(ntfs_inode*));
+		err = 0;
+		break;
+	}
+
+	up(&base_ni->extent_lock);
+
+	if (unlikely(err)) {
+		ntfs_error(vol->sb, "Extent inode 0x%lx is not attached to "
+				"its base inode 0x%lx.", mft_no,
+				base_ni->mft_no);
+		BUG();
+	}
+
+	/*
+	 * The extent inode is no longer attached to the base inode so no one
+	 * can get a reference to it any more.
+	 */
+
+	/* Mark the mft record as not in use. */
+	m->flags &= const_cpu_to_le16(~const_le16_to_cpu(MFT_RECORD_IN_USE));
+
+	/* Increment the sequence number, skipping zero, if it is not zero. */
+	old_seq_no = m->sequence_number;
+	seq_no = le16_to_cpu(old_seq_no);
+	if (seq_no == 0xffff)
+		seq_no = 1;
+	else if (seq_no)
+		seq_no++;
+	m->sequence_number = cpu_to_le16(seq_no);
+
+	/*
+	 * Set the ntfs inode dirty and write it out.  We do not need to worry
+	 * about the base inode here since whatever caused the extent mft
+	 * record to be freed is guaranteed to do it already.
+	 */
+	NInoSetDirty(ni);
+	err = write_mft_record(ni, m, 0);
+	if (unlikely(err)) {
+		ntfs_error(vol->sb, "Failed to write mft record 0x%lx, not "
+				"freeing.", mft_no);
+		goto rollback;
+	}
+rollback_error:
+	/* Unmap and throw away the now freed extent inode. */
+	unmap_extent_mft_record(ni);
+	ntfs_clear_extent_inode(ni);
+
+	/* Clear the bit in the $MFT/$BITMAP corresponding to this record. */
+	err = ntfs_bitmap_clear_bit(vol->mftbmp_ino, mft_no);
+	if (unlikely(err)) {
+		/*
+		 * The extent inode is gone but we failed to deallocate it in
+		 * the mft bitmap.  Just emit a warning and leave the volume
+		 * dirty on umount.
+		 */
+		ntfs_error(vol->sb, "Failed to clear bit in mft bitmap.%s", es);
+		NVolSetErrors(vol);
+	}
+	return 0;
+rollback:
+	/* Rollback what we did... */
+	down(&base_ni->extent_lock);
+	extent_nis = base_ni->ext.extent_ntfs_inos;
+	if (!(base_ni->nr_extents & 3)) {
+		int new_size = (base_ni->nr_extents + 4) * sizeof(ntfs_inode*);
+
+		extent_nis = (ntfs_inode**)kmalloc(new_size, GFP_NOFS);
+		if (unlikely(!extent_nis)) {
+			ntfs_error(vol->sb, "Failed to allocate internal "
+					"buffer during rollback.%s", es);
+			up(&base_ni->extent_lock);
+			NVolSetErrors(vol);
+			goto rollback_error;
+		}
+		if (base_ni->nr_extents) {
+			BUG_ON(!base_ni->ext.extent_ntfs_inos);
+			memcpy(extent_nis, base_ni->ext.extent_ntfs_inos,
+					new_size - 4 * sizeof(ntfs_inode*));
+			kfree(base_ni->ext.extent_ntfs_inos);
+		}
+		base_ni->ext.extent_ntfs_inos = extent_nis;
+	}
+	m->flags |= MFT_RECORD_IN_USE;
+	m->sequence_number = old_seq_no;
+	extent_nis[base_ni->nr_extents++] = ni;
+	up(&base_ni->extent_lock);
+	mark_mft_record_dirty(ni);
+	return err;
+}
 #endif /* NTFS_RW */
diff -Nru a/fs/ntfs/mft.h b/fs/ntfs/mft.h
--- a/fs/ntfs/mft.h	2004-10-19 10:13:07 +01:00
+++ b/fs/ntfs/mft.h	2004-10-19 10:13:07 +01:00
@@ -111,6 +111,8 @@
 	return err;
 }
 
+extern int ntfs_extent_mft_record_free(ntfs_inode *ni, MFT_RECORD *m);
+
 #endif /* NTFS_RW */
 
 #endif /* _LINUX_NTFS_MFT_H */

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 2/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:38 ` [PATCH 1/37] " Anton Altaparmakov
@ 2004-10-19  9:39   ` Anton Altaparmakov
  2004-10-19  9:39     ` [PATCH 3/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 2/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/09/30 1.2000.1.1)
   NTFS: Splitt runlist related functions off from attrib.[hc] to runlist.[hc].
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:13:11 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:13:11 +01:00
@@ -25,6 +25,7 @@
 
 	- Implement extent mft record deallocation
 	  fs/ntfs/mft.c::ntfs_extent_mft_record_free().
+	- Splitt runlist related functions off from attrib.[hc] to runlist.[hc].
 
 2.1.19 - Many cleanups, improvements, and a minor bug fix.
 
diff -Nru a/fs/ntfs/Makefile b/fs/ntfs/Makefile
--- a/fs/ntfs/Makefile	2004-10-19 10:13:11 +01:00
+++ b/fs/ntfs/Makefile	2004-10-19 10:13:11 +01:00
@@ -3,8 +3,8 @@
 obj-$(CONFIG_NTFS_FS) += ntfs.o
 
 ntfs-objs := aops.o attrib.o collate.o compress.o debug.o dir.o file.o \
-	     index.o inode.o mft.o mst.o namei.o super.o sysctl.o unistr.o \
-	     upcase.o
+	     index.o inode.o mft.o mst.o namei.o runlist.o super.o sysctl.o \
+	     unistr.o upcase.o
 
 EXTRA_CFLAGS = -DNTFS_VERSION=\"2.1.20-WIP\"
 
diff -Nru a/fs/ntfs/attrib.c b/fs/ntfs/attrib.c
--- a/fs/ntfs/attrib.c	2004-10-19 10:13:11 +01:00
+++ b/fs/ntfs/attrib.c	2004-10-19 10:13:11 +01:00
@@ -1,5 +1,5 @@
 /**
- * attrib.c - NTFS attribute operations. Part of the Linux-NTFS project.
+ * attrib.c - NTFS attribute operations.  Part of the Linux-NTFS project.
  *
  * Copyright (c) 2001-2004 Anton Altaparmakov
  * Copyright (c) 2002 Richard Russon
@@ -22,914 +22,7 @@
 
 #include <linux/buffer_head.h>
 #include "ntfs.h"
-#include "dir.h"
-
-/* Temporary helper functions -- might become macros */
-
-/**
- * ntfs_rl_mm - runlist memmove
- *
- * It is up to the caller to serialize access to the runlist @base.
- */
-static inline void ntfs_rl_mm(runlist_element *base, int dst, int src,
-		int size)
-{
-	if (likely((dst != src) && (size > 0)))
-		memmove(base + dst, base + src, size * sizeof (*base));
-}
-
-/**
- * ntfs_rl_mc - runlist memory copy
- *
- * It is up to the caller to serialize access to the runlists @dstbase and
- * @srcbase.
- */
-static inline void ntfs_rl_mc(runlist_element *dstbase, int dst,
-		runlist_element *srcbase, int src, int size)
-{
-	if (likely(size > 0))
-		memcpy(dstbase + dst, srcbase + src, size * sizeof(*dstbase));
-}
-
-/**
- * ntfs_rl_realloc - Reallocate memory for runlists
- * @rl:		original runlist
- * @old_size:	number of runlist elements in the original runlist @rl
- * @new_size:	number of runlist elements we need space for
- *
- * As the runlists grow, more memory will be required.  To prevent the
- * kernel having to allocate and reallocate large numbers of small bits of
- * memory, this function returns and entire page of memory.
- *
- * It is up to the caller to serialize access to the runlist @rl.
- *
- * N.B.  If the new allocation doesn't require a different number of pages in
- *       memory, the function will return the original pointer.
- *
- * On success, return a pointer to the newly allocated, or recycled, memory.
- * On error, return -errno. The following error codes are defined:
- *	-ENOMEM	- Not enough memory to allocate runlist array.
- *	-EINVAL	- Invalid parameters were passed in.
- */
-static inline runlist_element *ntfs_rl_realloc(runlist_element *rl,
-		int old_size, int new_size)
-{
-	runlist_element *new_rl;
-
-	old_size = PAGE_ALIGN(old_size * sizeof(*rl));
-	new_size = PAGE_ALIGN(new_size * sizeof(*rl));
-	if (old_size == new_size)
-		return rl;
-
-	new_rl = ntfs_malloc_nofs(new_size);
-	if (unlikely(!new_rl))
-		return ERR_PTR(-ENOMEM);
-
-	if (likely(rl != NULL)) {
-		if (unlikely(old_size > new_size))
-			old_size = new_size;
-		memcpy(new_rl, rl, old_size);
-		ntfs_free(rl);
-	}
-	return new_rl;
-}
-
-/**
- * ntfs_are_rl_mergeable - test if two runlists can be joined together
- * @dst:	original runlist
- * @src:	new runlist to test for mergeability with @dst
- *
- * Test if two runlists can be joined together. For this, their VCNs and LCNs
- * must be adjacent.
- *
- * It is up to the caller to serialize access to the runlists @dst and @src.
- *
- * Return: TRUE   Success, the runlists can be merged.
- *	   FALSE  Failure, the runlists cannot be merged.
- */
-static inline BOOL ntfs_are_rl_mergeable(runlist_element *dst,
-		runlist_element *src)
-{
-	BUG_ON(!dst);
-	BUG_ON(!src);
-
-	if ((dst->lcn < 0) || (src->lcn < 0))     /* Are we merging holes? */
-		return FALSE;
-	if ((dst->lcn + dst->length) != src->lcn) /* Are the runs contiguous? */
-		return FALSE;
-	if ((dst->vcn + dst->length) != src->vcn) /* Are the runs misaligned? */
-		return FALSE;
-
-	return TRUE;
-}
-
-/**
- * __ntfs_rl_merge - merge two runlists without testing if they can be merged
- * @dst:	original, destination runlist
- * @src:	new runlist to merge with @dst
- *
- * Merge the two runlists, writing into the destination runlist @dst. The
- * caller must make sure the runlists can be merged or this will corrupt the
- * destination runlist.
- *
- * It is up to the caller to serialize access to the runlists @dst and @src.
- */
-static inline void __ntfs_rl_merge(runlist_element *dst, runlist_element *src)
-{
-	dst->length += src->length;
-}
-
-/**
- * ntfs_rl_merge - test if two runlists can be joined together and merge them
- * @dst:	original, destination runlist
- * @src:	new runlist to merge with @dst
- *
- * Test if two runlists can be joined together. For this, their VCNs and LCNs
- * must be adjacent. If they can be merged, perform the merge, writing into
- * the destination runlist @dst.
- *
- * It is up to the caller to serialize access to the runlists @dst and @src.
- *
- * Return: TRUE   Success, the runlists have been merged.
- *	   FALSE  Failure, the runlists cannot be merged and have not been
- *		  modified.
- */
-static inline BOOL ntfs_rl_merge(runlist_element *dst, runlist_element *src)
-{
-	BOOL merge = ntfs_are_rl_mergeable(dst, src);
-
-	if (merge)
-		__ntfs_rl_merge(dst, src);
-	return merge;
-}
-
-/**
- * ntfs_rl_append - append a runlist after a given element
- * @dst:	original runlist to be worked on
- * @dsize:	number of elements in @dst (including end marker)
- * @src:	runlist to be inserted into @dst
- * @ssize:	number of elements in @src (excluding end marker)
- * @loc:	append the new runlist @src after this element in @dst
- *
- * Append the runlist @src after element @loc in @dst.  Merge the right end of
- * the new runlist, if necessary. Adjust the size of the hole before the
- * appended runlist.
- *
- * It is up to the caller to serialize access to the runlists @dst and @src.
- *
- * On success, return a pointer to the new, combined, runlist. Note, both
- * runlists @dst and @src are deallocated before returning so you cannot use
- * the pointers for anything any more. (Strictly speaking the returned runlist
- * may be the same as @dst but this is irrelevant.)
- *
- * On error, return -errno. Both runlists are left unmodified. The following
- * error codes are defined:
- *	-ENOMEM	- Not enough memory to allocate runlist array.
- *	-EINVAL	- Invalid parameters were passed in.
- */
-static inline runlist_element *ntfs_rl_append(runlist_element *dst,
-		int dsize, runlist_element *src, int ssize, int loc)
-{
-	BOOL right;
-	int magic;
-
-	BUG_ON(!dst);
-	BUG_ON(!src);
-
-	/* First, check if the right hand end needs merging. */
-	right = ntfs_are_rl_mergeable(src + ssize - 1, dst + loc + 1);
-
-	/* Space required: @dst size + @src size, less one if we merged. */
-	dst = ntfs_rl_realloc(dst, dsize, dsize + ssize - right);
-	if (IS_ERR(dst))
-		return dst;
-	/*
-	 * We are guaranteed to succeed from here so can start modifying the
-	 * original runlists.
-	 */
-
-	/* First, merge the right hand end, if necessary. */
-	if (right)
-		__ntfs_rl_merge(src + ssize - 1, dst + loc + 1);
-
-	magic = loc + ssize;
-
-	/* Move the tail of @dst out of the way, then copy in @src. */
-	ntfs_rl_mm(dst, magic + 1, loc + 1 + right, dsize - loc - 1 - right);
-	ntfs_rl_mc(dst, loc + 1, src, 0, ssize);
-
-	/* Adjust the size of the preceding hole. */
-	dst[loc].length = dst[loc + 1].vcn - dst[loc].vcn;
-
-	/* We may have changed the length of the file, so fix the end marker */
-	if (dst[magic + 1].lcn == LCN_ENOENT)
-		dst[magic + 1].vcn = dst[magic].vcn + dst[magic].length;
-
-	return dst;
-}
-
-/**
- * ntfs_rl_insert - insert a runlist into another
- * @dst:	original runlist to be worked on
- * @dsize:	number of elements in @dst (including end marker)
- * @src:	new runlist to be inserted
- * @ssize:	number of elements in @src (excluding end marker)
- * @loc:	insert the new runlist @src before this element in @dst
- *
- * Insert the runlist @src before element @loc in the runlist @dst. Merge the
- * left end of the new runlist, if necessary. Adjust the size of the hole
- * after the inserted runlist.
- *
- * It is up to the caller to serialize access to the runlists @dst and @src.
- *
- * On success, return a pointer to the new, combined, runlist. Note, both
- * runlists @dst and @src are deallocated before returning so you cannot use
- * the pointers for anything any more. (Strictly speaking the returned runlist
- * may be the same as @dst but this is irrelevant.)
- *
- * On error, return -errno. Both runlists are left unmodified. The following
- * error codes are defined:
- *	-ENOMEM	- Not enough memory to allocate runlist array.
- *	-EINVAL	- Invalid parameters were passed in.
- */
-static inline runlist_element *ntfs_rl_insert(runlist_element *dst,
-		int dsize, runlist_element *src, int ssize, int loc)
-{
-	BOOL left = FALSE;
-	BOOL disc = FALSE;	/* Discontinuity */
-	BOOL hole = FALSE;	/* Following a hole */
-	int magic;
-
-	BUG_ON(!dst);
-	BUG_ON(!src);
-
-	/* disc => Discontinuity between the end of @dst and the start of @src.
-	 *	   This means we might need to insert a hole.
-	 * hole => @dst ends with a hole or an unmapped region which we can
-	 *	   extend to match the discontinuity. */
-	if (loc == 0)
-		disc = (src[0].vcn > 0);
-	else {
-		s64 merged_length;
-
-		left = ntfs_are_rl_mergeable(dst + loc - 1, src);
-
-		merged_length = dst[loc - 1].length;
-		if (left)
-			merged_length += src->length;
-
-		disc = (src[0].vcn > dst[loc - 1].vcn + merged_length);
-		if (disc)
-			hole = (dst[loc - 1].lcn == LCN_HOLE);
-	}
-
-	/* Space required: @dst size + @src size, less one if we merged, plus
-	 * one if there was a discontinuity, less one for a trailing hole. */
-	dst = ntfs_rl_realloc(dst, dsize, dsize + ssize - left + disc - hole);
-	if (IS_ERR(dst))
-		return dst;
-	/*
-	 * We are guaranteed to succeed from here so can start modifying the
-	 * original runlist.
-	 */
-
-	if (left)
-		__ntfs_rl_merge(dst + loc - 1, src);
-
-	magic = loc + ssize - left + disc - hole;
-
-	/* Move the tail of @dst out of the way, then copy in @src. */
-	ntfs_rl_mm(dst, magic, loc, dsize - loc);
-	ntfs_rl_mc(dst, loc + disc - hole, src, left, ssize - left);
-
-	/* Adjust the VCN of the last run ... */
-	if (dst[magic].lcn <= LCN_HOLE)
-		dst[magic].vcn = dst[magic - 1].vcn + dst[magic - 1].length;
-	/* ... and the length. */
-	if (dst[magic].lcn == LCN_HOLE || dst[magic].lcn == LCN_RL_NOT_MAPPED)
-		dst[magic].length = dst[magic + 1].vcn - dst[magic].vcn;
-
-	/* Writing beyond the end of the file and there's a discontinuity. */
-	if (disc) {
-		if (hole)
-			dst[loc - 1].length = dst[loc].vcn - dst[loc - 1].vcn;
-		else {
-			if (loc > 0) {
-				dst[loc].vcn = dst[loc - 1].vcn +
-						dst[loc - 1].length;
-				dst[loc].length = dst[loc + 1].vcn -
-						dst[loc].vcn;
-			} else {
-				dst[loc].vcn = 0;
-				dst[loc].length = dst[loc + 1].vcn;
-			}
-			dst[loc].lcn = LCN_RL_NOT_MAPPED;
-		}
-
-		magic += hole;
-
-		if (dst[magic].lcn == LCN_ENOENT)
-			dst[magic].vcn = dst[magic - 1].vcn +
-					dst[magic - 1].length;
-	}
-	return dst;
-}
-
-/**
- * ntfs_rl_replace - overwrite a runlist element with another runlist
- * @dst:	original runlist to be worked on
- * @dsize:	number of elements in @dst (including end marker)
- * @src:	new runlist to be inserted
- * @ssize:	number of elements in @src (excluding end marker)
- * @loc:	index in runlist @dst to overwrite with @src
- *
- * Replace the runlist element @dst at @loc with @src. Merge the left and
- * right ends of the inserted runlist, if necessary.
- *
- * It is up to the caller to serialize access to the runlists @dst and @src.
- *
- * On success, return a pointer to the new, combined, runlist. Note, both
- * runlists @dst and @src are deallocated before returning so you cannot use
- * the pointers for anything any more. (Strictly speaking the returned runlist
- * may be the same as @dst but this is irrelevant.)
- *
- * On error, return -errno. Both runlists are left unmodified. The following
- * error codes are defined:
- *	-ENOMEM	- Not enough memory to allocate runlist array.
- *	-EINVAL	- Invalid parameters were passed in.
- */
-static inline runlist_element *ntfs_rl_replace(runlist_element *dst,
-		int dsize, runlist_element *src, int ssize, int loc)
-{
-	BOOL left = FALSE;
-	BOOL right;
-	int magic;
-
-	BUG_ON(!dst);
-	BUG_ON(!src);
-
-	/* First, merge the left and right ends, if necessary. */
-	right = ntfs_are_rl_mergeable(src + ssize - 1, dst + loc + 1);
-	if (loc > 0)
-		left = ntfs_are_rl_mergeable(dst + loc - 1, src);
-
-	/* Allocate some space. We'll need less if the left, right, or both
-	 * ends were merged. */
-	dst = ntfs_rl_realloc(dst, dsize, dsize + ssize - left - right);
-	if (IS_ERR(dst))
-		return dst;
-	/*
-	 * We are guaranteed to succeed from here so can start modifying the
-	 * original runlists.
-	 */
-	if (right)
-		__ntfs_rl_merge(src + ssize - 1, dst + loc + 1);
-	if (left)
-		__ntfs_rl_merge(dst + loc - 1, src);
-
-	/* FIXME: What does this mean? (AIA) */
-	magic = loc + ssize - left;
-
-	/* Move the tail of @dst out of the way, then copy in @src. */
-	ntfs_rl_mm(dst, magic, loc + right + 1, dsize - loc - right - 1);
-	ntfs_rl_mc(dst, loc, src, left, ssize - left);
-
-	/* We may have changed the length of the file, so fix the end marker */
-	if (dst[magic].lcn == LCN_ENOENT)
-		dst[magic].vcn = dst[magic - 1].vcn + dst[magic - 1].length;
-	return dst;
-}
-
-/**
- * ntfs_rl_split - insert a runlist into the centre of a hole
- * @dst:	original runlist to be worked on
- * @dsize:	number of elements in @dst (including end marker)
- * @src:	new runlist to be inserted
- * @ssize:	number of elements in @src (excluding end marker)
- * @loc:	index in runlist @dst at which to split and insert @src
- *
- * Split the runlist @dst at @loc into two and insert @new in between the two
- * fragments. No merging of runlists is necessary. Adjust the size of the
- * holes either side.
- *
- * It is up to the caller to serialize access to the runlists @dst and @src.
- *
- * On success, return a pointer to the new, combined, runlist. Note, both
- * runlists @dst and @src are deallocated before returning so you cannot use
- * the pointers for anything any more. (Strictly speaking the returned runlist
- * may be the same as @dst but this is irrelevant.)
- *
- * On error, return -errno. Both runlists are left unmodified. The following
- * error codes are defined:
- *	-ENOMEM	- Not enough memory to allocate runlist array.
- *	-EINVAL	- Invalid parameters were passed in.
- */
-static inline runlist_element *ntfs_rl_split(runlist_element *dst, int dsize,
-		runlist_element *src, int ssize, int loc)
-{
-	BUG_ON(!dst);
-	BUG_ON(!src);
-
-	/* Space required: @dst size + @src size + one new hole. */
-	dst = ntfs_rl_realloc(dst, dsize, dsize + ssize + 1);
-	if (IS_ERR(dst))
-		return dst;
-	/*
-	 * We are guaranteed to succeed from here so can start modifying the
-	 * original runlists.
-	 */
-
-	/* Move the tail of @dst out of the way, then copy in @src. */
-	ntfs_rl_mm(dst, loc + 1 + ssize, loc, dsize - loc);
-	ntfs_rl_mc(dst, loc + 1, src, 0, ssize);
-
-	/* Adjust the size of the holes either size of @src. */
-	dst[loc].length		= dst[loc+1].vcn       - dst[loc].vcn;
-	dst[loc+ssize+1].vcn    = dst[loc+ssize].vcn   + dst[loc+ssize].length;
-	dst[loc+ssize+1].length = dst[loc+ssize+2].vcn - dst[loc+ssize+1].vcn;
-
-	return dst;
-}
-
-/**
- * ntfs_merge_runlists - merge two runlists into one
- * @drl:	original runlist to be worked on
- * @srl:	new runlist to be merged into @drl
- *
- * First we sanity check the two runlists @srl and @drl to make sure that they
- * are sensible and can be merged. The runlist @srl must be either after the
- * runlist @drl or completely within a hole (or unmapped region) in @drl.
- *
- * It is up to the caller to serialize access to the runlists @drl and @srl.
- *
- * Merging of runlists is necessary in two cases:
- *   1. When attribute lists are used and a further extent is being mapped.
- *   2. When new clusters are allocated to fill a hole or extend a file.
- *
- * There are four possible ways @srl can be merged. It can:
- *	- be inserted at the beginning of a hole,
- *	- split the hole in two and be inserted between the two fragments,
- *	- be appended at the end of a hole, or it can
- *	- replace the whole hole.
- * It can also be appended to the end of the runlist, which is just a variant
- * of the insert case.
- *
- * On success, return a pointer to the new, combined, runlist. Note, both
- * runlists @drl and @srl are deallocated before returning so you cannot use
- * the pointers for anything any more. (Strictly speaking the returned runlist
- * may be the same as @dst but this is irrelevant.)
- *
- * On error, return -errno. Both runlists are left unmodified. The following
- * error codes are defined:
- *	-ENOMEM	- Not enough memory to allocate runlist array.
- *	-EINVAL	- Invalid parameters were passed in.
- *	-ERANGE	- The runlists overlap and cannot be merged.
- */
-runlist_element *ntfs_merge_runlists(runlist_element *drl,
-		runlist_element *srl)
-{
-	int di, si;		/* Current index into @[ds]rl. */
-	int sstart;		/* First index with lcn > LCN_RL_NOT_MAPPED. */
-	int dins;		/* Index into @drl at which to insert @srl. */
-	int dend, send;		/* Last index into @[ds]rl. */
-	int dfinal, sfinal;	/* The last index into @[ds]rl with
-				   lcn >= LCN_HOLE. */
-	int marker = 0;
-	VCN marker_vcn = 0;
-
-#ifdef DEBUG
-	ntfs_debug("dst:");
-	ntfs_debug_dump_runlist(drl);
-	ntfs_debug("src:");
-	ntfs_debug_dump_runlist(srl);
-#endif
-
-	/* Check for silly calling... */
-	if (unlikely(!srl))
-		return drl;
-	if (IS_ERR(srl) || IS_ERR(drl))
-		return ERR_PTR(-EINVAL);
-
-	/* Check for the case where the first mapping is being done now. */
-	if (unlikely(!drl)) {
-		drl = srl;
-		/* Complete the source runlist if necessary. */
-		if (unlikely(drl[0].vcn)) {
-			/* Scan to the end of the source runlist. */
-			for (dend = 0; likely(drl[dend].length); dend++)
-				;
-			drl = ntfs_rl_realloc(drl, dend, dend + 1);
-			if (IS_ERR(drl))
-				return drl;
-			/* Insert start element at the front of the runlist. */
-			ntfs_rl_mm(drl, 1, 0, dend);
-			drl[0].vcn = 0;
-			drl[0].lcn = LCN_RL_NOT_MAPPED;
-			drl[0].length = drl[1].vcn;
-		}
-		goto finished;
-	}
-
-	si = di = 0;
-
-	/* Skip any unmapped start element(s) in the source runlist. */
-	while (srl[si].length && srl[si].lcn < (LCN)LCN_HOLE)
-		si++;
-
-	/* Can't have an entirely unmapped source runlist. */
-	BUG_ON(!srl[si].length);
-
-	/* Record the starting points. */
-	sstart = si;
-
-	/*
-	 * Skip forward in @drl until we reach the position where @srl needs to
-	 * be inserted. If we reach the end of @drl, @srl just needs to be
-	 * appended to @drl.
-	 */
-	for (; drl[di].length; di++) {
-		if (drl[di].vcn + drl[di].length > srl[sstart].vcn)
-			break;
-	}
-	dins = di;
-
-	/* Sanity check for illegal overlaps. */
-	if ((drl[di].vcn == srl[si].vcn) && (drl[di].lcn >= 0) &&
-			(srl[si].lcn >= 0)) {
-		ntfs_error(NULL, "Run lists overlap. Cannot merge!");
-		return ERR_PTR(-ERANGE);
-	}
-
-	/* Scan to the end of both runlists in order to know their sizes. */
-	for (send = si; srl[send].length; send++)
-		;
-	for (dend = di; drl[dend].length; dend++)
-		;
-
-	if (srl[send].lcn == (LCN)LCN_ENOENT)
-		marker_vcn = srl[marker = send].vcn;
-
-	/* Scan to the last element with lcn >= LCN_HOLE. */
-	for (sfinal = send; sfinal >= 0 && srl[sfinal].lcn < LCN_HOLE; sfinal--)
-		;
-	for (dfinal = dend; dfinal >= 0 && drl[dfinal].lcn < LCN_HOLE; dfinal--)
-		;
-
-	{
-	BOOL start;
-	BOOL finish;
-	int ds = dend + 1;		/* Number of elements in drl & srl */
-	int ss = sfinal - sstart + 1;
-
-	start  = ((drl[dins].lcn <  LCN_RL_NOT_MAPPED) ||    /* End of file   */
-		  (drl[dins].vcn == srl[sstart].vcn));	     /* Start of hole */
-	finish = ((drl[dins].lcn >= LCN_RL_NOT_MAPPED) &&    /* End of file   */
-		 ((drl[dins].vcn + drl[dins].length) <=      /* End of hole   */
-		  (srl[send - 1].vcn + srl[send - 1].length)));
-
-	/* Or we'll lose an end marker */
-	if (start && finish && (drl[dins].length == 0))
-		ss++;
-	if (marker && (drl[dins].vcn + drl[dins].length > srl[send - 1].vcn))
-		finish = FALSE;
-#if 0
-	ntfs_debug("dfinal = %i, dend = %i", dfinal, dend);
-	ntfs_debug("sstart = %i, sfinal = %i, send = %i", sstart, sfinal, send);
-	ntfs_debug("start = %i, finish = %i", start, finish);
-	ntfs_debug("ds = %i, ss = %i, dins = %i", ds, ss, dins);
-#endif
-	if (start) {
-		if (finish)
-			drl = ntfs_rl_replace(drl, ds, srl + sstart, ss, dins);
-		else
-			drl = ntfs_rl_insert(drl, ds, srl + sstart, ss, dins);
-	} else {
-		if (finish)
-			drl = ntfs_rl_append(drl, ds, srl + sstart, ss, dins);
-		else
-			drl = ntfs_rl_split(drl, ds, srl + sstart, ss, dins);
-	}
-	if (IS_ERR(drl)) {
-		ntfs_error(NULL, "Merge failed.");
-		return drl;
-	}
-	ntfs_free(srl);
-	if (marker) {
-		ntfs_debug("Triggering marker code.");
-		for (ds = dend; drl[ds].length; ds++)
-			;
-		/* We only need to care if @srl ended after @drl. */
-		if (drl[ds].vcn <= marker_vcn) {
-			int slots = 0;
-
-			if (drl[ds].vcn == marker_vcn) {
-				ntfs_debug("Old marker = 0x%llx, replacing "
-						"with LCN_ENOENT.",
-						(unsigned long long)
-						drl[ds].lcn);
-				drl[ds].lcn = (LCN)LCN_ENOENT;
-				goto finished;
-			}
-			/*
-			 * We need to create an unmapped runlist element in
-			 * @drl or extend an existing one before adding the
-			 * ENOENT terminator.
-			 */
-			if (drl[ds].lcn == (LCN)LCN_ENOENT) {
-				ds--;
-				slots = 1;
-			}
-			if (drl[ds].lcn != (LCN)LCN_RL_NOT_MAPPED) {
-				/* Add an unmapped runlist element. */
-				if (!slots) {
-					/* FIXME/TODO: We need to have the
-					 * extra memory already! (AIA) */
-					drl = ntfs_rl_realloc(drl, ds, ds + 2);
-					if (!drl)
-						goto critical_error;
-					slots = 2;
-				}
-				ds++;
-				/* Need to set vcn if it isn't set already. */
-				if (slots != 1)
-					drl[ds].vcn = drl[ds - 1].vcn +
-							drl[ds - 1].length;
-				drl[ds].lcn = (LCN)LCN_RL_NOT_MAPPED;
-				/* We now used up a slot. */
-				slots--;
-			}
-			drl[ds].length = marker_vcn - drl[ds].vcn;
-			/* Finally add the ENOENT terminator. */
-			ds++;
-			if (!slots) {
-				/* FIXME/TODO: We need to have the extra
-				 * memory already! (AIA) */
-				drl = ntfs_rl_realloc(drl, ds, ds + 1);
-				if (!drl)
-					goto critical_error;
-			}
-			drl[ds].vcn = marker_vcn;
-			drl[ds].lcn = (LCN)LCN_ENOENT;
-			drl[ds].length = (s64)0;
-		}
-	}
-	}
-
-finished:
-	/* The merge was completed successfully. */
-	ntfs_debug("Merged runlist:");
-	ntfs_debug_dump_runlist(drl);
-	return drl;
-
-critical_error:
-	/* Critical error! We cannot afford to fail here. */
-	ntfs_error(NULL, "Critical error! Not enough memory.");
-	panic("NTFS: Cannot continue.");
-}
-
-/**
- * decompress_mapping_pairs - convert mapping pairs array to runlist
- * @vol:	ntfs volume on which the attribute resides
- * @attr:	attribute record whose mapping pairs array to decompress
- * @old_rl:	optional runlist in which to insert @attr's runlist
- *
- * It is up to the caller to serialize access to the runlist @old_rl.
- *
- * Decompress the attribute @attr's mapping pairs array into a runlist. On
- * success, return the decompressed runlist.
- *
- * If @old_rl is not NULL, decompressed runlist is inserted into the
- * appropriate place in @old_rl and the resultant, combined runlist is
- * returned. The original @old_rl is deallocated.
- *
- * On error, return -errno. @old_rl is left unmodified in that case.
- *
- * The following error codes are defined:
- *	-ENOMEM	- Not enough memory to allocate runlist array.
- *	-EIO	- Corrupt runlist.
- *	-EINVAL	- Invalid parameters were passed in.
- *	-ERANGE	- The two runlists overlap.
- *
- * FIXME: For now we take the conceptionally simplest approach of creating the
- * new runlist disregarding the already existing one and then splicing the
- * two into one, if that is possible (we check for overlap and discard the new
- * runlist if overlap present before returning ERR_PTR(-ERANGE)).
- */
-runlist_element *decompress_mapping_pairs(const ntfs_volume *vol,
-		const ATTR_RECORD *attr, runlist_element *old_rl)
-{
-	VCN vcn;		/* Current vcn. */
-	LCN lcn;		/* Current lcn. */
-	s64 deltaxcn;		/* Change in [vl]cn. */
-	runlist_element *rl;	/* The output runlist. */
-	u8 *buf;		/* Current position in mapping pairs array. */
-	u8 *attr_end;		/* End of attribute. */
-	int rlsize;		/* Size of runlist buffer. */
-	u16 rlpos;		/* Current runlist position in units of
-				   runlist_elements. */
-	u8 b;			/* Current byte offset in buf. */
-
-#ifdef DEBUG
-	/* Make sure attr exists and is non-resident. */
-	if (!attr || !attr->non_resident || sle64_to_cpu(
-			attr->data.non_resident.lowest_vcn) < (VCN)0) {
-		ntfs_error(vol->sb, "Invalid arguments.");
-		return ERR_PTR(-EINVAL);
-	}
-#endif
-	/* Start at vcn = lowest_vcn and lcn 0. */
-	vcn = sle64_to_cpu(attr->data.non_resident.lowest_vcn);
-	lcn = 0;
-	/* Get start of the mapping pairs array. */
-	buf = (u8*)attr + le16_to_cpu(
-			attr->data.non_resident.mapping_pairs_offset);
-	attr_end = (u8*)attr + le32_to_cpu(attr->length);
-	if (unlikely(buf < (u8*)attr || buf > attr_end)) {
-		ntfs_error(vol->sb, "Corrupt attribute.");
-		return ERR_PTR(-EIO);
-	}
-	/* Current position in runlist array. */
-	rlpos = 0;
-	/* Allocate first page and set current runlist size to one page. */
-	rl = ntfs_malloc_nofs(rlsize = PAGE_SIZE);
-	if (unlikely(!rl))
-		return ERR_PTR(-ENOMEM);
-	/* Insert unmapped starting element if necessary. */
-	if (vcn) {
-		rl->vcn = (VCN)0;
-		rl->lcn = (LCN)LCN_RL_NOT_MAPPED;
-		rl->length = vcn;
-		rlpos++;
-	}
-	while (buf < attr_end && *buf) {
-		/*
-		 * Allocate more memory if needed, including space for the
-		 * not-mapped and terminator elements. ntfs_malloc_nofs()
-		 * operates on whole pages only.
-		 */
-		if (((rlpos + 3) * sizeof(*old_rl)) > rlsize) {
-			runlist_element *rl2;
-
-			rl2 = ntfs_malloc_nofs(rlsize + (int)PAGE_SIZE);
-			if (unlikely(!rl2)) {
-				ntfs_free(rl);
-				return ERR_PTR(-ENOMEM);
-			}
-			memcpy(rl2, rl, rlsize);
-			ntfs_free(rl);
-			rl = rl2;
-			rlsize += PAGE_SIZE;
-		}
-		/* Enter the current vcn into the current runlist element. */
-		rl[rlpos].vcn = vcn;
-		/*
-		 * Get the change in vcn, i.e. the run length in clusters.
-		 * Doing it this way ensures that we signextend negative values.
-		 * A negative run length doesn't make any sense, but hey, I
-		 * didn't make up the NTFS specs and Windows NT4 treats the run
-		 * length as a signed value so that's how it is...
-		 */
-		b = *buf & 0xf;
-		if (b) {
-			if (unlikely(buf + b > attr_end))
-				goto io_error;
-			for (deltaxcn = (s8)buf[b--]; b; b--)
-				deltaxcn = (deltaxcn << 8) + buf[b];
-		} else { /* The length entry is compulsory. */
-			ntfs_error(vol->sb, "Missing length entry in mapping "
-					"pairs array.");
-			deltaxcn = (s64)-1;
-		}
-		/*
-		 * Assume a negative length to indicate data corruption and
-		 * hence clean-up and return NULL.
-		 */
-		if (unlikely(deltaxcn < 0)) {
-			ntfs_error(vol->sb, "Invalid length in mapping pairs "
-					"array.");
-			goto err_out;
-		}
-		/*
-		 * Enter the current run length into the current runlist
-		 * element.
-		 */
-		rl[rlpos].length = deltaxcn;
-		/* Increment the current vcn by the current run length. */
-		vcn += deltaxcn;
-		/*
-		 * There might be no lcn change at all, as is the case for
-		 * sparse clusters on NTFS 3.0+, in which case we set the lcn
-		 * to LCN_HOLE.
-		 */
-		if (!(*buf & 0xf0))
-			rl[rlpos].lcn = (LCN)LCN_HOLE;
-		else {
-			/* Get the lcn change which really can be negative. */
-			u8 b2 = *buf & 0xf;
-			b = b2 + ((*buf >> 4) & 0xf);
-			if (buf + b > attr_end)
-				goto io_error;
-			for (deltaxcn = (s8)buf[b--]; b > b2; b--)
-				deltaxcn = (deltaxcn << 8) + buf[b];
-			/* Change the current lcn to its new value. */
-			lcn += deltaxcn;
-#ifdef DEBUG
-			/*
-			 * On NTFS 1.2-, apparently can have lcn == -1 to
-			 * indicate a hole. But we haven't verified ourselves
-			 * whether it is really the lcn or the deltaxcn that is
-			 * -1. So if either is found give us a message so we
-			 * can investigate it further!
-			 */
-			if (vol->major_ver < 3) {
-				if (unlikely(deltaxcn == (LCN)-1))
-					ntfs_error(vol->sb, "lcn delta == -1");
-				if (unlikely(lcn == (LCN)-1))
-					ntfs_error(vol->sb, "lcn == -1");
-			}
-#endif
-			/* Check lcn is not below -1. */
-			if (unlikely(lcn < (LCN)-1)) {
-				ntfs_error(vol->sb, "Invalid LCN < -1 in "
-						"mapping pairs array.");
-				goto err_out;
-			}
-			/* Enter the current lcn into the runlist element. */
-			rl[rlpos].lcn = lcn;
-		}
-		/* Get to the next runlist element. */
-		rlpos++;
-		/* Increment the buffer position to the next mapping pair. */
-		buf += (*buf & 0xf) + ((*buf >> 4) & 0xf) + 1;
-	}
-	if (unlikely(buf >= attr_end))
-		goto io_error;
-	/*
-	 * If there is a highest_vcn specified, it must be equal to the final
-	 * vcn in the runlist - 1, or something has gone badly wrong.
-	 */
-	deltaxcn = sle64_to_cpu(attr->data.non_resident.highest_vcn);
-	if (unlikely(deltaxcn && vcn - 1 != deltaxcn)) {
-mpa_err:
-		ntfs_error(vol->sb, "Corrupt mapping pairs array in "
-				"non-resident attribute.");
-		goto err_out;
-	}
-	/* Setup not mapped runlist element if this is the base extent. */
-	if (!attr->data.non_resident.lowest_vcn) {
-		VCN max_cluster;
-
-		max_cluster = (sle64_to_cpu(
-				attr->data.non_resident.allocated_size) +
-				vol->cluster_size - 1) >>
-				vol->cluster_size_bits;
-		/*
-		 * If there is a difference between the highest_vcn and the
-		 * highest cluster, the runlist is either corrupt or, more
-		 * likely, there are more extents following this one.
-		 */
-		if (deltaxcn < --max_cluster) {
-			ntfs_debug("More extents to follow; deltaxcn = 0x%llx, "
-					"max_cluster = 0x%llx",
-					(unsigned long long)deltaxcn,
-					(unsigned long long)max_cluster);
-			rl[rlpos].vcn = vcn;
-			vcn += rl[rlpos].length = max_cluster - deltaxcn;
-			rl[rlpos].lcn = (LCN)LCN_RL_NOT_MAPPED;
-			rlpos++;
-		} else if (unlikely(deltaxcn > max_cluster)) {
-			ntfs_error(vol->sb, "Corrupt attribute. deltaxcn = "
-					"0x%llx, max_cluster = 0x%llx",
-					(unsigned long long)deltaxcn,
-					(unsigned long long)max_cluster);
-			goto mpa_err;
-		}
-		rl[rlpos].lcn = (LCN)LCN_ENOENT;
-	} else /* Not the base extent. There may be more extents to follow. */
-		rl[rlpos].lcn = (LCN)LCN_RL_NOT_MAPPED;
-
-	/* Setup terminating runlist element. */
-	rl[rlpos].vcn = vcn;
-	rl[rlpos].length = (s64)0;
-	/* If no existing runlist was specified, we are done. */
-	if (!old_rl) {
-		ntfs_debug("Mapping pairs array successfully decompressed:");
-		ntfs_debug_dump_runlist(rl);
-		return rl;
-	}
-	/* Now combine the new and old runlists checking for overlaps. */
-	old_rl = ntfs_merge_runlists(old_rl, rl);
-	if (likely(!IS_ERR(old_rl)))
-		return old_rl;
-	ntfs_free(rl);
-	ntfs_error(vol->sb, "Failed to merge runlists.");
-	return old_rl;
-io_error:
-	ntfs_error(vol->sb, "Corrupt attribute.");
-err_out:
-	ntfs_free(rl);
-	return ERR_PTR(-EIO);
-}
+//#include "dir.h"
 
 /**
  * ntfs_map_runlist - map (a part of) a runlist of an ntfs inode
@@ -993,63 +86,6 @@
 }
 
 /**
- * ntfs_vcn_to_lcn - convert a vcn into a lcn given a runlist
- * @rl:		runlist to use for conversion
- * @vcn:	vcn to convert
- *
- * Convert the virtual cluster number @vcn of an attribute into a logical
- * cluster number (lcn) of a device using the runlist @rl to map vcns to their
- * corresponding lcns.
- *
- * It is up to the caller to serialize access to the runlist @rl.
- *
- * Since lcns must be >= 0, we use negative return values with special meaning:
- *
- * Return value			Meaning / Description
- * ==================================================
- *  -1 = LCN_HOLE		Hole / not allocated on disk.
- *  -2 = LCN_RL_NOT_MAPPED	This is part of the runlist which has not been
- *				inserted into the runlist yet.
- *  -3 = LCN_ENOENT		There is no such vcn in the attribute.
- *
- * Locking: - The caller must have locked the runlist (for reading or writing).
- *	    - This function does not touch the lock.
- */
-LCN ntfs_vcn_to_lcn(const runlist_element *rl, const VCN vcn)
-{
-	int i;
-
-	BUG_ON(vcn < 0);
-	/*
-	 * If rl is NULL, assume that we have found an unmapped runlist. The
-	 * caller can then attempt to map it and fail appropriately if
-	 * necessary.
-	 */
-	if (unlikely(!rl))
-		return (LCN)LCN_RL_NOT_MAPPED;
-
-	/* Catch out of lower bounds vcn. */
-	if (unlikely(vcn < rl[0].vcn))
-		return (LCN)LCN_ENOENT;
-
-	for (i = 0; likely(rl[i].length); i++) {
-		if (unlikely(vcn < rl[i+1].vcn)) {
-			if (likely(rl[i].lcn >= (LCN)0))
-				return rl[i].lcn + (vcn - rl[i].vcn);
-			return rl[i].lcn;
-		}
-	}
-	/*
-	 * The terminator element is setup to the correct value, i.e. one of
-	 * LCN_HOLE, LCN_RL_NOT_MAPPED, or LCN_ENOENT.
-	 */
-	if (likely(rl[i].lcn < (LCN)0))
-		return rl[i].lcn;
-	/* Just in case... We could replace this with BUG() some day. */
-	return (LCN)LCN_ENOENT;
-}
-
-/**
  * ntfs_find_vcn - find a vcn in the runlist described by an ntfs inode
  * @ni:		ntfs inode describing the runlist to search
  * @vcn:	vcn to find
@@ -1861,6 +897,11 @@
 		/* Sanity checks are performed elsewhere. */
 		ctx->attr = (ATTR_RECORD*)((u8*)ctx->mrec +
 				le16_to_cpu(ctx->mrec->attrs_offset));
+		/*
+		 * This needs resetting due to ntfs_external_attr_find() which
+		 * can leave it set despite having zeroed ctx->base_ntfs_ino.
+		 */
+		ctx->al_entry = NULL;
 		return;
 	} /* Attribute list. */
 	if (ctx->ntfs_ino != ctx->base_ntfs_ino)
diff -Nru a/fs/ntfs/attrib.h b/fs/ntfs/attrib.h
--- a/fs/ntfs/attrib.h	2004-10-19 10:13:11 +01:00
+++ b/fs/ntfs/attrib.h	2004-10-19 10:13:11 +01:00
@@ -24,23 +24,11 @@
 #ifndef _LINUX_NTFS_ATTRIB_H
 #define _LINUX_NTFS_ATTRIB_H
 
-#include <linux/fs.h>
-
 #include "endian.h"
 #include "types.h"
 #include "layout.h"
-
-static inline void init_runlist(runlist *rl)
-{
-	rl->rl = NULL;
-	init_rwsem(&rl->lock);
-}
-
-typedef enum {
-	LCN_HOLE		= -1,	/* Keep this as highest value or die! */
-	LCN_RL_NOT_MAPPED	= -2,
-	LCN_ENOENT		= -3,
-} LCN_SPECIAL_VALUES;
+#include "inode.h"
+#include "runlist.h"
 
 /**
  * ntfs_attr_search_ctx - used in attribute search functions
@@ -71,12 +59,7 @@
 	ATTR_RECORD *base_attr;
 } ntfs_attr_search_ctx;
 
-extern runlist_element *decompress_mapping_pairs(const ntfs_volume *vol,
-		const ATTR_RECORD *attr, runlist_element *old_rl);
-
 extern int ntfs_map_runlist(ntfs_inode *ni, VCN vcn);
-
-extern LCN ntfs_vcn_to_lcn(const runlist_element *rl, const VCN vcn);
 
 extern runlist_element *ntfs_find_vcn(ntfs_inode *ni, const VCN vcn,
 		const BOOL need_write);
diff -Nru a/fs/ntfs/runlist.c b/fs/ntfs/runlist.c
--- /dev/null	Wed Dec 31 16:00:00 196900
+++ b/fs/ntfs/runlist.c	2004-10-19 10:13:11 +01:00
@@ -0,0 +1,986 @@
+/**
+ * runlist.c - NTFS runlist handling code.  Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2001-2004 Anton Altaparmakov
+ * Copyright (c) 2002 Richard Russon
+ *
+ * This program/include file is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as published
+ * by the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program/include file is distributed in the hope that it will be
+ * useful, but WITHOUT ANY WARRANTY; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program (in the main directory of the Linux-NTFS
+ * distribution in the file COPYING); if not, write to the Free Software
+ * Foundation,Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#include "ntfs.h"
+#include "dir.h"
+
+/**
+ * ntfs_rl_mm - runlist memmove
+ *
+ * It is up to the caller to serialize access to the runlist @base.
+ */
+static inline void ntfs_rl_mm(runlist_element *base, int dst, int src,
+		int size)
+{
+	if (likely((dst != src) && (size > 0)))
+		memmove(base + dst, base + src, size * sizeof (*base));
+}
+
+/**
+ * ntfs_rl_mc - runlist memory copy
+ *
+ * It is up to the caller to serialize access to the runlists @dstbase and
+ * @srcbase.
+ */
+static inline void ntfs_rl_mc(runlist_element *dstbase, int dst,
+		runlist_element *srcbase, int src, int size)
+{
+	if (likely(size > 0))
+		memcpy(dstbase + dst, srcbase + src, size * sizeof(*dstbase));
+}
+
+/**
+ * ntfs_rl_realloc - Reallocate memory for runlists
+ * @rl:		original runlist
+ * @old_size:	number of runlist elements in the original runlist @rl
+ * @new_size:	number of runlist elements we need space for
+ *
+ * As the runlists grow, more memory will be required.  To prevent the
+ * kernel having to allocate and reallocate large numbers of small bits of
+ * memory, this function returns and entire page of memory.
+ *
+ * It is up to the caller to serialize access to the runlist @rl.
+ *
+ * N.B.  If the new allocation doesn't require a different number of pages in
+ *       memory, the function will return the original pointer.
+ *
+ * On success, return a pointer to the newly allocated, or recycled, memory.
+ * On error, return -errno. The following error codes are defined:
+ *	-ENOMEM	- Not enough memory to allocate runlist array.
+ *	-EINVAL	- Invalid parameters were passed in.
+ */
+static inline runlist_element *ntfs_rl_realloc(runlist_element *rl,
+		int old_size, int new_size)
+{
+	runlist_element *new_rl;
+
+	old_size = PAGE_ALIGN(old_size * sizeof(*rl));
+	new_size = PAGE_ALIGN(new_size * sizeof(*rl));
+	if (old_size == new_size)
+		return rl;
+
+	new_rl = ntfs_malloc_nofs(new_size);
+	if (unlikely(!new_rl))
+		return ERR_PTR(-ENOMEM);
+
+	if (likely(rl != NULL)) {
+		if (unlikely(old_size > new_size))
+			old_size = new_size;
+		memcpy(new_rl, rl, old_size);
+		ntfs_free(rl);
+	}
+	return new_rl;
+}
+
+/**
+ * ntfs_are_rl_mergeable - test if two runlists can be joined together
+ * @dst:	original runlist
+ * @src:	new runlist to test for mergeability with @dst
+ *
+ * Test if two runlists can be joined together. For this, their VCNs and LCNs
+ * must be adjacent.
+ *
+ * It is up to the caller to serialize access to the runlists @dst and @src.
+ *
+ * Return: TRUE   Success, the runlists can be merged.
+ *	   FALSE  Failure, the runlists cannot be merged.
+ */
+static inline BOOL ntfs_are_rl_mergeable(runlist_element *dst,
+		runlist_element *src)
+{
+	BUG_ON(!dst);
+	BUG_ON(!src);
+
+	if ((dst->lcn < 0) || (src->lcn < 0))     /* Are we merging holes? */
+		return FALSE;
+	if ((dst->lcn + dst->length) != src->lcn) /* Are the runs contiguous? */
+		return FALSE;
+	if ((dst->vcn + dst->length) != src->vcn) /* Are the runs misaligned? */
+		return FALSE;
+
+	return TRUE;
+}
+
+/**
+ * __ntfs_rl_merge - merge two runlists without testing if they can be merged
+ * @dst:	original, destination runlist
+ * @src:	new runlist to merge with @dst
+ *
+ * Merge the two runlists, writing into the destination runlist @dst. The
+ * caller must make sure the runlists can be merged or this will corrupt the
+ * destination runlist.
+ *
+ * It is up to the caller to serialize access to the runlists @dst and @src.
+ */
+static inline void __ntfs_rl_merge(runlist_element *dst, runlist_element *src)
+{
+	dst->length += src->length;
+}
+
+/**
+ * ntfs_rl_merge - test if two runlists can be joined together and merge them
+ * @dst:	original, destination runlist
+ * @src:	new runlist to merge with @dst
+ *
+ * Test if two runlists can be joined together. For this, their VCNs and LCNs
+ * must be adjacent. If they can be merged, perform the merge, writing into
+ * the destination runlist @dst.
+ *
+ * It is up to the caller to serialize access to the runlists @dst and @src.
+ *
+ * Return: TRUE   Success, the runlists have been merged.
+ *	   FALSE  Failure, the runlists cannot be merged and have not been
+ *		  modified.
+ */
+static inline BOOL ntfs_rl_merge(runlist_element *dst, runlist_element *src)
+{
+	BOOL merge = ntfs_are_rl_mergeable(dst, src);
+
+	if (merge)
+		__ntfs_rl_merge(dst, src);
+	return merge;
+}
+
+/**
+ * ntfs_rl_append - append a runlist after a given element
+ * @dst:	original runlist to be worked on
+ * @dsize:	number of elements in @dst (including end marker)
+ * @src:	runlist to be inserted into @dst
+ * @ssize:	number of elements in @src (excluding end marker)
+ * @loc:	append the new runlist @src after this element in @dst
+ *
+ * Append the runlist @src after element @loc in @dst.  Merge the right end of
+ * the new runlist, if necessary. Adjust the size of the hole before the
+ * appended runlist.
+ *
+ * It is up to the caller to serialize access to the runlists @dst and @src.
+ *
+ * On success, return a pointer to the new, combined, runlist. Note, both
+ * runlists @dst and @src are deallocated before returning so you cannot use
+ * the pointers for anything any more. (Strictly speaking the returned runlist
+ * may be the same as @dst but this is irrelevant.)
+ *
+ * On error, return -errno. Both runlists are left unmodified. The following
+ * error codes are defined:
+ *	-ENOMEM	- Not enough memory to allocate runlist array.
+ *	-EINVAL	- Invalid parameters were passed in.
+ */
+static inline runlist_element *ntfs_rl_append(runlist_element *dst,
+		int dsize, runlist_element *src, int ssize, int loc)
+{
+	BOOL right;
+	int magic;
+
+	BUG_ON(!dst);
+	BUG_ON(!src);
+
+	/* First, check if the right hand end needs merging. */
+	right = ntfs_are_rl_mergeable(src + ssize - 1, dst + loc + 1);
+
+	/* Space required: @dst size + @src size, less one if we merged. */
+	dst = ntfs_rl_realloc(dst, dsize, dsize + ssize - right);
+	if (IS_ERR(dst))
+		return dst;
+	/*
+	 * We are guaranteed to succeed from here so can start modifying the
+	 * original runlists.
+	 */
+
+	/* First, merge the right hand end, if necessary. */
+	if (right)
+		__ntfs_rl_merge(src + ssize - 1, dst + loc + 1);
+
+	magic = loc + ssize;
+
+	/* Move the tail of @dst out of the way, then copy in @src. */
+	ntfs_rl_mm(dst, magic + 1, loc + 1 + right, dsize - loc - 1 - right);
+	ntfs_rl_mc(dst, loc + 1, src, 0, ssize);
+
+	/* Adjust the size of the preceding hole. */
+	dst[loc].length = dst[loc + 1].vcn - dst[loc].vcn;
+
+	/* We may have changed the length of the file, so fix the end marker */
+	if (dst[magic + 1].lcn == LCN_ENOENT)
+		dst[magic + 1].vcn = dst[magic].vcn + dst[magic].length;
+
+	return dst;
+}
+
+/**
+ * ntfs_rl_insert - insert a runlist into another
+ * @dst:	original runlist to be worked on
+ * @dsize:	number of elements in @dst (including end marker)
+ * @src:	new runlist to be inserted
+ * @ssize:	number of elements in @src (excluding end marker)
+ * @loc:	insert the new runlist @src before this element in @dst
+ *
+ * Insert the runlist @src before element @loc in the runlist @dst. Merge the
+ * left end of the new runlist, if necessary. Adjust the size of the hole
+ * after the inserted runlist.
+ *
+ * It is up to the caller to serialize access to the runlists @dst and @src.
+ *
+ * On success, return a pointer to the new, combined, runlist. Note, both
+ * runlists @dst and @src are deallocated before returning so you cannot use
+ * the pointers for anything any more. (Strictly speaking the returned runlist
+ * may be the same as @dst but this is irrelevant.)
+ *
+ * On error, return -errno. Both runlists are left unmodified. The following
+ * error codes are defined:
+ *	-ENOMEM	- Not enough memory to allocate runlist array.
+ *	-EINVAL	- Invalid parameters were passed in.
+ */
+static inline runlist_element *ntfs_rl_insert(runlist_element *dst,
+		int dsize, runlist_element *src, int ssize, int loc)
+{
+	BOOL left = FALSE;
+	BOOL disc = FALSE;	/* Discontinuity */
+	BOOL hole = FALSE;	/* Following a hole */
+	int magic;
+
+	BUG_ON(!dst);
+	BUG_ON(!src);
+
+	/* disc => Discontinuity between the end of @dst and the start of @src.
+	 *	   This means we might need to insert a hole.
+	 * hole => @dst ends with a hole or an unmapped region which we can
+	 *	   extend to match the discontinuity. */
+	if (loc == 0)
+		disc = (src[0].vcn > 0);
+	else {
+		s64 merged_length;
+
+		left = ntfs_are_rl_mergeable(dst + loc - 1, src);
+
+		merged_length = dst[loc - 1].length;
+		if (left)
+			merged_length += src->length;
+
+		disc = (src[0].vcn > dst[loc - 1].vcn + merged_length);
+		if (disc)
+			hole = (dst[loc - 1].lcn == LCN_HOLE);
+	}
+
+	/* Space required: @dst size + @src size, less one if we merged, plus
+	 * one if there was a discontinuity, less one for a trailing hole. */
+	dst = ntfs_rl_realloc(dst, dsize, dsize + ssize - left + disc - hole);
+	if (IS_ERR(dst))
+		return dst;
+	/*
+	 * We are guaranteed to succeed from here so can start modifying the
+	 * original runlist.
+	 */
+
+	if (left)
+		__ntfs_rl_merge(dst + loc - 1, src);
+
+	magic = loc + ssize - left + disc - hole;
+
+	/* Move the tail of @dst out of the way, then copy in @src. */
+	ntfs_rl_mm(dst, magic, loc, dsize - loc);
+	ntfs_rl_mc(dst, loc + disc - hole, src, left, ssize - left);
+
+	/* Adjust the VCN of the last run ... */
+	if (dst[magic].lcn <= LCN_HOLE)
+		dst[magic].vcn = dst[magic - 1].vcn + dst[magic - 1].length;
+	/* ... and the length. */
+	if (dst[magic].lcn == LCN_HOLE || dst[magic].lcn == LCN_RL_NOT_MAPPED)
+		dst[magic].length = dst[magic + 1].vcn - dst[magic].vcn;
+
+	/* Writing beyond the end of the file and there's a discontinuity. */
+	if (disc) {
+		if (hole)
+			dst[loc - 1].length = dst[loc].vcn - dst[loc - 1].vcn;
+		else {
+			if (loc > 0) {
+				dst[loc].vcn = dst[loc - 1].vcn +
+						dst[loc - 1].length;
+				dst[loc].length = dst[loc + 1].vcn -
+						dst[loc].vcn;
+			} else {
+				dst[loc].vcn = 0;
+				dst[loc].length = dst[loc + 1].vcn;
+			}
+			dst[loc].lcn = LCN_RL_NOT_MAPPED;
+		}
+
+		magic += hole;
+
+		if (dst[magic].lcn == LCN_ENOENT)
+			dst[magic].vcn = dst[magic - 1].vcn +
+					dst[magic - 1].length;
+	}
+	return dst;
+}
+
+/**
+ * ntfs_rl_replace - overwrite a runlist element with another runlist
+ * @dst:	original runlist to be worked on
+ * @dsize:	number of elements in @dst (including end marker)
+ * @src:	new runlist to be inserted
+ * @ssize:	number of elements in @src (excluding end marker)
+ * @loc:	index in runlist @dst to overwrite with @src
+ *
+ * Replace the runlist element @dst at @loc with @src. Merge the left and
+ * right ends of the inserted runlist, if necessary.
+ *
+ * It is up to the caller to serialize access to the runlists @dst and @src.
+ *
+ * On success, return a pointer to the new, combined, runlist. Note, both
+ * runlists @dst and @src are deallocated before returning so you cannot use
+ * the pointers for anything any more. (Strictly speaking the returned runlist
+ * may be the same as @dst but this is irrelevant.)
+ *
+ * On error, return -errno. Both runlists are left unmodified. The following
+ * error codes are defined:
+ *	-ENOMEM	- Not enough memory to allocate runlist array.
+ *	-EINVAL	- Invalid parameters were passed in.
+ */
+static inline runlist_element *ntfs_rl_replace(runlist_element *dst,
+		int dsize, runlist_element *src, int ssize, int loc)
+{
+	BOOL left = FALSE;
+	BOOL right;
+	int magic;
+
+	BUG_ON(!dst);
+	BUG_ON(!src);
+
+	/* First, merge the left and right ends, if necessary. */
+	right = ntfs_are_rl_mergeable(src + ssize - 1, dst + loc + 1);
+	if (loc > 0)
+		left = ntfs_are_rl_mergeable(dst + loc - 1, src);
+
+	/* Allocate some space. We'll need less if the left, right, or both
+	 * ends were merged. */
+	dst = ntfs_rl_realloc(dst, dsize, dsize + ssize - left - right);
+	if (IS_ERR(dst))
+		return dst;
+	/*
+	 * We are guaranteed to succeed from here so can start modifying the
+	 * original runlists.
+	 */
+	if (right)
+		__ntfs_rl_merge(src + ssize - 1, dst + loc + 1);
+	if (left)
+		__ntfs_rl_merge(dst + loc - 1, src);
+
+	/* FIXME: What does this mean? (AIA) */
+	magic = loc + ssize - left;
+
+	/* Move the tail of @dst out of the way, then copy in @src. */
+	ntfs_rl_mm(dst, magic, loc + right + 1, dsize - loc - right - 1);
+	ntfs_rl_mc(dst, loc, src, left, ssize - left);
+
+	/* We may have changed the length of the file, so fix the end marker */
+	if (dst[magic].lcn == LCN_ENOENT)
+		dst[magic].vcn = dst[magic - 1].vcn + dst[magic - 1].length;
+	return dst;
+}
+
+/**
+ * ntfs_rl_split - insert a runlist into the centre of a hole
+ * @dst:	original runlist to be worked on
+ * @dsize:	number of elements in @dst (including end marker)
+ * @src:	new runlist to be inserted
+ * @ssize:	number of elements in @src (excluding end marker)
+ * @loc:	index in runlist @dst at which to split and insert @src
+ *
+ * Split the runlist @dst at @loc into two and insert @new in between the two
+ * fragments. No merging of runlists is necessary. Adjust the size of the
+ * holes either side.
+ *
+ * It is up to the caller to serialize access to the runlists @dst and @src.
+ *
+ * On success, return a pointer to the new, combined, runlist. Note, both
+ * runlists @dst and @src are deallocated before returning so you cannot use
+ * the pointers for anything any more. (Strictly speaking the returned runlist
+ * may be the same as @dst but this is irrelevant.)
+ *
+ * On error, return -errno. Both runlists are left unmodified. The following
+ * error codes are defined:
+ *	-ENOMEM	- Not enough memory to allocate runlist array.
+ *	-EINVAL	- Invalid parameters were passed in.
+ */
+static inline runlist_element *ntfs_rl_split(runlist_element *dst, int dsize,
+		runlist_element *src, int ssize, int loc)
+{
+	BUG_ON(!dst);
+	BUG_ON(!src);
+
+	/* Space required: @dst size + @src size + one new hole. */
+	dst = ntfs_rl_realloc(dst, dsize, dsize + ssize + 1);
+	if (IS_ERR(dst))
+		return dst;
+	/*
+	 * We are guaranteed to succeed from here so can start modifying the
+	 * original runlists.
+	 */
+
+	/* Move the tail of @dst out of the way, then copy in @src. */
+	ntfs_rl_mm(dst, loc + 1 + ssize, loc, dsize - loc);
+	ntfs_rl_mc(dst, loc + 1, src, 0, ssize);
+
+	/* Adjust the size of the holes either size of @src. */
+	dst[loc].length		= dst[loc+1].vcn       - dst[loc].vcn;
+	dst[loc+ssize+1].vcn    = dst[loc+ssize].vcn   + dst[loc+ssize].length;
+	dst[loc+ssize+1].length = dst[loc+ssize+2].vcn - dst[loc+ssize+1].vcn;
+
+	return dst;
+}
+
+/**
+ * ntfs_merge_runlists - merge two runlists into one
+ * @drl:	original runlist to be worked on
+ * @srl:	new runlist to be merged into @drl
+ *
+ * First we sanity check the two runlists @srl and @drl to make sure that they
+ * are sensible and can be merged. The runlist @srl must be either after the
+ * runlist @drl or completely within a hole (or unmapped region) in @drl.
+ *
+ * It is up to the caller to serialize access to the runlists @drl and @srl.
+ *
+ * Merging of runlists is necessary in two cases:
+ *   1. When attribute lists are used and a further extent is being mapped.
+ *   2. When new clusters are allocated to fill a hole or extend a file.
+ *
+ * There are four possible ways @srl can be merged. It can:
+ *	- be inserted at the beginning of a hole,
+ *	- split the hole in two and be inserted between the two fragments,
+ *	- be appended at the end of a hole, or it can
+ *	- replace the whole hole.
+ * It can also be appended to the end of the runlist, which is just a variant
+ * of the insert case.
+ *
+ * On success, return a pointer to the new, combined, runlist. Note, both
+ * runlists @drl and @srl are deallocated before returning so you cannot use
+ * the pointers for anything any more. (Strictly speaking the returned runlist
+ * may be the same as @dst but this is irrelevant.)
+ *
+ * On error, return -errno. Both runlists are left unmodified. The following
+ * error codes are defined:
+ *	-ENOMEM	- Not enough memory to allocate runlist array.
+ *	-EINVAL	- Invalid parameters were passed in.
+ *	-ERANGE	- The runlists overlap and cannot be merged.
+ */
+runlist_element *ntfs_merge_runlists(runlist_element *drl,
+		runlist_element *srl)
+{
+	int di, si;		/* Current index into @[ds]rl. */
+	int sstart;		/* First index with lcn > LCN_RL_NOT_MAPPED. */
+	int dins;		/* Index into @drl at which to insert @srl. */
+	int dend, send;		/* Last index into @[ds]rl. */
+	int dfinal, sfinal;	/* The last index into @[ds]rl with
+				   lcn >= LCN_HOLE. */
+	int marker = 0;
+	VCN marker_vcn = 0;
+
+#ifdef DEBUG
+	ntfs_debug("dst:");
+	ntfs_debug_dump_runlist(drl);
+	ntfs_debug("src:");
+	ntfs_debug_dump_runlist(srl);
+#endif
+
+	/* Check for silly calling... */
+	if (unlikely(!srl))
+		return drl;
+	if (IS_ERR(srl) || IS_ERR(drl))
+		return ERR_PTR(-EINVAL);
+
+	/* Check for the case where the first mapping is being done now. */
+	if (unlikely(!drl)) {
+		drl = srl;
+		/* Complete the source runlist if necessary. */
+		if (unlikely(drl[0].vcn)) {
+			/* Scan to the end of the source runlist. */
+			for (dend = 0; likely(drl[dend].length); dend++)
+				;
+			drl = ntfs_rl_realloc(drl, dend, dend + 1);
+			if (IS_ERR(drl))
+				return drl;
+			/* Insert start element at the front of the runlist. */
+			ntfs_rl_mm(drl, 1, 0, dend);
+			drl[0].vcn = 0;
+			drl[0].lcn = LCN_RL_NOT_MAPPED;
+			drl[0].length = drl[1].vcn;
+		}
+		goto finished;
+	}
+
+	si = di = 0;
+
+	/* Skip any unmapped start element(s) in the source runlist. */
+	while (srl[si].length && srl[si].lcn < (LCN)LCN_HOLE)
+		si++;
+
+	/* Can't have an entirely unmapped source runlist. */
+	BUG_ON(!srl[si].length);
+
+	/* Record the starting points. */
+	sstart = si;
+
+	/*
+	 * Skip forward in @drl until we reach the position where @srl needs to
+	 * be inserted. If we reach the end of @drl, @srl just needs to be
+	 * appended to @drl.
+	 */
+	for (; drl[di].length; di++) {
+		if (drl[di].vcn + drl[di].length > srl[sstart].vcn)
+			break;
+	}
+	dins = di;
+
+	/* Sanity check for illegal overlaps. */
+	if ((drl[di].vcn == srl[si].vcn) && (drl[di].lcn >= 0) &&
+			(srl[si].lcn >= 0)) {
+		ntfs_error(NULL, "Run lists overlap. Cannot merge!");
+		return ERR_PTR(-ERANGE);
+	}
+
+	/* Scan to the end of both runlists in order to know their sizes. */
+	for (send = si; srl[send].length; send++)
+		;
+	for (dend = di; drl[dend].length; dend++)
+		;
+
+	if (srl[send].lcn == (LCN)LCN_ENOENT)
+		marker_vcn = srl[marker = send].vcn;
+
+	/* Scan to the last element with lcn >= LCN_HOLE. */
+	for (sfinal = send; sfinal >= 0 && srl[sfinal].lcn < LCN_HOLE; sfinal--)
+		;
+	for (dfinal = dend; dfinal >= 0 && drl[dfinal].lcn < LCN_HOLE; dfinal--)
+		;
+
+	{
+	BOOL start;
+	BOOL finish;
+	int ds = dend + 1;		/* Number of elements in drl & srl */
+	int ss = sfinal - sstart + 1;
+
+	start  = ((drl[dins].lcn <  LCN_RL_NOT_MAPPED) ||    /* End of file   */
+		  (drl[dins].vcn == srl[sstart].vcn));	     /* Start of hole */
+	finish = ((drl[dins].lcn >= LCN_RL_NOT_MAPPED) &&    /* End of file   */
+		 ((drl[dins].vcn + drl[dins].length) <=      /* End of hole   */
+		  (srl[send - 1].vcn + srl[send - 1].length)));
+
+	/* Or we'll lose an end marker */
+	if (start && finish && (drl[dins].length == 0))
+		ss++;
+	if (marker && (drl[dins].vcn + drl[dins].length > srl[send - 1].vcn))
+		finish = FALSE;
+#if 0
+	ntfs_debug("dfinal = %i, dend = %i", dfinal, dend);
+	ntfs_debug("sstart = %i, sfinal = %i, send = %i", sstart, sfinal, send);
+	ntfs_debug("start = %i, finish = %i", start, finish);
+	ntfs_debug("ds = %i, ss = %i, dins = %i", ds, ss, dins);
+#endif
+	if (start) {
+		if (finish)
+			drl = ntfs_rl_replace(drl, ds, srl + sstart, ss, dins);
+		else
+			drl = ntfs_rl_insert(drl, ds, srl + sstart, ss, dins);
+	} else {
+		if (finish)
+			drl = ntfs_rl_append(drl, ds, srl + sstart, ss, dins);
+		else
+			drl = ntfs_rl_split(drl, ds, srl + sstart, ss, dins);
+	}
+	if (IS_ERR(drl)) {
+		ntfs_error(NULL, "Merge failed.");
+		return drl;
+	}
+	ntfs_free(srl);
+	if (marker) {
+		ntfs_debug("Triggering marker code.");
+		for (ds = dend; drl[ds].length; ds++)
+			;
+		/* We only need to care if @srl ended after @drl. */
+		if (drl[ds].vcn <= marker_vcn) {
+			int slots = 0;
+
+			if (drl[ds].vcn == marker_vcn) {
+				ntfs_debug("Old marker = 0x%llx, replacing "
+						"with LCN_ENOENT.",
+						(unsigned long long)
+						drl[ds].lcn);
+				drl[ds].lcn = (LCN)LCN_ENOENT;
+				goto finished;
+			}
+			/*
+			 * We need to create an unmapped runlist element in
+			 * @drl or extend an existing one before adding the
+			 * ENOENT terminator.
+			 */
+			if (drl[ds].lcn == (LCN)LCN_ENOENT) {
+				ds--;
+				slots = 1;
+			}
+			if (drl[ds].lcn != (LCN)LCN_RL_NOT_MAPPED) {
+				/* Add an unmapped runlist element. */
+				if (!slots) {
+					/* FIXME/TODO: We need to have the
+					 * extra memory already! (AIA) */
+					drl = ntfs_rl_realloc(drl, ds, ds + 2);
+					if (!drl)
+						goto critical_error;
+					slots = 2;
+				}
+				ds++;
+				/* Need to set vcn if it isn't set already. */
+				if (slots != 1)
+					drl[ds].vcn = drl[ds - 1].vcn +
+							drl[ds - 1].length;
+				drl[ds].lcn = (LCN)LCN_RL_NOT_MAPPED;
+				/* We now used up a slot. */
+				slots--;
+			}
+			drl[ds].length = marker_vcn - drl[ds].vcn;
+			/* Finally add the ENOENT terminator. */
+			ds++;
+			if (!slots) {
+				/* FIXME/TODO: We need to have the extra
+				 * memory already! (AIA) */
+				drl = ntfs_rl_realloc(drl, ds, ds + 1);
+				if (!drl)
+					goto critical_error;
+			}
+			drl[ds].vcn = marker_vcn;
+			drl[ds].lcn = (LCN)LCN_ENOENT;
+			drl[ds].length = (s64)0;
+		}
+	}
+	}
+
+finished:
+	/* The merge was completed successfully. */
+	ntfs_debug("Merged runlist:");
+	ntfs_debug_dump_runlist(drl);
+	return drl;
+
+critical_error:
+	/* Critical error! We cannot afford to fail here. */
+	ntfs_error(NULL, "Critical error! Not enough memory.");
+	panic("NTFS: Cannot continue.");
+}
+
+/**
+ * decompress_mapping_pairs - convert mapping pairs array to runlist
+ * @vol:	ntfs volume on which the attribute resides
+ * @attr:	attribute record whose mapping pairs array to decompress
+ * @old_rl:	optional runlist in which to insert @attr's runlist
+ *
+ * It is up to the caller to serialize access to the runlist @old_rl.
+ *
+ * Decompress the attribute @attr's mapping pairs array into a runlist. On
+ * success, return the decompressed runlist.
+ *
+ * If @old_rl is not NULL, decompressed runlist is inserted into the
+ * appropriate place in @old_rl and the resultant, combined runlist is
+ * returned. The original @old_rl is deallocated.
+ *
+ * On error, return -errno. @old_rl is left unmodified in that case.
+ *
+ * The following error codes are defined:
+ *	-ENOMEM	- Not enough memory to allocate runlist array.
+ *	-EIO	- Corrupt runlist.
+ *	-EINVAL	- Invalid parameters were passed in.
+ *	-ERANGE	- The two runlists overlap.
+ *
+ * FIXME: For now we take the conceptionally simplest approach of creating the
+ * new runlist disregarding the already existing one and then splicing the
+ * two into one, if that is possible (we check for overlap and discard the new
+ * runlist if overlap present before returning ERR_PTR(-ERANGE)).
+ */
+runlist_element *decompress_mapping_pairs(const ntfs_volume *vol,
+		const ATTR_RECORD *attr, runlist_element *old_rl)
+{
+	VCN vcn;		/* Current vcn. */
+	LCN lcn;		/* Current lcn. */
+	s64 deltaxcn;		/* Change in [vl]cn. */
+	runlist_element *rl;	/* The output runlist. */
+	u8 *buf;		/* Current position in mapping pairs array. */
+	u8 *attr_end;		/* End of attribute. */
+	int rlsize;		/* Size of runlist buffer. */
+	u16 rlpos;		/* Current runlist position in units of
+				   runlist_elements. */
+	u8 b;			/* Current byte offset in buf. */
+
+#ifdef DEBUG
+	/* Make sure attr exists and is non-resident. */
+	if (!attr || !attr->non_resident || sle64_to_cpu(
+			attr->data.non_resident.lowest_vcn) < (VCN)0) {
+		ntfs_error(vol->sb, "Invalid arguments.");
+		return ERR_PTR(-EINVAL);
+	}
+#endif
+	/* Start at vcn = lowest_vcn and lcn 0. */
+	vcn = sle64_to_cpu(attr->data.non_resident.lowest_vcn);
+	lcn = 0;
+	/* Get start of the mapping pairs array. */
+	buf = (u8*)attr + le16_to_cpu(
+			attr->data.non_resident.mapping_pairs_offset);
+	attr_end = (u8*)attr + le32_to_cpu(attr->length);
+	if (unlikely(buf < (u8*)attr || buf > attr_end)) {
+		ntfs_error(vol->sb, "Corrupt attribute.");
+		return ERR_PTR(-EIO);
+	}
+	/* Current position in runlist array. */
+	rlpos = 0;
+	/* Allocate first page and set current runlist size to one page. */
+	rl = ntfs_malloc_nofs(rlsize = PAGE_SIZE);
+	if (unlikely(!rl))
+		return ERR_PTR(-ENOMEM);
+	/* Insert unmapped starting element if necessary. */
+	if (vcn) {
+		rl->vcn = (VCN)0;
+		rl->lcn = (LCN)LCN_RL_NOT_MAPPED;
+		rl->length = vcn;
+		rlpos++;
+	}
+	while (buf < attr_end && *buf) {
+		/*
+		 * Allocate more memory if needed, including space for the
+		 * not-mapped and terminator elements. ntfs_malloc_nofs()
+		 * operates on whole pages only.
+		 */
+		if (((rlpos + 3) * sizeof(*old_rl)) > rlsize) {
+			runlist_element *rl2;
+
+			rl2 = ntfs_malloc_nofs(rlsize + (int)PAGE_SIZE);
+			if (unlikely(!rl2)) {
+				ntfs_free(rl);
+				return ERR_PTR(-ENOMEM);
+			}
+			memcpy(rl2, rl, rlsize);
+			ntfs_free(rl);
+			rl = rl2;
+			rlsize += PAGE_SIZE;
+		}
+		/* Enter the current vcn into the current runlist element. */
+		rl[rlpos].vcn = vcn;
+		/*
+		 * Get the change in vcn, i.e. the run length in clusters.
+		 * Doing it this way ensures that we signextend negative values.
+		 * A negative run length doesn't make any sense, but hey, I
+		 * didn't make up the NTFS specs and Windows NT4 treats the run
+		 * length as a signed value so that's how it is...
+		 */
+		b = *buf & 0xf;
+		if (b) {
+			if (unlikely(buf + b > attr_end))
+				goto io_error;
+			for (deltaxcn = (s8)buf[b--]; b; b--)
+				deltaxcn = (deltaxcn << 8) + buf[b];
+		} else { /* The length entry is compulsory. */
+			ntfs_error(vol->sb, "Missing length entry in mapping "
+					"pairs array.");
+			deltaxcn = (s64)-1;
+		}
+		/*
+		 * Assume a negative length to indicate data corruption and
+		 * hence clean-up and return NULL.
+		 */
+		if (unlikely(deltaxcn < 0)) {
+			ntfs_error(vol->sb, "Invalid length in mapping pairs "
+					"array.");
+			goto err_out;
+		}
+		/*
+		 * Enter the current run length into the current runlist
+		 * element.
+		 */
+		rl[rlpos].length = deltaxcn;
+		/* Increment the current vcn by the current run length. */
+		vcn += deltaxcn;
+		/*
+		 * There might be no lcn change at all, as is the case for
+		 * sparse clusters on NTFS 3.0+, in which case we set the lcn
+		 * to LCN_HOLE.
+		 */
+		if (!(*buf & 0xf0))
+			rl[rlpos].lcn = (LCN)LCN_HOLE;
+		else {
+			/* Get the lcn change which really can be negative. */
+			u8 b2 = *buf & 0xf;
+			b = b2 + ((*buf >> 4) & 0xf);
+			if (buf + b > attr_end)
+				goto io_error;
+			for (deltaxcn = (s8)buf[b--]; b > b2; b--)
+				deltaxcn = (deltaxcn << 8) + buf[b];
+			/* Change the current lcn to its new value. */
+			lcn += deltaxcn;
+#ifdef DEBUG
+			/*
+			 * On NTFS 1.2-, apparently can have lcn == -1 to
+			 * indicate a hole. But we haven't verified ourselves
+			 * whether it is really the lcn or the deltaxcn that is
+			 * -1. So if either is found give us a message so we
+			 * can investigate it further!
+			 */
+			if (vol->major_ver < 3) {
+				if (unlikely(deltaxcn == (LCN)-1))
+					ntfs_error(vol->sb, "lcn delta == -1");
+				if (unlikely(lcn == (LCN)-1))
+					ntfs_error(vol->sb, "lcn == -1");
+			}
+#endif
+			/* Check lcn is not below -1. */
+			if (unlikely(lcn < (LCN)-1)) {
+				ntfs_error(vol->sb, "Invalid LCN < -1 in "
+						"mapping pairs array.");
+				goto err_out;
+			}
+			/* Enter the current lcn into the runlist element. */
+			rl[rlpos].lcn = lcn;
+		}
+		/* Get to the next runlist element. */
+		rlpos++;
+		/* Increment the buffer position to the next mapping pair. */
+		buf += (*buf & 0xf) + ((*buf >> 4) & 0xf) + 1;
+	}
+	if (unlikely(buf >= attr_end))
+		goto io_error;
+	/*
+	 * If there is a highest_vcn specified, it must be equal to the final
+	 * vcn in the runlist - 1, or something has gone badly wrong.
+	 */
+	deltaxcn = sle64_to_cpu(attr->data.non_resident.highest_vcn);
+	if (unlikely(deltaxcn && vcn - 1 != deltaxcn)) {
+mpa_err:
+		ntfs_error(vol->sb, "Corrupt mapping pairs array in "
+				"non-resident attribute.");
+		goto err_out;
+	}
+	/* Setup not mapped runlist element if this is the base extent. */
+	if (!attr->data.non_resident.lowest_vcn) {
+		VCN max_cluster;
+
+		max_cluster = (sle64_to_cpu(
+				attr->data.non_resident.allocated_size) +
+				vol->cluster_size - 1) >>
+				vol->cluster_size_bits;
+		/*
+		 * If there is a difference between the highest_vcn and the
+		 * highest cluster, the runlist is either corrupt or, more
+		 * likely, there are more extents following this one.
+		 */
+		if (deltaxcn < --max_cluster) {
+			ntfs_debug("More extents to follow; deltaxcn = 0x%llx, "
+					"max_cluster = 0x%llx",
+					(unsigned long long)deltaxcn,
+					(unsigned long long)max_cluster);
+			rl[rlpos].vcn = vcn;
+			vcn += rl[rlpos].length = max_cluster - deltaxcn;
+			rl[rlpos].lcn = (LCN)LCN_RL_NOT_MAPPED;
+			rlpos++;
+		} else if (unlikely(deltaxcn > max_cluster)) {
+			ntfs_error(vol->sb, "Corrupt attribute. deltaxcn = "
+					"0x%llx, max_cluster = 0x%llx",
+					(unsigned long long)deltaxcn,
+					(unsigned long long)max_cluster);
+			goto mpa_err;
+		}
+		rl[rlpos].lcn = (LCN)LCN_ENOENT;
+	} else /* Not the base extent. There may be more extents to follow. */
+		rl[rlpos].lcn = (LCN)LCN_RL_NOT_MAPPED;
+
+	/* Setup terminating runlist element. */
+	rl[rlpos].vcn = vcn;
+	rl[rlpos].length = (s64)0;
+	/* If no existing runlist was specified, we are done. */
+	if (!old_rl) {
+		ntfs_debug("Mapping pairs array successfully decompressed:");
+		ntfs_debug_dump_runlist(rl);
+		return rl;
+	}
+	/* Now combine the new and old runlists checking for overlaps. */
+	old_rl = ntfs_merge_runlists(old_rl, rl);
+	if (likely(!IS_ERR(old_rl)))
+		return old_rl;
+	ntfs_free(rl);
+	ntfs_error(vol->sb, "Failed to merge runlists.");
+	return old_rl;
+io_error:
+	ntfs_error(vol->sb, "Corrupt attribute.");
+err_out:
+	ntfs_free(rl);
+	return ERR_PTR(-EIO);
+}
+
+/**
+ * ntfs_vcn_to_lcn - convert a vcn into a lcn given a runlist
+ * @rl:		runlist to use for conversion
+ * @vcn:	vcn to convert
+ *
+ * Convert the virtual cluster number @vcn of an attribute into a logical
+ * cluster number (lcn) of a device using the runlist @rl to map vcns to their
+ * corresponding lcns.
+ *
+ * It is up to the caller to serialize access to the runlist @rl.
+ *
+ * Since lcns must be >= 0, we use negative return values with special meaning:
+ *
+ * Return value			Meaning / Description
+ * ==================================================
+ *  -1 = LCN_HOLE		Hole / not allocated on disk.
+ *  -2 = LCN_RL_NOT_MAPPED	This is part of the runlist which has not been
+ *				inserted into the runlist yet.
+ *  -3 = LCN_ENOENT		There is no such vcn in the attribute.
+ *
+ * Locking: - The caller must have locked the runlist (for reading or writing).
+ *	    - This function does not touch the lock.
+ */
+LCN ntfs_vcn_to_lcn(const runlist_element *rl, const VCN vcn)
+{
+	int i;
+
+	BUG_ON(vcn < 0);
+	/*
+	 * If rl is NULL, assume that we have found an unmapped runlist. The
+	 * caller can then attempt to map it and fail appropriately if
+	 * necessary.
+	 */
+	if (unlikely(!rl))
+		return (LCN)LCN_RL_NOT_MAPPED;
+
+	/* Catch out of lower bounds vcn. */
+	if (unlikely(vcn < rl[0].vcn))
+		return (LCN)LCN_ENOENT;
+
+	for (i = 0; likely(rl[i].length); i++) {
+		if (unlikely(vcn < rl[i+1].vcn)) {
+			if (likely(rl[i].lcn >= (LCN)0))
+				return rl[i].lcn + (vcn - rl[i].vcn);
+			return rl[i].lcn;
+		}
+	}
+	/*
+	 * The terminator element is setup to the correct value, i.e. one of
+	 * LCN_HOLE, LCN_RL_NOT_MAPPED, or LCN_ENOENT.
+	 */
+	if (likely(rl[i].lcn < (LCN)0))
+		return rl[i].lcn;
+	/* Just in case... We could replace this with BUG() some day. */
+	return (LCN)LCN_ENOENT;
+}
diff -Nru a/fs/ntfs/runlist.h b/fs/ntfs/runlist.h
--- /dev/null	Wed Dec 31 16:00:00 196900
+++ b/fs/ntfs/runlist.h	2004-10-19 10:13:11 +01:00
@@ -0,0 +1,48 @@
+/*
+ * runlist.h - Defines for runlist handling in NTFS Linux kernel driver.
+ *	       Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2001-2004 Anton Altaparmakov
+ * Copyright (c) 2002 Richard Russon
+ *
+ * This program/include file is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as published
+ * by the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program/include file is distributed in the hope that it will be
+ * useful, but WITHOUT ANY WARRANTY; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program (in the main directory of the Linux-NTFS
+ * distribution in the file COPYING); if not, write to the Free Software
+ * Foundation,Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#ifndef _LINUX_NTFS_RUNLIST_H
+#define _LINUX_NTFS_RUNLIST_H
+
+#include "types.h"
+#include "layout.h"
+#include "volume.h"
+
+static inline void init_runlist(runlist *rl)
+{
+	rl->rl = NULL;
+	init_rwsem(&rl->lock);
+}
+
+typedef enum {
+	LCN_HOLE		= -1,	/* Keep this as highest value or die! */
+	LCN_RL_NOT_MAPPED	= -2,
+	LCN_ENOENT		= -3,
+} LCN_SPECIAL_VALUES;
+
+extern runlist_element *decompress_mapping_pairs(const ntfs_volume *vol,
+		const ATTR_RECORD *attr, runlist_element *old_rl);
+
+extern LCN ntfs_vcn_to_lcn(const runlist_element *rl, const VCN vcn);
+
+#endif /* _LINUX_NTFS_RUNLIST_H */

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 3/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:39   ` [PATCH 2/37] " Anton Altaparmakov
@ 2004-10-19  9:39     ` Anton Altaparmakov
  2004-10-19  9:39       ` [PATCH 4/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 3/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/09/30 1.2000.1.2)
   NTFS: Add vol->mft_data_pos and initialize it at mount time.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:13:15 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:13:15 +01:00
@@ -26,6 +26,7 @@
 	- Implement extent mft record deallocation
 	  fs/ntfs/mft.c::ntfs_extent_mft_record_free().
 	- Splitt runlist related functions off from attrib.[hc] to runlist.[hc].
+	- Add vol->mft_data_pos and initialize it at mount time.
 
 2.1.19 - Many cleanups, improvements, and a minor bug fix.
 
diff -Nru a/fs/ntfs/super.c b/fs/ntfs/super.c
--- a/fs/ntfs/super.c	2004-10-19 10:13:15 +01:00
+++ b/fs/ntfs/super.c	2004-10-19 10:13:15 +01:00
@@ -827,12 +827,12 @@
 }
 
 /**
- * setup_lcn_allocator - initialize the cluster allocator
- * @vol:	volume structure for which to setup the lcn allocator
+ * ntfs_setup_allocators - initialize the cluster and mft allocators
+ * @vol:	volume structure for which to setup the allocators
  *
- * Setup the cluster (lcn) allocator to the starting values.
+ * Setup the cluster (lcn) and mft allocators to the starting values.
  */
-static void setup_lcn_allocator(ntfs_volume *vol)
+static void ntfs_setup_allocators(ntfs_volume *vol)
 {
 #ifdef NTFS_RW
 	LCN mft_zone_size, mft_lcn;
@@ -902,6 +902,11 @@
 	vol->data2_zone_pos = 0;
 	ntfs_debug("vol->data2_zone_pos = 0x%llx",
 			(unsigned long long)vol->data2_zone_pos);
+
+	/* Set the mft data allocation position to mft record 24. */
+	vol->mft_data_pos = 24;
+	ntfs_debug("vol->mft_data_pos = 0x%llx",
+			(unsigned long long)vol->mft_data_pos);
 #endif /* NTFS_RW */
 }
 
@@ -2334,8 +2339,8 @@
 	 */
 	result = parse_ntfs_boot_sector(vol, (NTFS_BOOT_SECTOR*)bh->b_data);
 
-	/* Initialize the cluster allocator. */
-	setup_lcn_allocator(vol);
+	/* Initialize the cluster and mft allocators. */
+	ntfs_setup_allocators(vol);
 
 	brelse(bh);
 
diff -Nru a/fs/ntfs/volume.h b/fs/ntfs/volume.h
--- a/fs/ntfs/volume.h	2004-10-19 10:13:15 +01:00
+++ b/fs/ntfs/volume.h	2004-10-19 10:13:15 +01:00
@@ -81,6 +81,8 @@
 
 #ifdef NTFS_RW
 	/* Variables used by the cluster and mft allocators. */
+	s64 mft_data_pos;		/* Mft record number at which to
+					   allocate the next mft record. */
 	LCN mft_zone_start;		/* First cluster of the mft zone. */
 	LCN mft_zone_end;		/* First cluster beyond the mft zone. */
 	LCN mft_zone_pos;		/* Current position in the mft zone. */

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 4/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:39     ` [PATCH 3/37] " Anton Altaparmakov
@ 2004-10-19  9:39       ` Anton Altaparmakov
  2004-10-19  9:40         ` [PATCH 5/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 4/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/09/30 1.2000.1.3)
   NTFS: Rename init_runlist() to ntfs_init_runlist(), ntfs_vcn_to_lcn() to
         ntfs_rl_vcn_to_lcn(), decompress_mapping_pairs() to
         ntfs_mapping_pairs_decompress() and adapt all callers.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:13:19 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:13:19 +01:00
@@ -27,6 +27,9 @@
 	  fs/ntfs/mft.c::ntfs_extent_mft_record_free().
 	- Splitt runlist related functions off from attrib.[hc] to runlist.[hc].
 	- Add vol->mft_data_pos and initialize it at mount time.
+	- Rename init_runlist() to ntfs_init_runlist(), ntfs_vcn_to_lcn() to
+	  ntfs_rl_vcn_to_lcn(), decompress_mapping_pairs() to
+	  ntfs_mapping_pairs_decompress() and adapt all callers.
 
 2.1.19 - Many cleanups, improvements, and a minor bug fix.
 
diff -Nru a/fs/ntfs/aops.c b/fs/ntfs/aops.c
--- a/fs/ntfs/aops.c	2004-10-19 10:13:19 +01:00
+++ b/fs/ntfs/aops.c	2004-10-19 10:13:19 +01:00
@@ -232,7 +232,7 @@
 				/* Seek to element containing target vcn. */
 				while (rl->length && rl[1].vcn <= vcn)
 					rl++;
-				lcn = ntfs_vcn_to_lcn(rl, vcn);
+				lcn = ntfs_rl_vcn_to_lcn(rl, vcn);
 			} else
 				lcn = (LCN)LCN_RL_NOT_MAPPED;
 			/* Successful remap. */
@@ -266,7 +266,7 @@
 			}
 			/* Hard error, zero out region. */
 			SetPageError(page);
-			ntfs_error(vol->sb, "ntfs_vcn_to_lcn(vcn = 0x%llx) "
+			ntfs_error(vol->sb, "ntfs_rl_vcn_to_lcn(vcn = 0x%llx) "
 					"failed with error code 0x%llx%s.",
 					(unsigned long long)vcn,
 					(unsigned long long)-lcn,
@@ -274,9 +274,9 @@
 			// FIXME: Depending on vol->on_errors, do something.
 		}
 		/*
-		 * Either iblock was outside lblock limits or ntfs_vcn_to_lcn()
-		 * returned error. Just zero that portion of the page and set
-		 * the buffer uptodate.
+		 * Either iblock was outside lblock limits or
+		 * ntfs_rl_vcn_to_lcn() returned error.  Just zero that portion
+		 * of the page and set the buffer uptodate.
 		 */
 handle_hole:
 		bh->b_blocknr = -1UL;
@@ -637,7 +637,7 @@
 			/* Seek to element containing target vcn. */
 			while (rl->length && rl[1].vcn <= vcn)
 				rl++;
-			lcn = ntfs_vcn_to_lcn(rl, vcn);
+			lcn = ntfs_rl_vcn_to_lcn(rl, vcn);
 		} else
 			lcn = (LCN)LCN_RL_NOT_MAPPED;
 		/* Successful remap. */
@@ -673,7 +673,7 @@
 		}
 		/* Failed to map the buffer, even after retrying. */
 		bh->b_blocknr = -1UL;
-		ntfs_error(vol->sb, "ntfs_vcn_to_lcn(vcn = 0x%llx) failed "
+		ntfs_error(vol->sb, "ntfs_rl_vcn_to_lcn(vcn = 0x%llx) failed "
 				"with error code 0x%llx%s.",
 				(unsigned long long)vcn,
 				(unsigned long long)-lcn,
@@ -1402,7 +1402,7 @@
 				/* Seek to element containing target vcn. */
 				while (rl->length && rl[1].vcn <= vcn)
 					rl++;
-				lcn = ntfs_vcn_to_lcn(rl, vcn);
+				lcn = ntfs_rl_vcn_to_lcn(rl, vcn);
 			} else
 				lcn = (LCN)LCN_RL_NOT_MAPPED;
 			if (unlikely(lcn < 0)) {
@@ -1451,7 +1451,7 @@
 				 * retrying.
 				 */
 				bh->b_blocknr = -1UL;
-				ntfs_error(vol->sb, "ntfs_vcn_to_lcn(vcn = "
+				ntfs_error(vol->sb, "ntfs_rl_vcn_to_lcn(vcn = "
 						"0x%llx) failed with error "
 						"code 0x%llx%s.",
 						(unsigned long long)vcn,
diff -Nru a/fs/ntfs/attrib.c b/fs/ntfs/attrib.c
--- a/fs/ntfs/attrib.c	2004-10-19 10:13:19 +01:00
+++ b/fs/ntfs/attrib.c	2004-10-19 10:13:19 +01:00
@@ -22,7 +22,6 @@
 
 #include <linux/buffer_head.h>
 #include "ntfs.h"
-//#include "dir.h"
 
 /**
  * ntfs_map_runlist - map (a part of) a runlist of an ntfs inode
@@ -66,10 +65,11 @@
 
 	down_write(&ni->runlist.lock);
 	/* Make sure someone else didn't do the work while we were sleeping. */
-	if (likely(ntfs_vcn_to_lcn(ni->runlist.rl, vcn) <= LCN_RL_NOT_MAPPED)) {
+	if (likely(ntfs_rl_vcn_to_lcn(ni->runlist.rl, vcn) <=
+			LCN_RL_NOT_MAPPED)) {
 		runlist_element *rl;
 
-		rl = decompress_mapping_pairs(ni->vol, ctx->attr,
+		rl = ntfs_mapping_pairs_decompress(ni->vol, ctx->attr,
 				ni->runlist.rl);
 		if (IS_ERR(rl))
 			err = PTR_ERR(rl);
@@ -398,14 +398,14 @@
 	rl = runlist->rl;
 	/* Read all clusters specified by the runlist one run at a time. */
 	while (rl->length) {
-		lcn = ntfs_vcn_to_lcn(rl, rl->vcn);
+		lcn = ntfs_rl_vcn_to_lcn(rl, rl->vcn);
 		ntfs_debug("Reading vcn = 0x%llx, lcn = 0x%llx.",
 				(unsigned long long)rl->vcn,
 				(unsigned long long)lcn);
 		/* The attribute list cannot be sparse. */
 		if (lcn < 0) {
-			ntfs_error(sb, "ntfs_vcn_to_lcn() failed. Cannot read "
-					"attribute list.");
+			ntfs_error(sb, "ntfs_rl_vcn_to_lcn() failed.  Cannot "
+					"read attribute list.");
 			goto err_out;
 		}
 		block = lcn << vol->cluster_size_bits >> block_size_bits;
diff -Nru a/fs/ntfs/compress.c b/fs/ntfs/compress.c
--- a/fs/ntfs/compress.c	2004-10-19 10:13:19 +01:00
+++ b/fs/ntfs/compress.c	2004-10-19 10:13:19 +01:00
@@ -600,7 +600,7 @@
 			/* Seek to element containing target vcn. */
 			while (rl->length && rl[1].vcn <= vcn)
 				rl++;
-			lcn = ntfs_vcn_to_lcn(rl, vcn);
+			lcn = ntfs_rl_vcn_to_lcn(rl, vcn);
 		} else
 			lcn = (LCN)LCN_RL_NOT_MAPPED;
 		ntfs_debug("Reading vcn = 0x%llx, lcn = 0x%llx.",
@@ -926,7 +926,7 @@
 
 rl_err:
 	up_read(&ni->runlist.lock);
-	ntfs_error(vol->sb, "ntfs_vcn_to_lcn() failed. Cannot read "
+	ntfs_error(vol->sb, "ntfs_rl_vcn_to_lcn() failed. Cannot read "
 			"compression block.");
 	goto err_out;
 
diff -Nru a/fs/ntfs/inode.c b/fs/ntfs/inode.c
--- a/fs/ntfs/inode.c	2004-10-19 10:13:19 +01:00
+++ b/fs/ntfs/inode.c	2004-10-19 10:13:19 +01:00
@@ -376,13 +376,13 @@
 	ni->seq_no = 0;
 	atomic_set(&ni->count, 1);
 	ni->vol = NTFS_SB(sb);
-	init_runlist(&ni->runlist);
+	ntfs_init_runlist(&ni->runlist);
 	init_MUTEX(&ni->mrec_lock);
 	ni->page = NULL;
 	ni->page_ofs = 0;
 	ni->attr_list_size = 0;
 	ni->attr_list = NULL;
-	init_runlist(&ni->attr_list_rl);
+	ntfs_init_runlist(&ni->attr_list_rl);
 	ni->itype.index.bmp_ino = NULL;
 	ni->itype.index.block_size = 0;
 	ni->itype.index.vcn_size = 0;
@@ -701,7 +701,7 @@
 			 * Setup the runlist. No need for locking as we have
 			 * exclusive access to the inode at this time.
 			 */
-			ni->attr_list_rl.rl = decompress_mapping_pairs(vol,
+			ni->attr_list_rl.rl = ntfs_mapping_pairs_decompress(vol,
 					ctx->attr, NULL);
 			if (IS_ERR(ni->attr_list_rl.rl)) {
 				err = PTR_ERR(ni->attr_list_rl.rl);
@@ -1672,7 +1672,7 @@
  *
  * We solve these problems by starting with the $DATA attribute before anything
  * else and iterating using ntfs_attr_lookup($DATA) over all extents.  As each
- * extent is found, we decompress_mapping_pairs() including the implied
+ * extent is found, we ntfs_mapping_pairs_decompress() including the implied
  * ntfs_merge_runlists().  Each step of the iteration necessarily provides
  * sufficient information for the next step to complete.
  *
@@ -1810,7 +1810,7 @@
 				goto put_err_out;
 			}
 			/* Setup the runlist. */
-			ni->attr_list_rl.rl = decompress_mapping_pairs(vol,
+			ni->attr_list_rl.rl = ntfs_mapping_pairs_decompress(vol,
 					ctx->attr, NULL);
 			if (IS_ERR(ni->attr_list_rl.rl)) {
 				err = PTR_ERR(ni->attr_list_rl.rl);
@@ -1942,11 +1942,11 @@
 		 * as we have exclusive access to the inode at this time and we
 		 * are a mount in progress task, too.
 		 */
-		nrl = decompress_mapping_pairs(vol, attr, ni->runlist.rl);
+		nrl = ntfs_mapping_pairs_decompress(vol, attr, ni->runlist.rl);
 		if (IS_ERR(nrl)) {
-			ntfs_error(sb, "decompress_mapping_pairs() failed with "
-					"error code %ld. $MFT is corrupt.",
-					PTR_ERR(nrl));
+			ntfs_error(sb, "ntfs_mapping_pairs_decompress() "
+					"failed with error code %ld.  $MFT is "
+					"corrupt.", PTR_ERR(nrl));
 			goto put_err_out;
 		}
 		ni->runlist.rl = nrl;
diff -Nru a/fs/ntfs/runlist.c b/fs/ntfs/runlist.c
--- a/fs/ntfs/runlist.c	2004-10-19 10:13:19 +01:00
+++ b/fs/ntfs/runlist.c	2004-10-19 10:13:19 +01:00
@@ -685,7 +685,7 @@
 }
 
 /**
- * decompress_mapping_pairs - convert mapping pairs array to runlist
+ * ntfs_mapping_pairs_decompress - convert mapping pairs array to runlist
  * @vol:	ntfs volume on which the attribute resides
  * @attr:	attribute record whose mapping pairs array to decompress
  * @old_rl:	optional runlist in which to insert @attr's runlist
@@ -712,7 +712,7 @@
  * two into one, if that is possible (we check for overlap and discard the new
  * runlist if overlap present before returning ERR_PTR(-ERANGE)).
  */
-runlist_element *decompress_mapping_pairs(const ntfs_volume *vol,
+runlist_element *ntfs_mapping_pairs_decompress(const ntfs_volume *vol,
 		const ATTR_RECORD *attr, runlist_element *old_rl)
 {
 	VCN vcn;		/* Current vcn. */
@@ -929,7 +929,7 @@
 }
 
 /**
- * ntfs_vcn_to_lcn - convert a vcn into a lcn given a runlist
+ * ntfs_rl_vcn_to_lcn - convert a vcn into a lcn given a runlist
  * @rl:		runlist to use for conversion
  * @vcn:	vcn to convert
  *
@@ -951,7 +951,7 @@
  * Locking: - The caller must have locked the runlist (for reading or writing).
  *	    - This function does not touch the lock.
  */
-LCN ntfs_vcn_to_lcn(const runlist_element *rl, const VCN vcn)
+LCN ntfs_rl_vcn_to_lcn(const runlist_element *rl, const VCN vcn)
 {
 	int i;
 
diff -Nru a/fs/ntfs/runlist.h b/fs/ntfs/runlist.h
--- a/fs/ntfs/runlist.h	2004-10-19 10:13:19 +01:00
+++ b/fs/ntfs/runlist.h	2004-10-19 10:13:19 +01:00
@@ -28,7 +28,7 @@
 #include "layout.h"
 #include "volume.h"
 
-static inline void init_runlist(runlist *rl)
+static inline void ntfs_init_runlist(runlist *rl)
 {
 	rl->rl = NULL;
 	init_rwsem(&rl->lock);
@@ -40,9 +40,9 @@
 	LCN_ENOENT		= -3,
 } LCN_SPECIAL_VALUES;
 
-extern runlist_element *decompress_mapping_pairs(const ntfs_volume *vol,
+extern runlist_element *ntfs_mapping_pairs_decompress(const ntfs_volume *vol,
 		const ATTR_RECORD *attr, runlist_element *old_rl);
 
-extern LCN ntfs_vcn_to_lcn(const runlist_element *rl, const VCN vcn);
+extern LCN ntfs_rl_vcn_to_lcn(const runlist_element *rl, const VCN vcn);
 
 #endif /* _LINUX_NTFS_RUNLIST_H */

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 5/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:39       ` [PATCH 4/37] " Anton Altaparmakov
@ 2004-10-19  9:40         ` Anton Altaparmakov
  2004-10-19  9:40           ` [PATCH 6/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 5/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/09/30 1.2004)
   NTFS: Forgot to lock the mft bitmap when clearing the bit in
         ntfs_extent_mft_record_free().
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/mft.c b/fs/ntfs/mft.c
--- a/fs/ntfs/mft.c	2004-10-19 10:13:23 +01:00
+++ b/fs/ntfs/mft.c	2004-10-19 10:13:23 +01:00
@@ -1210,7 +1210,9 @@
 	ntfs_clear_extent_inode(ni);
 
 	/* Clear the bit in the $MFT/$BITMAP corresponding to this record. */
+	down_write(&vol->mftbmp_lock);
 	err = ntfs_bitmap_clear_bit(vol->mftbmp_ino, mft_no);
+	up_write(&vol->mftbmp_lock);
 	if (unlikely(err)) {
 		/*
 		 * The extent inode is gone but we failed to deallocate it in
diff -Nru a/fs/ntfs/mft.h b/fs/ntfs/mft.h
--- a/fs/ntfs/mft.h	2004-10-19 10:13:23 +01:00
+++ b/fs/ntfs/mft.h	2004-10-19 10:13:23 +01:00
@@ -27,8 +27,6 @@
 
 #include "inode.h"
 
-extern int format_mft_record(ntfs_inode *ni, MFT_RECORD *m);
-
 extern MFT_RECORD *map_mft_record(ntfs_inode *ni);
 extern void unmap_mft_record(ntfs_inode *ni);
 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 6/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:40         ` [PATCH 5/37] " Anton Altaparmakov
@ 2004-10-19  9:40           ` Anton Altaparmakov
  2004-10-19  9:40             ` [PATCH 7/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 6/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/09/30 1.2005)
   NTFS: Add fs/ntfs/runlist.[hc]::ntfs_get_nr_significant_bytes(),
         ntfs_get_size_for_mapping_pairs(), ntfs_write_significant_bytes(),
         and ntfs_mapping_pairs_build(), adapted from libntfs.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:13:26 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:13:26 +01:00
@@ -30,6 +30,9 @@
 	- Rename init_runlist() to ntfs_init_runlist(), ntfs_vcn_to_lcn() to
 	  ntfs_rl_vcn_to_lcn(), decompress_mapping_pairs() to
 	  ntfs_mapping_pairs_decompress() and adapt all callers.
+	- Add fs/ntfs/runlist.[hc]::ntfs_get_nr_significant_bytes(),
+	  ntfs_get_size_for_mapping_pairs(), ntfs_write_significant_bytes(),
+	  and ntfs_mapping_pairs_build(), adapted from libntfs.
 
 2.1.19 - Many cleanups, improvements, and a minor bug fix.
 
diff -Nru a/fs/ntfs/runlist.c b/fs/ntfs/runlist.c
--- a/fs/ntfs/runlist.c	2004-10-19 10:13:26 +01:00
+++ b/fs/ntfs/runlist.c	2004-10-19 10:13:26 +01:00
@@ -984,3 +984,340 @@
 	/* Just in case... We could replace this with BUG() some day. */
 	return (LCN)LCN_ENOENT;
 }
+
+/**
+ * ntfs_get_nr_significant_bytes - get number of bytes needed to store a number
+ * @n:		number for which to get the number of bytes for
+ *
+ * Return the number of bytes required to store @n unambiguously as
+ * a signed number.
+ *
+ * This is used in the context of the mapping pairs array to determine how
+ * many bytes will be needed in the array to store a given logical cluster
+ * number (lcn) or a specific run length.
+ *
+ * Return the number of bytes written.  This function cannot fail.
+ */
+static inline int ntfs_get_nr_significant_bytes(const s64 n)
+{
+	s64 l = n;
+	int i;
+	s8 j;
+
+	i = 0;
+	do {
+		l >>= 8;
+		i++;
+	} while (l != 0 && l != -1);
+	j = (n >> 8 * (i - 1)) & 0xff;
+	/* If the sign bit is wrong, we need an extra byte. */
+	if ((n < 0 && j >= 0) || (n > 0 && j < 0))
+		i++;
+	return i;
+}
+
+/**
+ * ntfs_get_size_for_mapping_pairs - get bytes needed for mapping pairs array
+ * @vol:	ntfs volume (needed for the ntfs version)
+ * @rl:		locked runlist to determine the size of the mapping pairs of
+ * @start_vcn:	vcn at which to start the mapping pairs array
+ *
+ * Walk the locked runlist @rl and calculate the size in bytes of the mapping
+ * pairs array corresponding to the runlist @rl, starting at vcn @start_vcn.
+ * This for example allows us to allocate a buffer of the right size when
+ * building the mapping pairs array.
+ *
+ * If @rl is NULL, just return 1 (for the single terminator byte).
+ *
+ * Return the calculated size in bytes on success.  On error, return -errno.
+ * The following error codes are defined:
+ *	-EINVAL	- Run list contains unmapped elements.  Make sure to only pass
+ *		  fully mapped runlists to this function.
+ *	-EIO	- The runlist is corrupt.
+ *
+ * Locking: @rl must be locked on entry (either for reading or writing), it
+ *	    remains locked throughout, and is left locked upon return.
+ */
+int ntfs_get_size_for_mapping_pairs(const ntfs_volume *vol,
+		const runlist_element *rl, const VCN start_vcn)
+{
+	LCN prev_lcn;
+	int rls;
+
+	BUG_ON(start_vcn < 0);
+	if (!rl) {
+		BUG_ON(start_vcn);
+		return 1;
+	}
+	/* Skip to runlist element containing @start_vcn. */
+	while (rl->length && start_vcn >= rl[1].vcn)
+		rl++;
+	if ((!rl->length && start_vcn > rl->vcn) || start_vcn < rl->vcn)
+		return -EINVAL;
+	prev_lcn = 0;
+	/* Always need the termining zero byte. */
+	rls = 1;
+	/* Do the first partial run if present. */
+	if (start_vcn > rl->vcn) {
+		s64 delta;
+
+		/* We know rl->length != 0 already. */
+		if (rl->length < 0 || rl->lcn < LCN_HOLE)
+			goto err_out;
+		delta = start_vcn - rl->vcn;
+		/* Header byte + length. */
+		rls += 1 + ntfs_get_nr_significant_bytes(rl->length - delta);
+		/*
+		 * If the logical cluster number (lcn) denotes a hole and we
+		 * are on NTFS 3.0+, we don't store it at all, i.e. we need
+		 * zero space.  On earlier NTFS versions we just store the lcn.
+		 * Note: this assumes that on NTFS 1.2-, holes are stored with
+		 * an lcn of -1 and not a delta_lcn of -1 (unless both are -1).
+		 */
+		if (rl->lcn >= 0 || vol->major_ver < 3) {
+			prev_lcn = rl->lcn;
+			if (rl->lcn >= 0)
+				prev_lcn += delta;
+			/* Change in lcn. */
+			rls += ntfs_get_nr_significant_bytes(prev_lcn);
+		}
+		/* Go to next runlist element. */
+		rl++;
+	}
+	/* Do the full runs. */
+	for (; rl->length; rl++) {
+		if (rl->length < 0 || rl->lcn < LCN_HOLE)
+			goto err_out;
+		/* Header byte + length. */
+		rls += 1 + ntfs_get_nr_significant_bytes(rl->length);
+		/*
+		 * If the logical cluster number (lcn) denotes a hole and we
+		 * are on NTFS 3.0+, we don't store it at all, i.e. we need
+		 * zero space.  On earlier NTFS versions we just store the lcn.
+		 * Note: this assumes that on NTFS 1.2-, holes are stored with
+		 * an lcn of -1 and not a delta_lcn of -1 (unless both are -1).
+		 */
+		if (rl->lcn >= 0 || vol->major_ver < 3) {
+			/* Change in lcn. */
+			rls += ntfs_get_nr_significant_bytes(rl->lcn -
+					prev_lcn);
+			prev_lcn = rl->lcn;
+		}
+	}
+	return rls;
+err_out:
+	if (rl->lcn == LCN_RL_NOT_MAPPED)
+		rls = -EINVAL;
+	else
+		rls = -EIO;
+	return rls;
+}
+
+/**
+ * ntfs_write_significant_bytes - write the significant bytes of a number
+ * @dst:	destination buffer to write to
+ * @dst_max:	pointer to last byte of destination buffer for bounds checking
+ * @n:		number whose significant bytes to write
+ *
+ * Store in @dst, the minimum bytes of the number @n which are required to
+ * identify @n unambiguously as a signed number, taking care not to exceed
+ * @dest_max, the maximum position within @dst to which we are allowed to
+ * write.
+ *
+ * This is used when building the mapping pairs array of a runlist to compress
+ * a given logical cluster number (lcn) or a specific run length to the minumum
+ * size possible.
+ *
+ * Return the number of bytes written on success.  On error, i.e. the
+ * destination buffer @dst is too small, return -ENOSPC.
+ */
+static inline int ntfs_write_significant_bytes(s8 *dst, const s8 *dst_max,
+		const s64 n)
+{
+	s64 l = n;
+	int i;
+	s8 j;
+
+	i = 0;
+	do {
+		if (dst > dst_max)
+			goto err_out;
+		*dst++ = l & 0xffll;
+		l >>= 8;
+		i++;
+	} while (l != 0 && l != -1);
+	j = (n >> 8 * (i - 1)) & 0xff;
+	/* If the sign bit is wrong, we need an extra byte. */
+	if (n < 0 && j >= 0) {
+		if (dst > dst_max)
+			goto err_out;
+		i++;
+		*dst = (s8)-1;
+	} else if (n > 0 && j < 0) {
+		if (dst > dst_max)
+			goto err_out;
+		i++;
+		*dst = (s8)0;
+	}
+	return i;
+err_out:
+	return -ENOSPC;
+}
+
+/**
+ * ntfs_mapping_pairs_build - build the mapping pairs array from a runlist
+ * @vol:	ntfs volume (needed for the ntfs version)
+ * @dst:	destination buffer to which to write the mapping pairs array
+ * @dst_len:	size of destination buffer @dst in bytes
+ * @rl:		locked runlist for which to build the mapping pairs array
+ * @start_vcn:	vcn at which to start the mapping pairs array
+ * @stop_vcn:	first vcn outside destination buffer on success or -ENOSPC
+ *
+ * Create the mapping pairs array from the locked runlist @rl, starting at vcn
+ * @start_vcn and save the array in @dst.  @dst_len is the size of @dst in
+ * bytes and it should be at least equal to the value obtained by calling
+ * ntfs_get_size_for_mapping_pairs().
+ *
+ * If @rl is NULL, just write a single terminator byte to @dst.
+ *
+ * On success or -ENOSPC error, if @stop_vcn is not NULL, *@stop_vcn is set to
+ * the first vcn outside the destination buffer.  Note that on error, @dst has
+ * been filled with all the mapping pairs that will fit, thus it can be treated
+ * as partial success, in that a new attribute extent needs to be created or
+ * the next extent has to be used and the mapping pairs build has to be
+ * continued with @start_vcn set to *@stop_vcn.
+ *
+ * Return 0 on success and -errno on error.  The following error codes are
+ * defined:
+ *	-EINVAL	- Run list contains unmapped elements.  Make sure to only pass
+ *		  fully mapped runlists to this function.
+ *	-EIO	- The runlist is corrupt.
+ *	-ENOSPC	- The destination buffer is too small.
+ *
+ * Locking: @rl must be locked on entry (either for reading or writing), it
+ *	    remains locked throughout, and is left locked upon return.
+ */
+int ntfs_mapping_pairs_build(const ntfs_volume *vol, s8 *dst,
+		const int dst_len, const runlist_element *rl,
+		const VCN start_vcn, VCN *const stop_vcn)
+{
+	LCN prev_lcn;
+	s8 *dst_max, *dst_next;
+	int err = -ENOSPC;
+	s8 len_len, lcn_len;
+
+	BUG_ON(start_vcn < 0);
+	BUG_ON(dst_len < 1);
+	if (!rl) {
+		BUG_ON(start_vcn);
+		if (stop_vcn)
+			*stop_vcn = 0;
+		/* Terminator byte. */
+		*dst = 0;
+		return 0;
+	}
+	/* Skip to runlist element containing @start_vcn. */
+	while (rl->length && start_vcn >= rl[1].vcn)
+		rl++;
+	if ((!rl->length && start_vcn > rl->vcn) || start_vcn < rl->vcn)
+		return -EINVAL;
+	/*
+	 * @dst_max is used for bounds checking in
+	 * ntfs_write_significant_bytes().
+	 */
+	dst_max = dst + dst_len - 1;
+	prev_lcn = 0;
+	/* Do the first partial run if present. */
+	if (start_vcn > rl->vcn) {
+		s64 delta;
+
+		/* We know rl->length != 0 already. */
+		if (rl->length < 0 || rl->lcn < LCN_HOLE)
+			goto err_out;
+		delta = start_vcn - rl->vcn;
+		/* Write length. */
+		len_len = ntfs_write_significant_bytes(dst + 1, dst_max,
+				rl->length - delta);
+		if (len_len < 0)
+			goto size_err;
+		/*
+		 * If the logical cluster number (lcn) denotes a hole and we
+		 * are on NTFS 3.0+, we don't store it at all, i.e. we need
+		 * zero space.  On earlier NTFS versions we just write the lcn
+		 * change.  FIXME: Do we need to write the lcn change or just
+		 * the lcn in that case?  Not sure as I have never seen this
+		 * case on NT4. - We assume that we just need to write the lcn
+		 * change until someone tells us otherwise... (AIA)
+		 */
+		if (rl->lcn >= 0 || vol->major_ver < 3) {
+			prev_lcn = rl->lcn;
+			if (rl->lcn >= 0)
+				prev_lcn += delta;
+			/* Write change in lcn. */
+			lcn_len = ntfs_write_significant_bytes(dst + 1 +
+					len_len, dst_max, prev_lcn);
+			if (lcn_len < 0)
+				goto size_err;
+		} else
+			lcn_len = 0;
+		dst_next = dst + len_len + lcn_len + 1;
+		if (dst_next > dst_max)
+			goto size_err;
+		/* Update header byte. */
+		*dst = lcn_len << 4 | len_len;
+		/* Position at next mapping pairs array element. */
+		dst = dst_next;
+		/* Go to next runlist element. */
+		rl++;
+	}
+	/* Do the full runs. */
+	for (; rl->length; rl++) {
+		if (rl->length < 0 || rl->lcn < LCN_HOLE)
+			goto err_out;
+		/* Write length. */
+		len_len = ntfs_write_significant_bytes(dst + 1, dst_max,
+				rl->length);
+		if (len_len < 0)
+			goto size_err;
+		/*
+		 * If the logical cluster number (lcn) denotes a hole and we
+		 * are on NTFS 3.0+, we don't store it at all, i.e. we need
+		 * zero space.  On earlier NTFS versions we just write the lcn
+		 * change.  FIXME: Do we need to write the lcn change or just
+		 * the lcn in that case?  Not sure as I have never seen this
+		 * case on NT4. - We assume that we just need to write the lcn
+		 * change until someone tells us otherwise... (AIA)
+		 */
+		if (rl->lcn >= 0 || vol->major_ver < 3) {
+			/* Write change in lcn. */
+			lcn_len = ntfs_write_significant_bytes(dst + 1 +
+					len_len, dst_max, rl->lcn - prev_lcn);
+			if (lcn_len < 0)
+				goto size_err;
+			prev_lcn = rl->lcn;
+		} else
+			lcn_len = 0;
+		dst_next = dst + len_len + lcn_len + 1;
+		if (dst_next > dst_max)
+			goto size_err;
+		/* Update header byte. */
+		*dst = lcn_len << 4 | len_len;
+		/* Position at next mapping pairs array element. */
+		dst = dst_next;
+	}
+	/* Success. */
+	err = 0;
+size_err:
+	/* Set stop vcn. */
+	if (stop_vcn)
+		*stop_vcn = rl->vcn;
+	/* Add terminator byte. */
+	*dst = 0;
+	return err;
+err_out:
+	if (rl->lcn == LCN_RL_NOT_MAPPED)
+		err = -EINVAL;
+	else
+		err = -EIO;
+	return err;
+}
diff -Nru a/fs/ntfs/runlist.h b/fs/ntfs/runlist.h
--- a/fs/ntfs/runlist.h	2004-10-19 10:13:26 +01:00
+++ b/fs/ntfs/runlist.h	2004-10-19 10:13:26 +01:00
@@ -45,4 +45,11 @@
 
 extern LCN ntfs_rl_vcn_to_lcn(const runlist_element *rl, const VCN vcn);
 
+extern int ntfs_get_size_for_mapping_pairs(const ntfs_volume *vol,
+		const runlist_element *rl, const VCN start_vcn);
+
+extern int ntfs_mapping_pairs_build(const ntfs_volume *vol, s8 *dst,
+		const int dst_len, const runlist_element *rl,
+		const VCN start_vcn, VCN *const stop_vcn);
+
 #endif /* _LINUX_NTFS_RUNLIST_H */

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 7/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:40           ` [PATCH 6/37] " Anton Altaparmakov
@ 2004-10-19  9:40             ` Anton Altaparmakov
  2004-10-19  9:40               ` [PATCH 8/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 7/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/09/30 1.2005.1.1)
   NTFS: Rename ntfs_merge_runlists() to ntfs_runlists_merge().
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:13:30 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:13:30 +01:00
@@ -29,7 +29,8 @@
 	- Add vol->mft_data_pos and initialize it at mount time.
 	- Rename init_runlist() to ntfs_init_runlist(), ntfs_vcn_to_lcn() to
 	  ntfs_rl_vcn_to_lcn(), decompress_mapping_pairs() to
-	  ntfs_mapping_pairs_decompress() and adapt all callers.
+	  ntfs_mapping_pairs_decompress(), ntfs_merge_runlists() to
+	  ntfs_runlists_merge() and adapt all callers.
 	- Add fs/ntfs/runlist.[hc]::ntfs_get_nr_significant_bytes(),
 	  ntfs_get_size_for_mapping_pairs(), ntfs_write_significant_bytes(),
 	  and ntfs_mapping_pairs_build(), adapted from libntfs.
diff -Nru a/fs/ntfs/inode.c b/fs/ntfs/inode.c
--- a/fs/ntfs/inode.c	2004-10-19 10:13:30 +01:00
+++ b/fs/ntfs/inode.c	2004-10-19 10:13:30 +01:00
@@ -1673,7 +1673,7 @@
  * We solve these problems by starting with the $DATA attribute before anything
  * else and iterating using ntfs_attr_lookup($DATA) over all extents.  As each
  * extent is found, we ntfs_mapping_pairs_decompress() including the implied
- * ntfs_merge_runlists().  Each step of the iteration necessarily provides
+ * ntfs_runlists_merge().  Each step of the iteration necessarily provides
  * sufficient information for the next step to complete.
  *
  * This should work but there are two possible pit falls (see inline comments
diff -Nru a/fs/ntfs/runlist.c b/fs/ntfs/runlist.c
--- a/fs/ntfs/runlist.c	2004-10-19 10:13:30 +01:00
+++ b/fs/ntfs/runlist.c	2004-10-19 10:13:30 +01:00
@@ -449,7 +449,7 @@
 }
 
 /**
- * ntfs_merge_runlists - merge two runlists into one
+ * ntfs_runlists_merge - merge two runlists into one
  * @drl:	original runlist to be worked on
  * @srl:	new runlist to be merged into @drl
  *
@@ -482,7 +482,7 @@
  *	-EINVAL	- Invalid parameters were passed in.
  *	-ERANGE	- The runlists overlap and cannot be merged.
  */
-runlist_element *ntfs_merge_runlists(runlist_element *drl,
+runlist_element *ntfs_runlists_merge(runlist_element *drl,
 		runlist_element *srl)
 {
 	int di, si;		/* Current index into @[ds]rl. */
@@ -915,7 +915,7 @@
 		return rl;
 	}
 	/* Now combine the new and old runlists checking for overlaps. */
-	old_rl = ntfs_merge_runlists(old_rl, rl);
+	old_rl = ntfs_runlists_merge(old_rl, rl);
 	if (likely(!IS_ERR(old_rl)))
 		return old_rl;
 	ntfs_free(rl);
diff -Nru a/fs/ntfs/runlist.h b/fs/ntfs/runlist.h
--- a/fs/ntfs/runlist.h	2004-10-19 10:13:30 +01:00
+++ b/fs/ntfs/runlist.h	2004-10-19 10:13:30 +01:00
@@ -40,6 +40,9 @@
 	LCN_ENOENT		= -3,
 } LCN_SPECIAL_VALUES;
 
+extern runlist_element *ntfs_runlists_merge(runlist_element *drl,
+		runlist_element *srl);
+
 extern runlist_element *ntfs_mapping_pairs_decompress(const ntfs_volume *vol,
 		const ATTR_RECORD *attr, runlist_element *old_rl);
 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 8/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:40             ` [PATCH 7/37] " Anton Altaparmakov
@ 2004-10-19  9:40               ` Anton Altaparmakov
  2004-10-19  9:41                 ` [PATCH 9/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 8/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/09/30 1.2005.1.2)
   NTFS: - Add fs/ntfs/lcnalloc.h::ntfs_cluster_free_from_rl() which is a static
           inline wrapper for ntfs_cluster_free_from_rl_nolock() which takes the
           cluster bitmap lock for the duration of the call.
         - Make fs/ntfs/lcnalloc.c::ntfs_cluster_free_from_rl_nolock() not
           static and add a declaration for it to lcnalloc.h.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:13:34 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:13:34 +01:00
@@ -34,6 +34,11 @@
 	- Add fs/ntfs/runlist.[hc]::ntfs_get_nr_significant_bytes(),
 	  ntfs_get_size_for_mapping_pairs(), ntfs_write_significant_bytes(),
 	  and ntfs_mapping_pairs_build(), adapted from libntfs.
+	- Make fs/ntfs/lcnalloc.c::ntfs_cluster_free_from_rl_nolock() not
+	  static, add a declaration for it to lcnalloc.h.
+	- Add fs/ntfs/lcnalloc.h::ntfs_cluster_free_from_rl() which is a static
+	  inline wrapper for ntfs_cluster_free_from_rl_nolock() which takes the
+	  cluster bitmap lock for the duration of the call.
 
 2.1.19 - Many cleanups, improvements, and a minor bug fix.
 
diff -Nru a/fs/ntfs/lcnalloc.c b/fs/ntfs/lcnalloc.c
--- a/fs/ntfs/lcnalloc.c	2004-10-19 10:13:34 +01:00
+++ b/fs/ntfs/lcnalloc.c	2004-10-19 10:13:34 +01:00
@@ -46,7 +46,7 @@
  * Locking: - The volume lcn bitmap must be locked for writing on entry and is
  *	      left locked on return.
  */
-static int ntfs_cluster_free_from_rl_nolock(ntfs_volume *vol,
+int ntfs_cluster_free_from_rl_nolock(ntfs_volume *vol,
 		const runlist_element *rl)
 {
 	struct inode *lcnbmp_vi = vol->lcnbmp_ino;
diff -Nru a/fs/ntfs/lcnalloc.h b/fs/ntfs/lcnalloc.h
--- a/fs/ntfs/lcnalloc.h	2004-10-19 10:13:34 +01:00
+++ b/fs/ntfs/lcnalloc.h	2004-10-19 10:13:34 +01:00
@@ -78,6 +78,34 @@
 	return __ntfs_cluster_free(vi, start_vcn, count, FALSE);
 }
 
+extern int ntfs_cluster_free_from_rl_nolock(ntfs_volume *vol,
+		const runlist_element *rl);
+
+/**
+ * ntfs_cluster_free_from_rl - free clusters from runlist
+ * @vol:	mounted ntfs volume on which to free the clusters
+ * @rl:		runlist describing the clusters to free
+ *
+ * Free all the clusters described by the runlist @rl on the volume @vol.  In
+ * the case of an error being returned, at least some of the clusters were not
+ * freed.
+ *
+ * Return 0 on success and -errno on error.
+ *
+ * Locking: This function takes the volume lcn bitmap lock for writing and
+ *	    modifies the bitmap contents.
+ */
+static inline int ntfs_cluster_free_from_rl(ntfs_volume *vol,
+		const runlist_element *rl)
+{
+	int ret;
+
+	down_write(&vol->lcnbmp_lock);
+	ret = ntfs_cluster_free_from_rl_nolock(vol, rl);
+	up_write(&vol->lcnbmp_lock);
+	return ret;
+}
+
 #endif /* NTFS_RW */
 
 #endif /* defined _LINUX_NTFS_LCNALLOC_H */

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 9/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:40               ` [PATCH 8/37] " Anton Altaparmakov
@ 2004-10-19  9:41                 ` Anton Altaparmakov
  2004-10-19  9:41                   ` [PATCH 10/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 9/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/09/30 1.2005.1.3)
   NTFS: Add fs/ntfs/attrib.[hc]::ntfs_attr_record_resize().
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:13:38 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:13:38 +01:00
@@ -35,10 +35,11 @@
 	  ntfs_get_size_for_mapping_pairs(), ntfs_write_significant_bytes(),
 	  and ntfs_mapping_pairs_build(), adapted from libntfs.
 	- Make fs/ntfs/lcnalloc.c::ntfs_cluster_free_from_rl_nolock() not
-	  static, add a declaration for it to lcnalloc.h.
+	  static and add a declaration for it to lcnalloc.h.
 	- Add fs/ntfs/lcnalloc.h::ntfs_cluster_free_from_rl() which is a static
 	  inline wrapper for ntfs_cluster_free_from_rl_nolock() which takes the
 	  cluster bitmap lock for the duration of the call.
+	- Add fs/ntfs/attrib.[hc]::ntfs_attr_record_resize().
 
 2.1.19 - Many cleanups, improvements, and a minor bug fix.
 
diff -Nru a/fs/ntfs/attrib.c b/fs/ntfs/attrib.c
--- a/fs/ntfs/attrib.c	2004-10-19 10:13:38 +01:00
+++ b/fs/ntfs/attrib.c	2004-10-19 10:13:38 +01:00
@@ -942,3 +942,47 @@
 	kmem_cache_free(ntfs_attr_ctx_cache, ctx);
 	return;
 }
+
+/**
+ * ntfs_attr_record_resize - resize an attribute record
+ * @m:		mft record containing attribute record
+ * @a:		attribute record to resize
+ * @new_size:	new size in bytes to which to resize the attribute record @a
+ *
+ * Resize the attribute record @a, i.e. the resident part of the attribute, in
+ * the mft record @m to @new_size bytes.
+ *
+ * Return 0 on success and -errno on error.  The following error codes are
+ * defined:
+ *	-ENOSPC	- Not enough space in the mft record @m to perform the resize.
+ *
+ * Note: On error, no modifications have been performed whatsoever.
+ *
+ * Warning: If you make a record smaller without having copied all the data you
+ *	    are interested in the data may be overwritten.
+ */
+int ntfs_attr_record_resize(MFT_RECORD *m, ATTR_RECORD *a, u32 new_size)
+{
+	ntfs_debug("Entering for new_size %u.", new_size);
+	/* Align to 8 bytes if it is not already done. */
+	if (new_size & 7)
+		new_size = (new_size + 7) & ~7;
+	/* If the actual attribute length has changed, move things around. */
+	if (new_size != le32_to_cpu(a->length)) {
+		u32 new_muse = le32_to_cpu(m->bytes_in_use) -
+				le32_to_cpu(a->length) + new_size;
+		/* Not enough space in this mft record. */
+		if (new_muse > le32_to_cpu(m->bytes_allocated))
+			return -ENOSPC;
+		/* Move attributes following @a to their new location. */
+		memmove((u8*)a + new_size, (u8*)a + le32_to_cpu(a->length),
+				le32_to_cpu(m->bytes_in_use) - ((u8*)a -
+				(u8*)m) - le32_to_cpu(a->length));
+		/* Adjust @m to reflect the change in used space. */
+		m->bytes_in_use = cpu_to_le32(new_muse);
+		/* Adjust @a to reflect the new size. */
+		if (new_size >= offsetof(ATTR_REC, length) + sizeof(a->length))
+			a->length = cpu_to_le32(new_size);
+	}
+	return 0;
+}
diff -Nru a/fs/ntfs/attrib.h b/fs/ntfs/attrib.h
--- a/fs/ntfs/attrib.h	2004-10-19 10:13:38 +01:00
+++ b/fs/ntfs/attrib.h	2004-10-19 10:13:38 +01:00
@@ -84,4 +84,6 @@
 		MFT_RECORD *mrec);
 extern void ntfs_attr_put_search_ctx(ntfs_attr_search_ctx *ctx);
 
+extern int ntfs_attr_record_resize(MFT_RECORD *m, ATTR_RECORD *a, u32 new_size);
+
 #endif /* _LINUX_NTFS_ATTRIB_H */

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 10/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:41                 ` [PATCH 9/37] " Anton Altaparmakov
@ 2004-10-19  9:41                   ` Anton Altaparmakov
  2004-10-19  9:41                     ` [PATCH 11/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 10/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/10/01 1.2011)
   NTFS: Implement the equivalent of memset() for an ntfs attribute in
         fs/ntfs/attrib.[hc]::ntfs_attr_set() and switch
         fs/ntfs/logfile.c::ntfs_empty_logfile() to using it.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:13:41 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:13:41 +01:00
@@ -40,6 +40,9 @@
 	  inline wrapper for ntfs_cluster_free_from_rl_nolock() which takes the
 	  cluster bitmap lock for the duration of the call.
 	- Add fs/ntfs/attrib.[hc]::ntfs_attr_record_resize().
+	- Implement the equivalent of memset() for an ntfs attribute in
+	  fs/ntfs/attrib.[hc]::ntfs_attr_set() and switch
+	  fs/ntfs/logfile.c::ntfs_empty_logfile() to using it.
 
 2.1.19 - Many cleanups, improvements, and a minor bug fix.
 
diff -Nru a/fs/ntfs/attrib.c b/fs/ntfs/attrib.c
--- a/fs/ntfs/attrib.c	2004-10-19 10:13:41 +01:00
+++ b/fs/ntfs/attrib.c	2004-10-19 10:13:42 +01:00
@@ -986,3 +986,144 @@
 	}
 	return 0;
 }
+
+/**
+ * ntfs_attr_set - fill (a part of) an attribute with a byte
+ * @ni:		ntfs inode describing the attribute to fill
+ * @ofs:	offset inside the attribute at which to start to fill
+ * @cnt:	number of bytes to fill
+ * @val:	the unsigned 8-bit value with which to fill the attribute
+ *
+ * Fill @cnt bytes of the attribute described by the ntfs inode @ni starting at
+ * byte offset @ofs inside the attribute with the constant byte @val.
+ *
+ * This function is effectively like memset() applied to an ntfs attribute.
+ *
+ * Return 0 on success and -errno on error.  An error code of -ESPIPE means
+ * that @ofs + @cnt were outside the end of the attribute and no write was
+ * performed.
+ */
+int ntfs_attr_set(ntfs_inode *ni, const s64 ofs, const s64 cnt, const u8 val)
+{
+	ntfs_volume *vol = ni->vol;
+	struct address_space *mapping;
+	struct page *page;
+	u8 *kaddr;
+	pgoff_t idx, end;
+	unsigned int start_ofs, end_ofs, size;
+
+	ntfs_debug("Entering for ofs 0x%llx, cnt 0x%llx, val 0x%hx.",
+			(long long)ofs, (long long)cnt, val);
+	BUG_ON(ofs < 0);
+	BUG_ON(cnt < 0);
+	if (!cnt)
+		goto done;
+	mapping = VFS_I(ni)->i_mapping;
+	/* Work out the starting index and page offset. */
+	idx = ofs >> PAGE_CACHE_SHIFT;
+	start_ofs = ofs & ~PAGE_CACHE_MASK;
+	/* Work out the ending index and page offset. */
+	end = ofs + cnt;
+	end_ofs = end & ~PAGE_CACHE_MASK;
+	/* If the end is outside the inode size return -ESPIPE. */
+	if (unlikely(end > VFS_I(ni)->i_size)) {
+		ntfs_error(vol->sb, "Request exceeds end of attribute.");
+		return -ESPIPE;
+	}
+	end >>= PAGE_CACHE_SHIFT;
+	/* If there is a first partial page, need to do it the slow way. */
+	if (start_ofs) {
+		page = read_cache_page(mapping, idx,
+				(filler_t*)mapping->a_ops->readpage, NULL);
+		if (IS_ERR(page)) {
+			ntfs_error(vol->sb, "Failed to read first partial "
+					"page (sync error, index 0x%lx).", idx);
+			return PTR_ERR(page);
+		}
+		wait_on_page_locked(page);
+		if (unlikely(!PageUptodate(page))) {
+			ntfs_error(vol->sb, "Failed to read first partial page "
+					"(async error, index 0x%lx).", idx);
+			page_cache_release(page);
+			return PTR_ERR(page);
+		}
+		/*
+		 * If the last page is the same as the first page, need to
+		 * limit the write to the end offset.
+		 */
+		size = PAGE_CACHE_SIZE;
+		if (idx == end)
+			size = end_ofs;
+		kaddr = kmap_atomic(page, KM_USER0);
+		memset(kaddr + start_ofs, val, size - start_ofs);
+		flush_dcache_page(page);
+		kunmap_atomic(kaddr, KM_USER0);
+		set_page_dirty(page);
+		page_cache_release(page);
+		if (idx == end)
+			goto done;
+		idx++;
+	}
+	/* Do the whole pages the fast way. */
+	for (; idx < end; idx++) {
+		/* Find or create the current page.  (The page is locked.) */
+		page = grab_cache_page(mapping, idx);
+		if (unlikely(!page)) {
+			ntfs_error(vol->sb, "Insufficient memory to grab "
+					"page (index 0x%lx).", idx);
+			return -ENOMEM;
+		}
+		kaddr = kmap_atomic(page, KM_USER0);
+		memset(kaddr, val, PAGE_CACHE_SIZE);
+		flush_dcache_page(page);
+		kunmap_atomic(kaddr, KM_USER0);
+		/*
+		 * If the page has buffers, mark them uptodate since buffer
+		 * state and not page state is definitive in 2.6 kernels.
+		 */
+		if (page_has_buffers(page)) {
+			struct buffer_head *bh, *head;
+
+			bh = head = page_buffers(page);
+			do {
+				set_buffer_uptodate(bh);
+			} while ((bh = bh->b_this_page) != head);
+		}
+		/* Now that buffers are uptodate, set the page uptodate, too. */
+		SetPageUptodate(page);
+		/*
+		 * Set the page and all its buffers dirty and mark the inode
+		 * dirty, too.  The VM will write the page later on.
+		 */
+		set_page_dirty(page);
+		/* Finally unlock and release the page. */
+		unlock_page(page);
+		page_cache_release(page);
+	}
+	/* If there is a last partial page, need to do it the slow way. */
+	if (end_ofs) {
+		page = read_cache_page(mapping, idx,
+				(filler_t*)mapping->a_ops->readpage, NULL);
+		if (IS_ERR(page)) {
+			ntfs_error(vol->sb, "Failed to read last partial page "
+					"(sync error, index 0x%lx).", idx);
+			return PTR_ERR(page);
+		}
+		wait_on_page_locked(page);
+		if (unlikely(!PageUptodate(page))) {
+			ntfs_error(vol->sb, "Failed to read last partial page "
+					"(async error, index 0x%lx).", idx);
+			page_cache_release(page);
+			return PTR_ERR(page);
+		}
+		kaddr = kmap_atomic(page, KM_USER0);
+		memset(kaddr, val, end_ofs);
+		flush_dcache_page(page);
+		kunmap_atomic(kaddr, KM_USER0);
+		set_page_dirty(page);
+		page_cache_release(page);
+	}
+done:
+	ntfs_debug("Done.");
+	return 0;
+}
diff -Nru a/fs/ntfs/attrib.h b/fs/ntfs/attrib.h
--- a/fs/ntfs/attrib.h	2004-10-19 10:13:41 +01:00
+++ b/fs/ntfs/attrib.h	2004-10-19 10:13:41 +01:00
@@ -86,4 +86,7 @@
 
 extern int ntfs_attr_record_resize(MFT_RECORD *m, ATTR_RECORD *a, u32 new_size);
 
+extern int ntfs_attr_set(ntfs_inode *ni, const s64 ofs, const s64 cnt,
+		const u8 val);
+
 #endif /* _LINUX_NTFS_ATTRIB_H */
diff -Nru a/fs/ntfs/logfile.c b/fs/ntfs/logfile.c
--- a/fs/ntfs/logfile.c	2004-10-19 10:13:41 +01:00
+++ b/fs/ntfs/logfile.c	2004-10-19 10:13:41 +01:00
@@ -681,60 +681,20 @@
 BOOL ntfs_empty_logfile(struct inode *log_vi)
 {
 	ntfs_volume *vol = NTFS_SB(log_vi->i_sb);
-	struct address_space *mapping;
-	pgoff_t idx, end;
 
 	ntfs_debug("Entering.");
-	if (NVolLogFileEmpty(vol))
-		goto done;
-	mapping = log_vi->i_mapping;
-	end = (log_vi->i_size + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
-	for (idx = 0; idx < end; ++idx) {
-		struct page *page;
-		u8 *kaddr;
-
-		/* Find or create the current page.  (The page is locked.) */
-		page = grab_cache_page(mapping, idx);
-		if (unlikely(!page)) {
-			ntfs_error(vol->sb, "Insufficient memory to grab "
-					"$LogFile page (index %lu).", idx);
+	if (!NVolLogFileEmpty(vol)) {
+		int err;
+		
+		err = ntfs_attr_set(NTFS_I(log_vi), 0, log_vi->i_size, 0xff);
+		if (unlikely(err)) {
+			ntfs_error(vol->sb, "Failed to fill $LogFile with "
+					"0xff bytes (error code %i).", err);
 			return FALSE;
 		}
-		/*
-		 * Set all bytes in the page to 0xff.  It doesn't matter if we
-		 * go beyond i_size, because ntfs_writepage() will take care of
-		 * that for us.
-		 */
-		kaddr = (u8*)kmap_atomic(page, KM_USER0);
-		memset(kaddr, 0xff, PAGE_CACHE_SIZE);
-		flush_dcache_page(page);
-		kunmap_atomic(kaddr, KM_USER0);
-		/*
-		 * If the page has buffers, mark them uptodate since buffer
-		 * state and not page state is definitive in 2.6 kernels.
-		 */
-		if (page_has_buffers(page)) {
-			struct buffer_head *bh, *head;
-
-			bh = head = page_buffers(page);
-			do {
-				set_buffer_uptodate(bh);
-			} while ((bh = bh->b_this_page) != head);
-		}
-		/* Now that buffers are uptodate, set the page uptodate, too. */
-		SetPageUptodate(page);
-		/*
-		 * Set the page and all its buffers dirty and mark the inode
-		 * dirty, too. The VM will write the page later on.
-		 */
-		set_page_dirty(page);
-		/* Finally unlock and release the page. */
-		unlock_page(page);
-		page_cache_release(page);
+		/* Set the flag so we do not have to do it again on remount. */
+		NVolSetLogFileEmpty(vol);
 	}
-	/* We set the flag so we do not clear the log file again on remount. */
-	NVolSetLogFileEmpty(vol);
-done:
 	ntfs_debug("Done.");
 	return TRUE;
 }

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 11/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:41                   ` [PATCH 10/37] " Anton Altaparmakov
@ 2004-10-19  9:41                     ` Anton Altaparmakov
  2004-10-19  9:41                       ` [PATCH 12/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 11/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/10/01 1.2011.1.1)
   NTFS: Remove unnecessary casts from LCN_* constants.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:13:45 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:13:45 +01:00
@@ -43,6 +43,7 @@
 	- Implement the equivalent of memset() for an ntfs attribute in
 	  fs/ntfs/attrib.[hc]::ntfs_attr_set() and switch
 	  fs/ntfs/logfile.c::ntfs_empty_logfile() to using it.
+	- Remove unnecessary casts from LCN_* constants.
 
 2.1.19 - Many cleanups, improvements, and a minor bug fix.
 
diff -Nru a/fs/ntfs/aops.c b/fs/ntfs/aops.c
--- a/fs/ntfs/aops.c	2004-10-19 10:13:45 +01:00
+++ b/fs/ntfs/aops.c	2004-10-19 10:13:45 +01:00
@@ -234,7 +234,7 @@
 					rl++;
 				lcn = ntfs_rl_vcn_to_lcn(rl, vcn);
 			} else
-				lcn = (LCN)LCN_RL_NOT_MAPPED;
+				lcn = LCN_RL_NOT_MAPPED;
 			/* Successful remap. */
 			if (lcn >= 0) {
 				/* Setup buffer head to correct block. */
@@ -639,7 +639,7 @@
 				rl++;
 			lcn = ntfs_rl_vcn_to_lcn(rl, vcn);
 		} else
-			lcn = (LCN)LCN_RL_NOT_MAPPED;
+			lcn = LCN_RL_NOT_MAPPED;
 		/* Successful remap. */
 		if (lcn >= 0) {
 			/* Setup buffer head to point to correct block. */
@@ -1404,7 +1404,7 @@
 					rl++;
 				lcn = ntfs_rl_vcn_to_lcn(rl, vcn);
 			} else
-				lcn = (LCN)LCN_RL_NOT_MAPPED;
+				lcn = LCN_RL_NOT_MAPPED;
 			if (unlikely(lcn < 0)) {
 				/*
 				 * We extended the attribute allocation above.
diff -Nru a/fs/ntfs/attrib.c b/fs/ntfs/attrib.c
--- a/fs/ntfs/attrib.c	2004-10-19 10:13:45 +01:00
+++ b/fs/ntfs/attrib.c	2004-10-19 10:13:45 +01:00
@@ -140,7 +140,7 @@
 	if (likely(rl && vcn >= rl[0].vcn)) {
 		while (likely(rl->length)) {
 			if (likely(vcn < rl[1].vcn)) {
-				if (likely(rl->lcn >= (LCN)LCN_HOLE)) {
+				if (likely(rl->lcn >= LCN_HOLE)) {
 					ntfs_debug("Done.");
 					return rl;
 				}
@@ -148,8 +148,8 @@
 			}
 			rl++;
 		}
-		if (likely(rl->lcn != (LCN)LCN_RL_NOT_MAPPED)) {
-			if (likely(rl->lcn == (LCN)LCN_ENOENT))
+		if (likely(rl->lcn != LCN_RL_NOT_MAPPED)) {
+			if (likely(rl->lcn == LCN_ENOENT))
 				err = -ENOENT;
 			else
 				err = -EIO;
diff -Nru a/fs/ntfs/compress.c b/fs/ntfs/compress.c
--- a/fs/ntfs/compress.c	2004-10-19 10:13:45 +01:00
+++ b/fs/ntfs/compress.c	2004-10-19 10:13:45 +01:00
@@ -602,7 +602,7 @@
 				rl++;
 			lcn = ntfs_rl_vcn_to_lcn(rl, vcn);
 		} else
-			lcn = (LCN)LCN_RL_NOT_MAPPED;
+			lcn = LCN_RL_NOT_MAPPED;
 		ntfs_debug("Reading vcn = 0x%llx, lcn = 0x%llx.",
 				(unsigned long long)vcn,
 				(unsigned long long)lcn);
diff -Nru a/fs/ntfs/lcnalloc.c b/fs/ntfs/lcnalloc.c
--- a/fs/ntfs/lcnalloc.c	2004-10-19 10:13:45 +01:00
+++ b/fs/ntfs/lcnalloc.c	2004-10-19 10:13:45 +01:00
@@ -855,7 +855,7 @@
 		err = PTR_ERR(rl);
 		goto err_out;
 	}
-	if (unlikely(rl->lcn < (LCN)LCN_HOLE)) {
+	if (unlikely(rl->lcn < LCN_HOLE)) {
 		if (!is_rollback)
 			ntfs_error(vol->sb, "First runlist element has "
 					"invalid lcn, aborting.");
@@ -895,7 +895,7 @@
 	 * free them.
 	 */
 	for (; rl->length && count != 0; ++rl) {
-		if (unlikely(rl->lcn < (LCN)LCN_HOLE)) {
+		if (unlikely(rl->lcn < LCN_HOLE)) {
 			VCN vcn;
 
 			/*
@@ -926,7 +926,7 @@
 							"element.");
 				goto err_out;
 			}
-			if (unlikely(rl->lcn < (LCN)LCN_HOLE)) {
+			if (unlikely(rl->lcn < LCN_HOLE)) {
 				if (!is_rollback)
 					ntfs_error(vol->sb, "Runlist element "
 							"has invalid lcn "
diff -Nru a/fs/ntfs/runlist.c b/fs/ntfs/runlist.c
--- a/fs/ntfs/runlist.c	2004-10-19 10:13:45 +01:00
+++ b/fs/ntfs/runlist.c	2004-10-19 10:13:45 +01:00
@@ -530,7 +530,7 @@
 	si = di = 0;
 
 	/* Skip any unmapped start element(s) in the source runlist. */
-	while (srl[si].length && srl[si].lcn < (LCN)LCN_HOLE)
+	while (srl[si].length && srl[si].lcn < LCN_HOLE)
 		si++;
 
 	/* Can't have an entirely unmapped source runlist. */
@@ -563,7 +563,7 @@
 	for (dend = di; drl[dend].length; dend++)
 		;
 
-	if (srl[send].lcn == (LCN)LCN_ENOENT)
+	if (srl[send].lcn == LCN_ENOENT)
 		marker_vcn = srl[marker = send].vcn;
 
 	/* Scan to the last element with lcn >= LCN_HOLE. */
@@ -624,7 +624,7 @@
 						"with LCN_ENOENT.",
 						(unsigned long long)
 						drl[ds].lcn);
-				drl[ds].lcn = (LCN)LCN_ENOENT;
+				drl[ds].lcn = LCN_ENOENT;
 				goto finished;
 			}
 			/*
@@ -632,11 +632,11 @@
 			 * @drl or extend an existing one before adding the
 			 * ENOENT terminator.
 			 */
-			if (drl[ds].lcn == (LCN)LCN_ENOENT) {
+			if (drl[ds].lcn == LCN_ENOENT) {
 				ds--;
 				slots = 1;
 			}
-			if (drl[ds].lcn != (LCN)LCN_RL_NOT_MAPPED) {
+			if (drl[ds].lcn != LCN_RL_NOT_MAPPED) {
 				/* Add an unmapped runlist element. */
 				if (!slots) {
 					/* FIXME/TODO: We need to have the
@@ -651,7 +651,7 @@
 				if (slots != 1)
 					drl[ds].vcn = drl[ds - 1].vcn +
 							drl[ds - 1].length;
-				drl[ds].lcn = (LCN)LCN_RL_NOT_MAPPED;
+				drl[ds].lcn = LCN_RL_NOT_MAPPED;
 				/* We now used up a slot. */
 				slots--;
 			}
@@ -666,7 +666,7 @@
 					goto critical_error;
 			}
 			drl[ds].vcn = marker_vcn;
-			drl[ds].lcn = (LCN)LCN_ENOENT;
+			drl[ds].lcn = LCN_ENOENT;
 			drl[ds].length = (s64)0;
 		}
 	}
@@ -753,8 +753,8 @@
 		return ERR_PTR(-ENOMEM);
 	/* Insert unmapped starting element if necessary. */
 	if (vcn) {
-		rl->vcn = (VCN)0;
-		rl->lcn = (LCN)LCN_RL_NOT_MAPPED;
+		rl->vcn = 0;
+		rl->lcn = LCN_RL_NOT_MAPPED;
 		rl->length = vcn;
 		rlpos++;
 	}
@@ -819,7 +819,7 @@
 		 * to LCN_HOLE.
 		 */
 		if (!(*buf & 0xf0))
-			rl[rlpos].lcn = (LCN)LCN_HOLE;
+			rl[rlpos].lcn = LCN_HOLE;
 		else {
 			/* Get the lcn change which really can be negative. */
 			u8 b2 = *buf & 0xf;
@@ -892,7 +892,7 @@
 					(unsigned long long)max_cluster);
 			rl[rlpos].vcn = vcn;
 			vcn += rl[rlpos].length = max_cluster - deltaxcn;
-			rl[rlpos].lcn = (LCN)LCN_RL_NOT_MAPPED;
+			rl[rlpos].lcn = LCN_RL_NOT_MAPPED;
 			rlpos++;
 		} else if (unlikely(deltaxcn > max_cluster)) {
 			ntfs_error(vol->sb, "Corrupt attribute. deltaxcn = "
@@ -901,9 +901,9 @@
 					(unsigned long long)max_cluster);
 			goto mpa_err;
 		}
-		rl[rlpos].lcn = (LCN)LCN_ENOENT;
+		rl[rlpos].lcn = LCN_ENOENT;
 	} else /* Not the base extent. There may be more extents to follow. */
-		rl[rlpos].lcn = (LCN)LCN_RL_NOT_MAPPED;
+		rl[rlpos].lcn = LCN_RL_NOT_MAPPED;
 
 	/* Setup terminating runlist element. */
 	rl[rlpos].vcn = vcn;
@@ -962,11 +962,11 @@
 	 * necessary.
 	 */
 	if (unlikely(!rl))
-		return (LCN)LCN_RL_NOT_MAPPED;
+		return LCN_RL_NOT_MAPPED;
 
 	/* Catch out of lower bounds vcn. */
 	if (unlikely(vcn < rl[0].vcn))
-		return (LCN)LCN_ENOENT;
+		return LCN_ENOENT;
 
 	for (i = 0; likely(rl[i].length); i++) {
 		if (unlikely(vcn < rl[i+1].vcn)) {
@@ -982,7 +982,7 @@
 	if (likely(rl[i].lcn < (LCN)0))
 		return rl[i].lcn;
 	/* Just in case... We could replace this with BUG() some day. */
-	return (LCN)LCN_ENOENT;
+	return LCN_ENOENT;
 }
 
 /**

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 12/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:41                     ` [PATCH 11/37] " Anton Altaparmakov
@ 2004-10-19  9:41                       ` Anton Altaparmakov
  2004-10-19  9:41                         ` [PATCH 13/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 12/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/10/02 1.2011.1.2)
   NTFS: Implement fs/ntfs/runlist.c::ntfs_rl_truncate_nolock().
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:13:49 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:13:49 +01:00
@@ -44,6 +44,7 @@
 	  fs/ntfs/attrib.[hc]::ntfs_attr_set() and switch
 	  fs/ntfs/logfile.c::ntfs_empty_logfile() to using it.
 	- Remove unnecessary casts from LCN_* constants.
+	- Implement fs/ntfs/runlist.c::ntfs_rl_truncate().
 
 2.1.19 - Many cleanups, improvements, and a minor bug fix.
 
diff -Nru a/fs/ntfs/runlist.c b/fs/ntfs/runlist.c
--- a/fs/ntfs/runlist.c	2004-10-19 10:13:49 +01:00
+++ b/fs/ntfs/runlist.c	2004-10-19 10:13:49 +01:00
@@ -1321,3 +1321,139 @@
 		err = -EIO;
 	return err;
 }
+
+/**
+ * ntfs_rl_truncate_nolock - truncate a runlist starting at a specified vcn
+ * @runlist:	runlist to truncate
+ * @new_length:	the new length of the runlist in VCNs
+ *
+ * Truncate the runlist described by @runlist as well as the memory buffer
+ * holding the runlist elements to a length of @new_length VCNs.
+ *
+ * If @new_length lies within the runlist, the runlist elements with VCNs of
+ * @new_length and above are discarded.
+ *
+ * If @new_length lies beyond the runlist, a sparse runlist element is added to
+ * the end of the runlist @runlist or if the last runlist element is a sparse
+ * one already, this is extended.
+ *
+ * Return 0 on success and -errno on error.
+ *
+ * Locking: The caller must hold @runlist->lock for writing.
+ */
+int ntfs_rl_truncate_nolock(const ntfs_volume *vol, runlist *const runlist,
+		const s64 new_length)
+{
+	runlist_element *rl;
+	int old_size;
+
+	ntfs_debug("Entering for new_length 0x%llx.", (long long)new_length);
+	BUG_ON(!runlist);
+	BUG_ON(new_length < 0);
+	rl = runlist->rl;
+	if (unlikely(!rl)) {
+		/*
+		 * Create a runlist consisting of a sparse runlist element of
+		 * length @new_length followed by a terminator runlist element.
+		 */
+		rl = ntfs_malloc_nofs(PAGE_SIZE);
+		if (unlikely(!rl)) {
+			ntfs_error(vol->sb, "Not enough memory to allocate "
+					"runlist element buffer.");
+			return -ENOMEM;
+		}
+		runlist->rl = rl;
+		rl[1].length = rl->vcn = 0;
+		rl->lcn = LCN_HOLE;
+		rl[1].vcn = rl->length = new_length;
+		rl[1].lcn = LCN_ENOENT;
+		return 0;
+	}
+	BUG_ON(new_length < rl->vcn);
+	/* Find @new_length in the runlist. */
+	while (likely(rl->length && new_length >= rl[1].vcn))
+		rl++;
+	/*
+	 * If not at the end of the runlist we need to shrink it.
+	 * If at the end of the runlist we need to expand it.
+	 */
+	if (rl->length) {
+		runlist_element *trl;
+		BOOL is_end;
+
+		ntfs_debug("Shrinking runlist.");
+		/* Determine the runlist size. */
+		trl = rl + 1;
+		while (likely(trl->length))
+			trl++;
+		old_size = trl - runlist->rl + 1;
+		/* Truncate the run. */
+		rl->length = new_length - rl->vcn;
+		/*
+		 * If a run was partially truncated, make the following runlist
+		 * element a terminator.
+		 */
+		is_end = FALSE;
+		if (rl->length) {
+			rl++;
+			if (!rl->length)
+				is_end = TRUE;
+			rl->vcn = new_length;
+			rl->length = 0;
+		}
+		rl->lcn = LCN_ENOENT;
+		/* Reallocate memory if necessary. */
+		if (!is_end) {
+			int new_size = rl - runlist->rl + 1;
+			rl = ntfs_rl_realloc(runlist->rl, old_size, new_size);
+			if (IS_ERR(rl))
+				ntfs_warning(vol->sb, "Failed to shrink "
+						"runlist buffer.  This just "
+						"wastes a bit of memory "
+						"temporarily so we ignore it "
+						"and return success.");
+			else
+				runlist->rl = rl;
+		}
+	} else if (likely(/* !rl->length && */ new_length > rl->vcn)) {
+		ntfs_debug("Expanding runlist.");
+		/*
+		 * If there is a previous runlist element and it is a sparse
+		 * one, extend it.  Otherwise need to add a new, sparse runlist
+		 * element.
+		 */
+		if ((rl > runlist->rl) && ((rl - 1)->lcn == LCN_HOLE))
+			(rl - 1)->length = new_length - (rl - 1)->vcn;
+		else {
+			/* Determine the runlist size. */
+			old_size = rl - runlist->rl + 1;
+			/* Reallocate memory if necessary. */
+			rl = ntfs_rl_realloc(runlist->rl, old_size,
+					old_size + 1);
+			if (IS_ERR(rl)) {
+				ntfs_error(vol->sb, "Failed to expand runlist "
+						"buffer, aborting.");
+				return PTR_ERR(rl);
+			}
+			runlist->rl = rl;
+			/*
+			 * Set @rl to the same runlist element in the new
+			 * runlist as before in the old runlist.
+			 */
+			rl += old_size - 1;
+			/* Add a new, sparse runlist element. */
+			rl->lcn = LCN_HOLE;
+			rl->length = new_length - rl->vcn;
+			/* Add a new terminator runlist element. */
+			rl++;
+			rl->length = 0;
+		}
+		rl->vcn = new_length;
+		rl->lcn = LCN_ENOENT;
+	} else /* if (unlikely(!rl->length && new_length == rl->vcn)) */ {
+		/* Runlist already has same size as requested. */
+		rl->lcn = LCN_ENOENT;
+	}
+	ntfs_debug("Done.");
+	return 0;
+}
diff -Nru a/fs/ntfs/runlist.h b/fs/ntfs/runlist.h
--- a/fs/ntfs/runlist.h	2004-10-19 10:13:49 +01:00
+++ b/fs/ntfs/runlist.h	2004-10-19 10:13:49 +01:00
@@ -55,4 +55,7 @@
 		const int dst_len, const runlist_element *rl,
 		const VCN start_vcn, VCN *const stop_vcn);
 
+extern int ntfs_rl_truncate_nolock(const ntfs_volume *vol,
+		runlist *const runlist, const s64 new_length);
+
 #endif /* _LINUX_NTFS_RUNLIST_H */

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 13/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:41                       ` [PATCH 12/37] " Anton Altaparmakov
@ 2004-10-19  9:41                         ` Anton Altaparmakov
  2004-10-19  9:42                           ` [PATCH 14/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 13/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/10/03 1.2017)
   NTFS: Add MFT_RECORD_OLD as a copy of MFT_RECORD in fs/ntfs/layout.h
         and change MFT_RECORD to contain the NTFS 3.1+ specific fields.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:13:53 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:13:53 +01:00
@@ -45,6 +45,8 @@
 	  fs/ntfs/logfile.c::ntfs_empty_logfile() to using it.
 	- Remove unnecessary casts from LCN_* constants.
 	- Implement fs/ntfs/runlist.c::ntfs_rl_truncate_nolock().
+	- Add MFT_RECORD_OLD as a copy of MFT_RECORD in fs/ntfs/layout.h and
+	  change MFT_RECORD to contain the NTFS 3.1+ specific fields.
 
 2.1.20 - Fix two stupid bugs introduced in 2.1.18 release.
 
diff -Nru a/fs/ntfs/layout.h b/fs/ntfs/layout.h
--- a/fs/ntfs/layout.h	2004-10-19 10:13:53 +01:00
+++ b/fs/ntfs/layout.h	2004-10-19 10:13:53 +01:00
@@ -260,7 +260,7 @@
 enum {
 	MFT_RECORD_IN_USE	= const_cpu_to_le16(0x0001),
 	MFT_RECORD_IS_DIRECTORY = const_cpu_to_le16(0x0002),
-};
+} __attrobite__ ((__packed__));
 
 typedef le16 MFT_RECORD_FLAGS;
 
@@ -385,21 +385,86 @@
 				   NOTE: Every time the mft record is reused
 				   this number is set to zero.  NOTE: The first
 				   instance number is always 0. */
-/* sizeof() = 42 bytes */
-/* NTFS 3.1+ (Windows XP and above) introduce the following additions. */
-/* 42*/ //le16 reserved;	/* Reserved/alignment. */
-/* 44*/ //le32 mft_record_number;/* Number of this mft record. */
+/* The below fields are specific to NTFS 3.1+ (Windows XP and above): */
+/* 42*/ le16 reserved;		/* Reserved/alignment. */
+/* 44*/ le32 mft_record_number;	/* Number of this mft record. */
 /* sizeof() = 48 bytes */
 /*
  * When (re)using the mft record, we place the update sequence array at this
- * offset, i.e. before we start with the attributes. This also makes sense,
+ * offset, i.e. before we start with the attributes.  This also makes sense,
  * otherwise we could run into problems with the update sequence array
  * containing in itself the last two bytes of a sector which would mean that
- * multi sector transfer protection wouldn't work. As you can't protect data
+ * multi sector transfer protection wouldn't work.  As you can't protect data
  * by overwriting it since you then can't get it back...
  * When reading we obviously use the data from the ntfs record header.
  */
 } __attribute__ ((__packed__)) MFT_RECORD;
+
+/* This is the version without the NTFS 3.1+ specific fields. */
+typedef struct {
+/*Ofs*/
+/*  0	NTFS_RECORD; -- Unfolded here as gcc doesn't like unnamed structs. */
+	NTFS_RECORD_TYPE magic;	/* Usually the magic is "FILE". */
+	le16 usa_ofs;		/* See NTFS_RECORD definition above. */
+	le16 usa_count;		/* See NTFS_RECORD definition above. */
+
+/*  8*/	le64 lsn;		/* $LogFile sequence number for this record.
+				   Changed every time the record is modified. */
+/* 16*/	le16 sequence_number;	/* Number of times this mft record has been
+				   reused. (See description for MFT_REF
+				   above.) NOTE: The increment (skipping zero)
+				   is done when the file is deleted. NOTE: If
+				   this is zero it is left zero. */
+/* 18*/	le16 link_count;	/* Number of hard links, i.e. the number of
+				   directory entries referencing this record.
+				   NOTE: Only used in mft base records.
+				   NOTE: When deleting a directory entry we
+				   check the link_count and if it is 1 we
+				   delete the file. Otherwise we delete the
+				   FILE_NAME_ATTR being referenced by the
+				   directory entry from the mft record and
+				   decrement the link_count.
+				   FIXME: Careful with Win32 + DOS names! */
+/* 20*/	le16 attrs_offset;	/* Byte offset to the first attribute in this
+				   mft record from the start of the mft record.
+				   NOTE: Must be aligned to 8-byte boundary. */
+/* 22*/	MFT_RECORD_FLAGS flags;	/* Bit array of MFT_RECORD_FLAGS. When a file
+				   is deleted, the MFT_RECORD_IN_USE flag is
+				   set to zero. */
+/* 24*/	le32 bytes_in_use;	/* Number of bytes used in this mft record.
+				   NOTE: Must be aligned to 8-byte boundary. */
+/* 28*/	le32 bytes_allocated;	/* Number of bytes allocated for this mft
+				   record. This should be equal to the mft
+				   record size. */
+/* 32*/	leMFT_REF base_mft_record;/* This is zero for base mft records.
+				   When it is not zero it is a mft reference
+				   pointing to the base mft record to which
+				   this record belongs (this is then used to
+				   locate the attribute list attribute present
+				   in the base record which describes this
+				   extension record and hence might need
+				   modification when the extension record
+				   itself is modified, also locating the
+				   attribute list also means finding the other
+				   potential extents, belonging to the non-base
+				   mft record). */
+/* 40*/	le16 next_attr_instance;/* The instance number that will be assigned to
+				   the next attribute added to this mft record.
+				   NOTE: Incremented each time after it is used.
+				   NOTE: Every time the mft record is reused
+				   this number is set to zero.  NOTE: The first
+				   instance number is always 0. */
+/* sizeof() = 42 bytes */
+/*
+ * When (re)using the mft record, we place the update sequence array at this
+ * offset, i.e. before we start with the attributes.  This also makes sense,
+ * otherwise we could run into problems with the update sequence array
+ * containing in itself the last two bytes of a sector which would mean that
+ * multi sector transfer protection wouldn't work.  As you can't protect data
+ * by overwriting it since you then can't get it back...
+ * When reading we obviously use the data from the ntfs record header.
+ */
+} __attribute__ ((__packed__)) MFT_RECORD_OLD;
 
 /*
  * System defined attributes (32-bit).  Each attribute type has a corresponding

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 14/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:41                         ` [PATCH 13/37] " Anton Altaparmakov
@ 2004-10-19  9:42                           ` Anton Altaparmakov
  2004-10-19  9:42                             ` [PATCH 15/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 14/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/10/03 1.2018)
   NTFS: Add some debugging checks to fs/ntfs/inode.c::ntfs_truncate() and fix
         a typo in fs/ntfs/layout.h.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/inode.c b/fs/ntfs/inode.c
--- a/fs/ntfs/inode.c	2004-10-19 10:13:56 +01:00
+++ b/fs/ntfs/inode.c	2004-10-19 10:13:56 +01:00
@@ -2289,6 +2289,8 @@
 	MFT_RECORD *m;
 	int err;
 
+	BUG_ON(NInoAttr(ni));
+	BUG_ON(ni->nr_extents < 0);
 	m = map_mft_record(ni);
 	if (IS_ERR(m)) {
 		ntfs_error(vi->i_sb, "Failed to map mft record for inode 0x%lx "
diff -Nru a/fs/ntfs/layout.h b/fs/ntfs/layout.h
--- a/fs/ntfs/layout.h	2004-10-19 10:13:56 +01:00
+++ b/fs/ntfs/layout.h	2004-10-19 10:13:56 +01:00
@@ -260,7 +260,7 @@
 enum {
 	MFT_RECORD_IN_USE	= const_cpu_to_le16(0x0001),
 	MFT_RECORD_IS_DIRECTORY = const_cpu_to_le16(0x0002),
-} __attrobite__ ((__packed__));
+} __attribute__ ((__packed__));
 
 typedef le16 MFT_RECORD_FLAGS;
 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 15/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:42                           ` [PATCH 14/37] " Anton Altaparmakov
@ 2004-10-19  9:42                             ` Anton Altaparmakov
  2004-10-19  9:42                               ` [PATCH 16/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 15/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/10/05 1.2022.1.1)
   NTFS: Add a helper function fs/ntfs/aops.c::mark_ntfs_record_dirty() which
         marks all buffers belonging to an ntfs record dirty, followed by
         marking the page the ntfs record is in dirty and also marking the vfs
         inode containing the ntfs record dirty (I_DIRTY_PAGES).
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:14:00 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:14:00 +01:00
@@ -47,6 +47,10 @@
 	- Implement fs/ntfs/runlist.c::ntfs_rl_truncate_nolock().
 	- Add MFT_RECORD_OLD as a copy of MFT_RECORD in fs/ntfs/layout.h and
 	  change MFT_RECORD to contain the NTFS 3.1+ specific fields.
+	- Add a helper function fs/ntfs/aops.c::mark_ntfs_record_dirty() which
+	  marks all buffers belonging to an ntfs record dirty, followed by
+	  marking the page the ntfs record is in dirty and also marking the vfs
+	  inode containing the ntfs record dirty (I_DIRTY_PAGES).
 
 2.1.20 - Fix two stupid bugs introduced in 2.1.18 release.
 
diff -Nru a/fs/ntfs/aops.c b/fs/ntfs/aops.c
--- a/fs/ntfs/aops.c	2004-10-19 10:14:00 +01:00
+++ b/fs/ntfs/aops.c	2004-10-19 10:14:00 +01:00
@@ -27,6 +27,7 @@
 #include <linux/swap.h>
 #include <linux/buffer_head.h>
 
+#include "aops.h"
 #include "ntfs.h"
 
 /**
@@ -2028,3 +2029,44 @@
 						   belonging to the page. */
 #endif /* NTFS_RW */
 };
+
+#ifdef NTFS_RW
+
+/**
+ * mark_ntfs_record_dirty - mark an ntfs record dirty
+ * @ni:		ntfs inode to which the ntfs record to be marked dirty belongs
+ * @page:	page containing the ntfs record to mark dirty
+ * @rec_start:	byte offset within @page at which the ntfs record begins
+ *
+ * If the ntfs record is the same size as the page cache page @page, set all
+ * buffers in the page dirty.  Otherwise, set only the buffers in which the
+ * ntfs record is located dirty.
+ *
+ * Also, set the page containing the ntfs record dirty, which also marks the
+ * vfs inode the ntfs record belongs to dirty (I_DIRTY_PAGES).
+ */
+void mark_ntfs_record_dirty(ntfs_inode *ni, struct page *page,
+		unsigned int rec_start) {
+	struct buffer_head *bh, *head;
+	unsigned int rec_end, bh_size, bh_start, bh_end;
+
+	BUG_ON(!page);
+	BUG_ON(!page_has_buffers(page));
+	if (ni->itype.index.block_size == PAGE_CACHE_SIZE) {
+		__set_page_dirty_buffers(page);
+		return;
+	}
+	rec_end = rec_start + ni->itype.index.block_size;
+	bh_size = ni->vol->sb->s_blocksize;
+	bh_start = 0;
+	bh = head = page_buffers(page);
+	do {
+		bh_end = bh_start + bh_size;
+		if ((bh_start >= rec_start) && (bh_end <= rec_end))
+			set_buffer_dirty(bh);
+		bh_start = bh_end;
+	} while ((bh = bh->b_this_page) != head);
+	__set_page_dirty_nobuffers(page);
+}
+
+#endif /* NTFS_RW */
diff -Nru a/fs/ntfs/aops.h b/fs/ntfs/aops.h
--- /dev/null	Wed Dec 31 16:00:00 196900
+++ b/fs/ntfs/aops.h	2004-10-19 10:14:00 +01:00
@@ -0,0 +1,36 @@
+/**
+ * aops.h - Defines for NTFS kernel address space operations and page cache
+ *	    handling.  Part of the Linux-NTFS project.
+ *
+ * Copyright (c) 2001-2004 Anton Altaparmakov
+ * Copyright (c) 2002 Richard Russon
+ *
+ * This program/include file is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as published
+ * by the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program/include file is distributed in the hope that it will be
+ * useful, but WITHOUT ANY WARRANTY; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program (in the main directory of the Linux-NTFS
+ * distribution in the file COPYING); if not, write to the Free Software
+ * Foundation,Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#ifndef _LINUX_NTFS_AOPS_H
+#define _LINUX_NTFS_AOPS_H
+
+#ifdef NTFS_RW
+
+#include "inode.h"
+
+extern void mark_ntfs_record_dirty(ntfs_inode *ni, struct page *page,
+		unsigned int rec_start);
+
+#endif /* NTFS_RW */
+
+#endif /* _LINUX_NTFS_AOPS_H */

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 16/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:42                             ` [PATCH 15/37] " Anton Altaparmakov
@ 2004-10-19  9:42                               ` Anton Altaparmakov
  2004-10-19  9:42                                 ` [PATCH 17/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 16/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/10/05 1.2022.1.2)
   NTFS: Switch fs/ntfs/index.h::ntfs_index_entry_mark_dirty() to using the
         new helper fs/ntfs/aops.c::mark_ntfs_record_dirty() and remove the no
         longer needed fs/ntfs/index.[hc]::__ntfs_index_entry_mark_dirty().
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:14:04 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:14:04 +01:00
@@ -51,6 +51,9 @@
 	  marks all buffers belonging to an ntfs record dirty, followed by
 	  marking the page the ntfs record is in dirty and also marking the vfs
 	  inode containing the ntfs record dirty (I_DIRTY_PAGES).
+	- Switch fs/ntfs/index.h::ntfs_index_entry_mark_dirty() to using the
+	  new helper fs/ntfs/aops.c::mark_ntfs_record_dirty() and remove the no
+	  longer needed fs/ntfs/index.[hc]::__ntfs_index_entry_mark_dirty().
 
 2.1.20 - Fix two stupid bugs introduced in 2.1.18 release.
 
diff -Nru a/fs/ntfs/index.c b/fs/ntfs/index.c
--- a/fs/ntfs/index.c	2004-10-19 10:14:04 +01:00
+++ b/fs/ntfs/index.c	2004-10-19 10:14:04 +01:00
@@ -459,58 +459,3 @@
 	err = -EIO;
 	goto err_out;
 }
-
-#ifdef NTFS_RW
-
-/**
- * __ntfs_index_entry_mark_dirty - mark an index allocation entry dirty
- * @ictx:	ntfs index context describing the index entry
- *
- * NOTE: You want to use fs/ntfs/index.h::ntfs_index_entry_mark_dirty() instead!
- * 
- * Mark the index allocation entry described by the index entry context @ictx
- * dirty.
- *
- * The index entry must be in an index block belonging to the index allocation
- * attribute.  Mark the buffers belonging to the index record as well as the
- * page cache page the index block is in dirty.  This automatically marks the
- * VFS inode of the ntfs index inode to which the index entry belongs dirty,
- * too (I_DIRTY_PAGES) and this in turn ensures the page buffers, and hence the
- * dirty index block, will be written out to disk later.
- */
-void __ntfs_index_entry_mark_dirty(ntfs_index_context *ictx)
-{
-	ntfs_inode *ni;
-	struct page *page;
-	struct buffer_head *bh, *head;
-	unsigned int rec_start, rec_end, bh_size, bh_start, bh_end;
-
-	BUG_ON(ictx->is_in_root);
-	ni = ictx->idx_ni;
-	page = ictx->page;
-	BUG_ON(!page_has_buffers(page));
-	/*
-	 * If the index block is the same size as the page cache page, set all
-	 * the buffers in the page, as well as the page itself, dirty.
-	 */
-	if (ni->itype.index.block_size == PAGE_CACHE_SIZE) {
-		__set_page_dirty_buffers(page);
-		return;
-	}
-	/* Set only the buffers in which the index block is located dirty. */
-	rec_start = (unsigned int)((u8*)ictx->ia - (u8*)page_address(page));
-	rec_end = rec_start + ni->itype.index.block_size;
-	bh_size = ni->vol->sb->s_blocksize;
-	bh_start = 0;
-	bh = head = page_buffers(page);
-	do {
-		bh_end = bh_start + bh_size;
-		if ((bh_start >= rec_start) && (bh_end <= rec_end))
-			set_buffer_dirty(bh);
-		bh_start = bh_end;
-	} while ((bh = bh->b_this_page) != head);
-	/* Finally, set the page itself dirty, too. */
-	__set_page_dirty_nobuffers(page);
-}
-
-#endif /* NTFS_RW */
diff -Nru a/fs/ntfs/index.h b/fs/ntfs/index.h
--- a/fs/ntfs/index.h	2004-10-19 10:14:04 +01:00
+++ b/fs/ntfs/index.h	2004-10-19 10:14:04 +01:00
@@ -30,6 +30,7 @@
 #include "inode.h"
 #include "attrib.h"
 #include "mft.h"
+#include "aops.h"
 
 /**
  * @idx_ni:	index inode containing the @entry described by this context
@@ -115,8 +116,6 @@
 		flush_dcache_page(ictx->page);
 }
 
-extern void __ntfs_index_entry_mark_dirty(ntfs_index_context *ictx);
-
 /**
  * ntfs_index_entry_mark_dirty - mark an index entry dirty
  * @ictx:	ntfs index context describing the index entry
@@ -140,7 +139,8 @@
 	if (ictx->is_in_root)
 		mark_mft_record_dirty(ictx->actx->ntfs_ino);
 	else
-		__ntfs_index_entry_mark_dirty(ictx);
+		mark_ntfs_record_dirty(ictx->idx_ni, ictx->page,
+			(u8*)ictx->ia - (u8*)page_address(ictx->page));
 }
 
 #endif /* NTFS_RW */

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 17/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:42                               ` [PATCH 16/37] " Anton Altaparmakov
@ 2004-10-19  9:42                                 ` Anton Altaparmakov
  2004-10-19  9:43                                   ` [PATCH 18/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 17/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/10/07 1.2029)
   NTFS: - Move ntfs_{un,}map_page() from ntfs.h to aops.h and fix resulting
           include errors.
         - Move typedefs for runlist_element and runlist from ntfs.h to
           runlist.h and fix resulting include errors.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:14:07 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:14:07 +01:00
@@ -54,6 +54,10 @@
 	- Switch fs/ntfs/index.h::ntfs_index_entry_mark_dirty() to using the
 	  new helper fs/ntfs/aops.c::mark_ntfs_record_dirty() and remove the no
 	  longer needed fs/ntfs/index.[hc]::__ntfs_index_entry_mark_dirty().
+	- Move ntfs_{un,}map_page() from ntfs.h to aops.h and fix resulting
+	  include errors.
+	- Move the typedefs for runlist_element and runlist from types.h to
+	  runlist.h and fix resulting include errors.
 
 2.1.20 - Fix two stupid bugs introduced in 2.1.18 release.
 
diff -Nru a/fs/ntfs/aops.c b/fs/ntfs/aops.c
--- a/fs/ntfs/aops.c	2004-10-19 10:14:07 +01:00
+++ b/fs/ntfs/aops.c	2004-10-19 10:14:07 +01:00
@@ -28,6 +28,10 @@
 #include <linux/buffer_head.h>
 
 #include "aops.h"
+#include "debug.h"
+#include "inode.h"
+#include "mft.h"
+#include "types.h"
 #include "ntfs.h"
 
 /**
diff -Nru a/fs/ntfs/aops.h b/fs/ntfs/aops.h
--- a/fs/ntfs/aops.h	2004-10-19 10:14:07 +01:00
+++ b/fs/ntfs/aops.h	2004-10-19 10:14:07 +01:00
@@ -24,9 +24,76 @@
 #ifndef _LINUX_NTFS_AOPS_H
 #define _LINUX_NTFS_AOPS_H
 
-#ifdef NTFS_RW
+#include <linux/mm.h>
+#include <linux/highmem.h>
+#include <linux/pagemap.h>
+#include <linux/fs.h>
 
 #include "inode.h"
+
+/**
+ * ntfs_unmap_page - release a page that was mapped using ntfs_map_page()
+ * @page:	the page to release
+ *
+ * Unpin, unmap and release a page that was obtained from ntfs_map_page().
+ */
+static inline void ntfs_unmap_page(struct page *page)
+{
+	kunmap(page);
+	page_cache_release(page);
+}
+
+/**
+ * ntfs_map_page - map a page into accessible memory, reading it if necessary
+ * @mapping:	address space for which to obtain the page
+ * @index:	index into the page cache for @mapping of the page to map
+ *
+ * Read a page from the page cache of the address space @mapping at position
+ * @index, where @index is in units of PAGE_CACHE_SIZE, and not in bytes.
+ *
+ * If the page is not in memory it is loaded from disk first using the readpage
+ * method defined in the address space operations of @mapping and the page is
+ * added to the page cache of @mapping in the process.
+ *
+ * If the page is in high memory it is mapped into memory directly addressible
+ * by the kernel.
+ *
+ * Finally the page count is incremented, thus pinning the page into place.
+ *
+ * The above means that page_address(page) can be used on all pages obtained
+ * with ntfs_map_page() to get the kernel virtual address of the page.
+ *
+ * When finished with the page, the caller has to call ntfs_unmap_page() to
+ * unpin, unmap and release the page.
+ *
+ * Note this does not grant exclusive access. If such is desired, the caller
+ * must provide it independently of the ntfs_{un}map_page() calls by using
+ * a {rw_}semaphore or other means of serialization. A spin lock cannot be
+ * used as ntfs_map_page() can block.
+ *
+ * The unlocked and uptodate page is returned on success or an encoded error
+ * on failure. Caller has to test for error using the IS_ERR() macro on the
+ * return value. If that evaluates to TRUE, the negative error code can be
+ * obtained using PTR_ERR() on the return value of ntfs_map_page().
+ */
+static inline struct page *ntfs_map_page(struct address_space *mapping,
+		unsigned long index)
+{
+	struct page *page = read_cache_page(mapping, index,
+			(filler_t*)mapping->a_ops->readpage, NULL);
+
+	if (!IS_ERR(page)) {
+		wait_on_page_locked(page);
+		kmap(page);
+		if (PageUptodate(page) && !PageError(page))
+			return page;
+		ntfs_unmap_page(page);
+		return ERR_PTR(-EIO);
+	}
+	return page;
+}
+
+#ifdef NTFS_RW
 
 extern void mark_ntfs_record_dirty(ntfs_inode *ni, struct page *page,
 		unsigned int rec_start);
diff -Nru a/fs/ntfs/attrib.c b/fs/ntfs/attrib.c
--- a/fs/ntfs/attrib.c	2004-10-19 10:14:07 +01:00
+++ b/fs/ntfs/attrib.c	2004-10-19 10:14:07 +01:00
@@ -21,6 +21,10 @@
  */
 
 #include <linux/buffer_head.h>
+
+#include "attrib.h"
+#include "debug.h"
+#include "mft.h"
 #include "ntfs.h"
 
 /**
diff -Nru a/fs/ntfs/bitmap.c b/fs/ntfs/bitmap.c
--- a/fs/ntfs/bitmap.c	2004-10-19 10:14:07 +01:00
+++ b/fs/ntfs/bitmap.c	2004-10-19 10:14:07 +01:00
@@ -25,6 +25,7 @@
 
 #include "bitmap.h"
 #include "debug.h"
+#include "aops.h"
 #include "ntfs.h"
 
 /**
diff -Nru a/fs/ntfs/collate.c b/fs/ntfs/collate.c
--- a/fs/ntfs/collate.c	2004-10-19 10:14:07 +01:00
+++ b/fs/ntfs/collate.c	2004-10-19 10:14:07 +01:00
@@ -19,8 +19,9 @@
  * Foundation,Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
  */
 
-#include "ntfs.h"
 #include "collate.h"
+#include "debug.h"
+#include "ntfs.h"
 
 static int ntfs_collate_binary(ntfs_volume *vol,
 		const void *data1, const int data1_len,
diff -Nru a/fs/ntfs/compress.c b/fs/ntfs/compress.c
--- a/fs/ntfs/compress.c	2004-10-19 10:14:07 +01:00
+++ b/fs/ntfs/compress.c	2004-10-19 10:14:07 +01:00
@@ -25,6 +25,8 @@
 #include <linux/buffer_head.h>
 #include <linux/blkdev.h>
 
+#include "inode.h"
+#include "debug.h"
 #include "ntfs.h"
 
 /**
diff -Nru a/fs/ntfs/dir.c b/fs/ntfs/dir.c
--- a/fs/ntfs/dir.c	2004-10-19 10:14:07 +01:00
+++ b/fs/ntfs/dir.c	2004-10-19 10:14:07 +01:00
@@ -21,8 +21,14 @@
  */
 
 #include <linux/smp_lock.h>
-#include "ntfs.h"
+#include <linux/buffer_head.h>
+
 #include "dir.h"
+#include "aops.h"
+#include "attrib.h"
+#include "mft.h"
+#include "debug.h"
+#include "ntfs.h"
 
 /**
  * The little endian Unicode string $I30 as a global constant.
diff -Nru a/fs/ntfs/dir.h b/fs/ntfs/dir.h
--- a/fs/ntfs/dir.h	2004-10-19 10:14:07 +01:00
+++ b/fs/ntfs/dir.h	2004-10-19 10:14:07 +01:00
@@ -24,6 +24,8 @@
 #define _LINUX_NTFS_DIR_H
 
 #include "layout.h"
+#include "inode.h"
+#include "types.h"
 
 /*
  * ntfs_name is used to return the file name to the caller of
diff -Nru a/fs/ntfs/file.c b/fs/ntfs/file.c
--- a/fs/ntfs/file.c	2004-10-19 10:14:07 +01:00
+++ b/fs/ntfs/file.c	2004-10-19 10:14:07 +01:00
@@ -19,6 +19,10 @@
  * Foundation,Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
  */
 
+#include <linux/pagemap.h>
+#include <linux/buffer_head.h>
+
+#include "debug.h"
 #include "ntfs.h"
 
 /**
diff -Nru a/fs/ntfs/index.c b/fs/ntfs/index.c
--- a/fs/ntfs/index.c	2004-10-19 10:14:07 +01:00
+++ b/fs/ntfs/index.c	2004-10-19 10:14:07 +01:00
@@ -19,9 +19,11 @@
  * Foundation,Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
  */
 
-#include "ntfs.h"
+#include "aops.h"
 #include "collate.h"
+#include "debug.h"
 #include "index.h"
+#include "ntfs.h"
 
 /**
  * ntfs_index_ctx_get - allocate and initialize a new index context
diff -Nru a/fs/ntfs/inode.c b/fs/ntfs/inode.c
--- a/fs/ntfs/inode.c	2004-10-19 10:14:07 +01:00
+++ b/fs/ntfs/inode.c	2004-10-19 10:14:07 +01:00
@@ -27,8 +27,11 @@
 
 #include "ntfs.h"
 #include "dir.h"
+#include "debug.h"
 #include "inode.h"
 #include "attrib.h"
+#include "malloc.h"
+#include "mft.h"
 #include "time.h"
 
 /**
diff -Nru a/fs/ntfs/inode.h b/fs/ntfs/inode.h
--- a/fs/ntfs/inode.h	2004-10-19 10:14:07 +01:00
+++ b/fs/ntfs/inode.h	2004-10-19 10:14:07 +01:00
@@ -24,10 +24,17 @@
 #ifndef _LINUX_NTFS_INODE_H
 #define _LINUX_NTFS_INODE_H
 
+#include <linux/mm.h>
+#include <linux/fs.h>
 #include <linux/seq_file.h>
+#include <linux/list.h>
+#include <asm/atomic.h>
+#include <asm/semaphore.h>
 
 #include "layout.h"
 #include "volume.h"
+#include "types.h"
+#include "runlist.h"
 
 typedef struct _ntfs_inode ntfs_inode;
 
diff -Nru a/fs/ntfs/lcnalloc.c b/fs/ntfs/lcnalloc.c
--- a/fs/ntfs/lcnalloc.c	2004-10-19 10:14:07 +01:00
+++ b/fs/ntfs/lcnalloc.c	2004-10-19 10:14:07 +01:00
@@ -30,6 +30,7 @@
 #include "volume.h"
 #include "attrib.h"
 #include "malloc.h"
+#include "aops.h"
 #include "ntfs.h"
 
 /**
diff -Nru a/fs/ntfs/lcnalloc.h b/fs/ntfs/lcnalloc.h
--- a/fs/ntfs/lcnalloc.h	2004-10-19 10:14:07 +01:00
+++ b/fs/ntfs/lcnalloc.h	2004-10-19 10:14:07 +01:00
@@ -28,6 +28,7 @@
 #include <linux/fs.h>
 
 #include "types.h"
+#include "runlist.h"
 #include "volume.h"
 
 typedef enum {
diff -Nru a/fs/ntfs/logfile.c b/fs/ntfs/logfile.c
--- a/fs/ntfs/logfile.c	2004-10-19 10:14:07 +01:00
+++ b/fs/ntfs/logfile.c	2004-10-19 10:14:07 +01:00
@@ -29,8 +29,10 @@
 
 #include "logfile.h"
 #include "volume.h"
-#include "ntfs.h"
+#include "aops.h"
 #include "debug.h"
+#include "malloc.h"
+#include "ntfs.h"
 
 /**
  * ntfs_check_restart_page_header - check the page header for consistency
diff -Nru a/fs/ntfs/malloc.h b/fs/ntfs/malloc.h
--- a/fs/ntfs/malloc.h	2004-10-19 10:14:07 +01:00
+++ b/fs/ntfs/malloc.h	2004-10-19 10:14:07 +01:00
@@ -24,6 +24,7 @@
 
 #include <linux/vmalloc.h>
 #include <linux/slab.h>
+#include <linux/highmem.h>
 
 /**
  * ntfs_malloc_nofs - allocate memory in multiples of pages
diff -Nru a/fs/ntfs/mft.c b/fs/ntfs/mft.c
--- a/fs/ntfs/mft.c	2004-10-19 10:14:07 +01:00
+++ b/fs/ntfs/mft.c	2004-10-19 10:14:07 +01:00
@@ -20,10 +20,16 @@
  * Foundation,Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
  */
 
+#include <linux/buffer_head.h>
 #include <linux/swap.h>
 
-#include "ntfs.h"
 #include "bitmap.h"
+#include "lcnalloc.h"
+#include "aops.h"
+#include "debug.h"
+#include "mft.h"
+#include "malloc.h"
+#include "ntfs.h"
 
 /**
  * __format_mft_record - initialize an empty mft record
diff -Nru a/fs/ntfs/mft.h b/fs/ntfs/mft.h
--- a/fs/ntfs/mft.h	2004-10-19 10:14:07 +01:00
+++ b/fs/ntfs/mft.h	2004-10-19 10:14:07 +01:00
@@ -24,6 +24,8 @@
 #define _LINUX_NTFS_MFT_H
 
 #include <linux/fs.h>
+#include <linux/highmem.h>
+#include <linux/pagemap.h>
 
 #include "inode.h"
 
diff -Nru a/fs/ntfs/namei.c b/fs/ntfs/namei.c
--- a/fs/ntfs/namei.c	2004-10-19 10:14:07 +01:00
+++ b/fs/ntfs/namei.c	2004-10-19 10:14:07 +01:00
@@ -23,8 +23,10 @@
 #include <linux/dcache.h>
 #include <linux/security.h>
 
-#include "ntfs.h"
 #include "dir.h"
+#include "debug.h"
+#include "mft.h"
+#include "ntfs.h"
 
 /**
  * ntfs_lookup - find the inode represented by a dentry in a directory inode
diff -Nru a/fs/ntfs/ntfs.h b/fs/ntfs/ntfs.h
--- a/fs/ntfs/ntfs.h	2004-10-19 10:14:07 +01:00
+++ b/fs/ntfs/ntfs.h	2004-10-19 10:14:07 +01:00
@@ -29,21 +29,12 @@
 #include <linux/module.h>
 #include <linux/compiler.h>
 #include <linux/fs.h>
-#include <linux/buffer_head.h>
 #include <linux/nls.h>
-#include <linux/pagemap.h>
 #include <linux/smp.h>
-#include <asm/atomic.h>
 
 #include "types.h"
-#include "debug.h"
-#include "malloc.h"
-#include "endian.h"
 #include "volume.h"
-#include "inode.h"
 #include "layout.h"
-#include "attrib.h"
-#include "mft.h"
 
 typedef enum {
 	NTFS_BLOCK_SIZE		= 512,
@@ -87,72 +78,12 @@
 	return sb->s_fs_info;
 }
 
-/**
- * ntfs_unmap_page - release a page that was mapped using ntfs_map_page()
- * @page:	the page to release
- *
- * Unpin, unmap and release a page that was obtained from ntfs_map_page().
- */
-static inline void ntfs_unmap_page(struct page *page)
-{
-	kunmap(page);
-	page_cache_release(page);
-}
-
-/**
- * ntfs_map_page - map a page into accessible memory, reading it if necessary
- * @mapping:	address space for which to obtain the page
- * @index:	index into the page cache for @mapping of the page to map
- *
- * Read a page from the page cache of the address space @mapping at position
- * @index, where @index is in units of PAGE_CACHE_SIZE, and not in bytes.
- *
- * If the page is not in memory it is loaded from disk first using the readpage
- * method defined in the address space operations of @mapping and the page is
- * added to the page cache of @mapping in the process.
- *
- * If the page is in high memory it is mapped into memory directly addressible
- * by the kernel.
- *
- * Finally the page count is incremented, thus pinning the page into place.
- *
- * The above means that page_address(page) can be used on all pages obtained
- * with ntfs_map_page() to get the kernel virtual address of the page.
- *
- * When finished with the page, the caller has to call ntfs_unmap_page() to
- * unpin, unmap and release the page.
- *
- * Note this does not grant exclusive access. If such is desired, the caller
- * must provide it independently of the ntfs_{un}map_page() calls by using
- * a {rw_}semaphore or other means of serialization. A spin lock cannot be
- * used as ntfs_map_page() can block.
- *
- * The unlocked and uptodate page is returned on success or an encoded error
- * on failure. Caller has to test for error using the IS_ERR() macro on the
- * return value. If that evaluates to TRUE, the negative error code can be
- * obtained using PTR_ERR() on the return value of ntfs_map_page().
- */
-static inline struct page *ntfs_map_page(struct address_space *mapping,
-		unsigned long index)
-{
-	struct page *page = read_cache_page(mapping, index,
-			(filler_t*)mapping->a_ops->readpage, NULL);
-
-	if (!IS_ERR(page)) {
-		wait_on_page_locked(page);
-		kmap(page);
-		if (PageUptodate(page) && !PageError(page))
-			return page;
-		ntfs_unmap_page(page);
-		return ERR_PTR(-EIO);
-	}
-	return page;
-}
-
 /* Declarations of functions and global variables. */
 
 /* From fs/ntfs/compress.c */
 extern int ntfs_read_compressed_block(struct page *page);
+extern int allocate_compression_buffers(void);
+extern void free_compression_buffers(void);
 
 /* From fs/ntfs/super.c */
 #define default_upcase_len 0x10000
@@ -165,10 +96,6 @@
 	char *str;
 } option_t;
 extern const option_t on_errors_arr[];
-
-/* From fs/ntfs/compress.c */
-extern int allocate_compression_buffers(void);
-extern void free_compression_buffers(void);
 
 /* From fs/ntfs/mst.c */
 extern int post_read_mst_fixup(NTFS_RECORD *b, const u32 size);
diff -Nru a/fs/ntfs/quota.c b/fs/ntfs/quota.c
--- a/fs/ntfs/quota.c	2004-10-19 10:14:07 +01:00
+++ b/fs/ntfs/quota.c	2004-10-19 10:14:07 +01:00
@@ -22,9 +22,10 @@
 
 #ifdef NTFS_RW
 
-#include "ntfs.h"
 #include "index.h"
 #include "quota.h"
+#include "debug.h"
+#include "ntfs.h"
 
 /**
  * ntfs_mark_quotas_out_of_date - mark the quotas out of date on an ntfs volume
diff -Nru a/fs/ntfs/runlist.c b/fs/ntfs/runlist.c
--- a/fs/ntfs/runlist.c	2004-10-19 10:14:07 +01:00
+++ b/fs/ntfs/runlist.c	2004-10-19 10:14:07 +01:00
@@ -20,8 +20,10 @@
  * Foundation,Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
  */
 
-#include "ntfs.h"
 #include "dir.h"
+#include "debug.h"
+#include "malloc.h"
+#include "ntfs.h"
 
 /**
  * ntfs_rl_mm - runlist memmove
diff -Nru a/fs/ntfs/runlist.h b/fs/ntfs/runlist.h
--- a/fs/ntfs/runlist.h	2004-10-19 10:14:07 +01:00
+++ b/fs/ntfs/runlist.h	2004-10-19 10:14:07 +01:00
@@ -28,6 +28,34 @@
 #include "layout.h"
 #include "volume.h"
 
+/**
+ * runlist_element - in memory vcn to lcn mapping array element
+ * @vcn:	starting vcn of the current array element
+ * @lcn:	starting lcn of the current array element
+ * @length:	length in clusters of the current array element
+ *
+ * The last vcn (in fact the last vcn + 1) is reached when length == 0.
+ *
+ * When lcn == -1 this means that the count vcns starting at vcn are not
+ * physically allocated (i.e. this is a hole / data is sparse).
+ */
+typedef struct {	/* In memory vcn to lcn mapping structure element. */
+	VCN vcn;	/* vcn = Starting virtual cluster number. */
+	LCN lcn;	/* lcn = Starting logical cluster number. */
+	s64 length;	/* Run length in clusters. */
+} runlist_element;
+
+/**
+ * runlist - in memory vcn to lcn mapping array including a read/write lock
+ * @rl:		pointer to an array of runlist elements
+ * @lock:	read/write spinlock for serializing access to @rl
+ *
+ */
+typedef struct {
+	runlist_element *rl;
+	struct rw_semaphore lock;
+} runlist;
+
 static inline void ntfs_init_runlist(runlist *rl)
 {
 	rl->rl = NULL;
diff -Nru a/fs/ntfs/super.c b/fs/ntfs/super.c
--- a/fs/ntfs/super.c	2004-10-19 10:14:07 +01:00
+++ b/fs/ntfs/super.c	2004-10-19 10:14:07 +01:00
@@ -31,12 +31,15 @@
 #include <linux/moduleparam.h>
 #include <linux/smp_lock.h>
 
-#include "ntfs.h"
 #include "sysctl.h"
 #include "logfile.h"
 #include "quota.h"
 #include "dir.h"
+#include "debug.h"
 #include "index.h"
+#include "aops.h"
+#include "malloc.h"
+#include "ntfs.h"
 
 /* Number of mounted file systems which have compression enabled. */
 static unsigned long ntfs_nr_compression_users;
diff -Nru a/fs/ntfs/types.h b/fs/ntfs/types.h
--- a/fs/ntfs/types.h	2004-10-19 10:14:07 +01:00
+++ b/fs/ntfs/types.h	2004-10-19 10:14:07 +01:00
@@ -53,34 +53,6 @@
 typedef s64 LSN;
 typedef sle64 leLSN;
 
-/**
- * runlist_element - in memory vcn to lcn mapping array element
- * @vcn:	starting vcn of the current array element
- * @lcn:	starting lcn of the current array element
- * @length:	length in clusters of the current array element
- *
- * The last vcn (in fact the last vcn + 1) is reached when length == 0.
- *
- * When lcn == -1 this means that the count vcns starting at vcn are not
- * physically allocated (i.e. this is a hole / data is sparse).
- */
-typedef struct {	/* In memory vcn to lcn mapping structure element. */
-	VCN vcn;	/* vcn = Starting virtual cluster number. */
-	LCN lcn;	/* lcn = Starting logical cluster number. */
-	s64 length;	/* Run length in clusters. */
-} runlist_element;
-
-/**
- * runlist - in memory vcn to lcn mapping array including a read/write lock
- * @rl:		pointer to an array of runlist elements
- * @lock:	read/write spinlock for serializing access to @rl
- *
- */
-typedef struct {
-	runlist_element *rl;
-	struct rw_semaphore lock;
-} runlist;
-
 typedef enum {
 	FALSE = 0,
 	TRUE = 1
diff -Nru a/fs/ntfs/unistr.c b/fs/ntfs/unistr.c
--- a/fs/ntfs/unistr.c	2004-10-19 10:14:07 +01:00
+++ b/fs/ntfs/unistr.c	2004-10-19 10:14:07 +01:00
@@ -19,6 +19,8 @@
  * Foundation,Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
  */
 
+#include "types.h"
+#include "debug.h"
 #include "ntfs.h"
 
 /*
diff -Nru a/fs/ntfs/upcase.c b/fs/ntfs/upcase.c
--- a/fs/ntfs/upcase.c	2004-10-19 10:14:07 +01:00
+++ b/fs/ntfs/upcase.c	2004-10-19 10:14:07 +01:00
@@ -24,6 +24,7 @@
  * Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
  */
 
+#include "malloc.h"
 #include "ntfs.h"
 
 ntfschar *generate_default_upcase(void)
diff -Nru a/fs/ntfs/volume.h b/fs/ntfs/volume.h
--- a/fs/ntfs/volume.h	2004-10-19 10:14:07 +01:00
+++ b/fs/ntfs/volume.h	2004-10-19 10:14:07 +01:00
@@ -24,6 +24,8 @@
 #ifndef _LINUX_NTFS_VOLUME_H
 #define _LINUX_NTFS_VOLUME_H
 
+#include <linux/rwsem.h>
+
 #include "types.h"
 #include "layout.h"
 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 18/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:42                                 ` [PATCH 17/37] " Anton Altaparmakov
@ 2004-10-19  9:43                                   ` Anton Altaparmakov
  2004-10-19  9:43                                     ` [PATCH 19/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 18/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/10/07 1.2030)
   NTFS: Remove unused {__,}format_mft_record() from fs/ntfs/mft.c.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:14:11 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:14:11 +01:00
@@ -58,6 +58,7 @@
 	  include errors.
 	- Move the typedefs for runlist_element and runlist from types.h to
 	  runlist.h and fix resulting include errors.
+	- Remove unused {__,}format_mft_record() from fs/ntfs/mft.c.
 
 2.1.20 - Fix two stupid bugs introduced in 2.1.18 release.
 
diff -Nru a/fs/ntfs/mft.c b/fs/ntfs/mft.c
--- a/fs/ntfs/mft.c	2004-10-19 10:14:11 +01:00
+++ b/fs/ntfs/mft.c	2004-10-19 10:14:11 +01:00
@@ -32,79 +32,6 @@
 #include "ntfs.h"
 
 /**
- * __format_mft_record - initialize an empty mft record
- * @m:		mapped, pinned and locked for writing mft record
- * @size:	size of the mft record
- * @rec_no:	mft record number / inode number
- *
- * Private function to initialize an empty mft record. Use one of the two
- * provided format_mft_record() functions instead.
- */
-static void __format_mft_record(MFT_RECORD *m, const int size,
-		const unsigned long rec_no)
-{
-	ATTR_RECORD *a;
-
-	memset(m, 0, size);
-	m->magic = magic_FILE;
-	/* Aligned to 2-byte boundary. */
-	m->usa_ofs = cpu_to_le16((sizeof(MFT_RECORD) + 1) & ~1);
-	m->usa_count = cpu_to_le16(size / NTFS_BLOCK_SIZE + 1);
-	/* Set the update sequence number to 1. */
-	*(le16*)((char*)m + ((sizeof(MFT_RECORD) + 1) & ~1)) = cpu_to_le16(1);
-	m->lsn = cpu_to_le64(0LL);
-	m->sequence_number = cpu_to_le16(1);
-	m->link_count = 0;
-	/* Aligned to 8-byte boundary. */
-	m->attrs_offset = cpu_to_le16((le16_to_cpu(m->usa_ofs) +
-			(le16_to_cpu(m->usa_count) << 1) + 7) & ~7);
-	m->flags = 0;
-	/*
-	 * Using attrs_offset plus eight bytes (for the termination attribute),
-	 * aligned to 8-byte boundary.
-	 */
-	m->bytes_in_use = cpu_to_le32((le16_to_cpu(m->attrs_offset) + 8 + 7) &
-			~7);
-	m->bytes_allocated = cpu_to_le32(size);
-	m->base_mft_record = cpu_to_le64((MFT_REF)0);
-	m->next_attr_instance = 0;
-	a = (ATTR_RECORD*)((char*)m + le16_to_cpu(m->attrs_offset));
-	a->type = AT_END;
-	a->length = 0;
-}
-
-/**
- * format_mft_record - initialize an empty mft record
- * @ni:		ntfs inode of mft record
- * @mft_rec:	mapped, pinned and locked mft record (optional)
- *
- * Initialize an empty mft record. This is used when extending the MFT.
- *
- * If @mft_rec is NULL, we call map_mft_record() to obtain the
- * record and we unmap it again when finished.
- *
- * We return 0 on success or -errno on error.
- */
-int format_mft_record(ntfs_inode *ni, MFT_RECORD *mft_rec)
-{
-	MFT_RECORD *m;
-
-	if (mft_rec)
-		m = mft_rec;
-	else {
-		m = map_mft_record(ni);
-		if (IS_ERR(m))
-			return PTR_ERR(m);
-	}
-	__format_mft_record(m, ni->vol->mft_record_size, ni->mft_no);
-	if (!mft_rec) {
-		// FIXME: Need to set the mft record dirty!
-		unmap_mft_record(ni);
-	}
-	return 0;
-}
-
-/**
  * ntfs_readpage - external declaration, function is in fs/ntfs/aops.c
  */
 extern int ntfs_readpage(struct file *, struct page *);

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 19/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:43                                   ` [PATCH 18/37] " Anton Altaparmakov
@ 2004-10-19  9:43                                     ` Anton Altaparmakov
  2004-10-19  9:43                                       ` [PATCH 20/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 19/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/10/07 1.2031)
   NTFS: - Modify fs/ntfs/mft.c::__mark_mft_record_dirty() to use the helper
           mark_ntfs_record_dirty() which also changes the behaviour in that we
           now set the buffers belonging to the mft record dirty as well as the
           page itself.
         - Update fs/ntfs/mft.c::write_mft_record_nolock() and sync_mft_mirror()
           to cope with the fact that there now are dirty buffers in mft pages.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:14:15 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:14:15 +01:00
@@ -59,6 +59,12 @@
 	- Move the typedefs for runlist_element and runlist from types.h to
 	  runlist.h and fix resulting include errors.
 	- Remove unused {__,}format_mft_record() from fs/ntfs/mft.c.
+	- Modify fs/ntfs/mft.c::__mark_mft_record_dirty() to use the helper
+	  mark_ntfs_record_dirty() which also changes the behaviour in that we
+	  now set the buffers belonging to the mft record dirty as well as the
+	  page itself.
+	- Update fs/ntfs/mft.c::write_mft_record_nolock() and sync_mft_mirror()
+	  to cope with the fact that there now are dirty buffers in mft pages.
 
 2.1.20 - Fix two stupid bugs introduced in 2.1.18 release.
 
diff -Nru a/fs/ntfs/mft.c b/fs/ntfs/mft.c
--- a/fs/ntfs/mft.c	2004-10-19 10:14:15 +01:00
+++ b/fs/ntfs/mft.c	2004-10-19 10:14:15 +01:00
@@ -407,19 +407,11 @@
  */
 void __mark_mft_record_dirty(ntfs_inode *ni)
 {
-	struct page *page = ni->page;
 	ntfs_inode *base_ni;
 
 	ntfs_debug("Entering for inode 0x%lx.", ni->mft_no);
-	BUG_ON(!page);
 	BUG_ON(NInoAttr(ni));
-
-	/*
-	 * Set the page containing the mft record dirty.  This also marks the
-	 * $MFT inode dirty (I_DIRTY_PAGES).
-	 */
-	__set_page_dirty_nobuffers(page);
-
+	mark_ntfs_record_dirty(ni, ni->page, ni->page_ofs);
 	/* Determine the base vfs inode and mark it dirty, too. */
 	down(&ni->extent_lock);
 	if (likely(ni->nr_extents >= 0))
@@ -541,20 +533,9 @@
 	m_end = m_start + vol->mft_record_size;
 	do {
 		block_end = block_start + blocksize;
-		/*
-		 * If the buffer is outside the mft record, just skip it,
-		 * clearing it if it is dirty to make sure it is not written
-		 * out.  It should never be marked dirty but better be safe.
-		 */
-		if ((block_end <= m_start) || (block_start >= m_end)) {
-			if (buffer_dirty(bh)) {
-				ntfs_warning(vol->sb, "Clearing dirty mft "
-						"record page buffer.  %s",
-						ntfs_please_email);
-				clear_buffer_dirty(bh);
-			}
+		/* If the buffer is outside the mft record, skip it. */
+		if ((block_end <= m_start) || (block_start >= m_end))
 			continue;
-		}
 		if (!buffer_mapped(bh)) {
 			ntfs_error(vol->sb, "Writing mft mirror records "
 					"without existing mapped buffers is "
@@ -706,20 +687,9 @@
 	m_end = m_start + vol->mft_record_size;
 	do {
 		block_end = block_start + blocksize;
-		/*
-		 * If the buffer is outside the mft record, just skip it,
-		 * clearing it if it is dirty to make sure it is not written
-		 * out.  It should never be marked dirty but better be safe.
-		 */
-		if ((block_end <= m_start) || (block_start >= m_end)) {
-			if (buffer_dirty(bh)) {
-				ntfs_warning(vol->sb, "Clearing dirty mft "
-						"record page buffer.  %s",
-						ntfs_please_email);
-				clear_buffer_dirty(bh);
-			}
+		/* If the buffer is outside the mft record, skip it. */
+		if ((block_end <= m_start) || (block_start >= m_end))
 			continue;
-		}
 		if (!buffer_mapped(bh)) {
 			ntfs_error(vol->sb, "Writing mft records without "
 					"existing mapped buffers is not "

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 20/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:43                                     ` [PATCH 19/37] " Anton Altaparmakov
@ 2004-10-19  9:43                                       ` Anton Altaparmakov
  2004-10-19  9:43                                         ` [PATCH 21/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 20/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/10/10 1.2032.1.1)
   NTFS: Update fs/ntfs/inode.c::ntfs_write_inode() to also use the helper
         mark_ntfs_record_dirty() and thus to set the buffers belonging to the
         mft record dirty as well as the page itself.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:14:18 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:14:18 +01:00
@@ -65,6 +65,9 @@
 	  page itself.
 	- Update fs/ntfs/mft.c::write_mft_record_nolock() and sync_mft_mirror()
 	  to cope with the fact that there now are dirty buffers in mft pages.
+	- Update fs/ntfs/inode.c::ntfs_write_inode() to also use the helper
+	  mark_ntfs_record_dirty() and thus to set the buffers belonging to the
+	  mft record dirty as well as the page itself.
 
 2.1.20 - Fix two stupid bugs introduced in 2.1.18 release.
 
diff -Nru a/fs/ntfs/inode.c b/fs/ntfs/inode.c
--- a/fs/ntfs/inode.c	2004-10-19 10:14:18 +01:00
+++ b/fs/ntfs/inode.c	2004-10-19 10:14:18 +01:00
@@ -2271,7 +2271,7 @@
  * ntfs_truncate - called when the i_size of an ntfs inode is changed
  * @vi:		inode for which the i_size was changed
  *
- * We don't support i_size changes yet.
+ * We do not support i_size changes yet.
  *
  * The kernel guarantees that @vi is a regular file (S_ISREG() is true) and
  * that the change is allowed.
@@ -2505,9 +2505,16 @@
 	 * dirty, since we are going to write this mft record below in any case
 	 * and the base mft record may actually not have been modified so it
 	 * might not need to be written out.
+	 * NOTE: It is not a problem when the inode for $MFT itself is being
+	 * written out as mark_ntfs_record_dirty() will only set I_DIRTY_PAGES
+	 * on the $MFT inode and hence ntfs_write_inode() will not be
+	 * re-invoked because of it which in turn is ok since the dirtied mft
+	 * record will be cleaned and written out to disk below, i.e. before
+	 * this function returns.
 	 */
 	if (modified && !NInoTestSetDirty(ctx->ntfs_ino))
-		__set_page_dirty_nobuffers(ctx->ntfs_ino->page);
+		mark_ntfs_record_dirty(ctx->ntfs_ino, ctx->ntfs_ino->page,
+				ctx->ntfs_ino->page_ofs);
 	ntfs_attr_put_search_ctx(ctx);
 	/* Now the access times are updated, write the base mft record. */
 	if (NInoDirty(ni))

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 21/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:43                                       ` [PATCH 20/37] " Anton Altaparmakov
@ 2004-10-19  9:43                                         ` Anton Altaparmakov
  2004-10-19  9:44                                           ` [PATCH 22/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 21/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/10/10 1.2032.1.2)
   NTFS: Fix warnings on x86-64.  (Randy Dunlap with slight modification from me)
   
   Fix printk arg type warnings on x86-64 (and OK on x86-32) (gcc 3.3.3):
   fs/ntfs/dir.c:1272: warning: long long unsigned int format, long unsigned int arg (arg 6)
   fs/ntfs/dir.c:1388: warning: long long unsigned int format, long unsigned int arg (arg 5
   
   Signed-off-by: Randy Dunlap <rddunlap@osdl.org>
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:14:22 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:14:22 +01:00
@@ -68,6 +68,8 @@
 	- Update fs/ntfs/inode.c::ntfs_write_inode() to also use the helper
 	  mark_ntfs_record_dirty() and thus to set the buffers belonging to the
 	  mft record dirty as well as the page itself.
+	- Fix compiler warnings on x86-64 in fs/ntfs/dir.c.  (Randy Dunlap,
+	  slightly modified by me)
 
 2.1.20 - Fix two stupid bugs introduced in 2.1.18 release.
 
diff -Nru a/fs/ntfs/dir.c b/fs/ntfs/dir.c
--- a/fs/ntfs/dir.c	2004-10-19 10:14:22 +01:00
+++ b/fs/ntfs/dir.c	2004-10-19 10:14:22 +01:00
@@ -1278,7 +1278,7 @@
 	ntfs_debug("Reading bitmap with page index 0x%llx, bit ofs 0x%llx",
 			(unsigned long long)bmp_pos >> (3 + PAGE_CACHE_SHIFT),
 			(unsigned long long)bmp_pos &
-			((PAGE_CACHE_SIZE * 8) - 1));
+			(unsigned long long)((PAGE_CACHE_SIZE * 8) - 1));
 	bmp_page = ntfs_map_page(bmp_mapping,
 			bmp_pos >> (3 + PAGE_CACHE_SHIFT));
 	if (IS_ERR(bmp_page)) {
@@ -1392,8 +1392,8 @@
 	 */
 	for (;; ie = (INDEX_ENTRY*)((u8*)ie + le16_to_cpu(ie->length))) {
 		ntfs_debug("In index allocation, offset 0x%llx.",
-				(unsigned long long)ia_start + ((u8*)ie -
-				(u8*)ia));
+				(unsigned long long)ia_start +
+				(unsigned long long)((u8*)ie - (u8*)ia));
 		/* Bounds checks. */
 		if (unlikely((u8*)ie < (u8*)ia || (u8*)ie +
 				sizeof(INDEX_ENTRY_HEADER) > index_end ||
diff -Nru a/fs/ntfs/inode.c b/fs/ntfs/inode.c
--- a/fs/ntfs/inode.c	2004-10-19 10:14:22 +01:00
+++ b/fs/ntfs/inode.c	2004-10-19 10:14:22 +01:00
@@ -25,7 +25,7 @@
 #include <linux/quotaops.h>
 #include <linux/mount.h>
 
-#include "ntfs.h"
+#include "aops.h"
 #include "dir.h"
 #include "debug.h"
 #include "inode.h"
@@ -33,6 +33,7 @@
 #include "malloc.h"
 #include "mft.h"
 #include "time.h"
+#include "ntfs.h"
 
 /**
  * ntfs_test_inode - compare two (possibly fake) inodes for equality

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 22/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:43                                         ` [PATCH 21/37] " Anton Altaparmakov
@ 2004-10-19  9:44                                           ` Anton Altaparmakov
  2004-10-19  9:44                                             ` [PATCH 23/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:44 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 22/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/10/11 1.2041)
   NTFS: Add fs/ntfs/mft.c::try_map_mft_record() which fails with -EALREADY if
         the mft record is already locked and otherwise behaves the same way
         as fs/ntfs/mft.c::map_mft_record().
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:14:25 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:14:25 +01:00
@@ -70,6 +70,9 @@
 	  mft record dirty as well as the page itself.
 	- Fix compiler warnings on x86-64 in fs/ntfs/dir.c.  (Randy Dunlap,
 	  slightly modified by me)
+	- Add fs/ntfs/mft.c::try_map_mft_record() which fails with -EALREADY if
+	  the mft record is already locked and otherwise behaves the same way
+	  as fs/ntfs/mft.c::map_mft_record().
 
 2.1.20 - Fix two stupid bugs introduced in 2.1.18 release.
 
diff -Nru a/fs/ntfs/mft.c b/fs/ntfs/mft.c
--- a/fs/ntfs/mft.c	2004-10-19 10:14:25 +01:00
+++ b/fs/ntfs/mft.c	2004-10-19 10:14:25 +01:00
@@ -115,6 +115,57 @@
 }
 
 /**
+ * try_map_mft_record - attempt to map, pin and lock an mft record
+ * @ni:		ntfs inode whose MFT record to map
+ *
+ * First, attempt to take the mrec_lock semaphore.  If the semaphore is already
+ * taken by someone else, return the error code -EALREADY.  Otherwise continue
+ * as described below.
+ *
+ * The page of the record is mapped using map_mft_record_page() before being
+ * returned to the caller.
+ *
+ * This in turn uses ntfs_map_page() to get the page containing the wanted mft
+ * record (it in turn calls read_cache_page() which reads it in from disk if
+ * necessary, increments the use count on the page so that it cannot disappear
+ * under us and returns a reference to the page cache page).
+ *
+ * The mft record is now ours and we return a pointer to it.  You need to check
+ * the returned pointer with IS_ERR() and if that is true, PTR_ERR() will return
+ * the error code.
+ *
+ * For further details see the description of map_mft_record() below.
+ */
+MFT_RECORD *try_map_mft_record(ntfs_inode *ni)
+{
+	MFT_RECORD *m;
+
+	ntfs_debug("Entering for mft_no 0x%lx.", ni->mft_no);
+
+	/* Make sure the ntfs inode doesn't go away. */
+	atomic_inc(&ni->count);
+
+	/*
+	 * Serialize access to this mft record.  If someone else is already
+	 * holding the lock, abort instead of waiting for the lock.
+	 */
+	if (unlikely(down_trylock(&ni->mrec_lock))) {
+		ntfs_debug("Mft record is already locked, aborting.");
+		atomic_dec(&ni->count);
+		return ERR_PTR(-EALREADY);
+	}
+
+	m = map_mft_record_page(ni);
+	if (likely(!IS_ERR(m)))
+		return m;
+
+	up(&ni->mrec_lock);
+	atomic_dec(&ni->count);
+	ntfs_error(ni->vol->sb, "Failed with error code %lu.", -PTR_ERR(m));
+	return m;
+}
+
+/**
  * map_mft_record - map, pin and lock an mft record
  * @ni:		ntfs inode whose MFT record to map
  *
diff -Nru a/fs/ntfs/mft.h b/fs/ntfs/mft.h
--- a/fs/ntfs/mft.h	2004-10-19 10:14:25 +01:00
+++ b/fs/ntfs/mft.h	2004-10-19 10:14:25 +01:00
@@ -29,6 +29,7 @@
 
 #include "inode.h"
 
+extern MFT_RECORD *try_map_mft_record(ntfs_inode *ni);
 extern MFT_RECORD *map_mft_record(ntfs_inode *ni);
 extern void unmap_mft_record(ntfs_inode *ni);
 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 23/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:44                                           ` [PATCH 22/37] " Anton Altaparmakov
@ 2004-10-19  9:44                                             ` Anton Altaparmakov
  2004-10-19  9:44                                               ` [PATCH 24/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:44 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 23/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/10/11 1.2041.1.1)
   NTFS: Modify fs/ntfs/mft.c::write_mft_record_nolock() so that it only
         writes the mft record if the buffers belonging to it are dirty.
         Otherwise we assume that it was written out by other means already.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:14:29 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:14:29 +01:00
@@ -73,6 +73,9 @@
 	- Add fs/ntfs/mft.c::try_map_mft_record() which fails with -EALREADY if
 	  the mft record is already locked and otherwise behaves the same way
 	  as fs/ntfs/mft.c::map_mft_record().
+	- Modify fs/ntfs/mft.c::write_mft_record_nolock() so that it only
+	  writes the mft record if the buffers belonging to it are dirty.
+	  Otherwise we assume that it was written out by other means already.
 
 2.1.20 - Fix two stupid bugs introduced in 2.1.18 release.
 
diff -Nru a/fs/ntfs/mft.c b/fs/ntfs/mft.c
--- a/fs/ntfs/mft.c	2004-10-19 10:14:29 +01:00
+++ b/fs/ntfs/mft.c	2004-10-19 10:14:29 +01:00
@@ -678,6 +678,9 @@
  * ntfs inode @ni to backing store.  If the mft record @m has a counterpart in
  * the mft mirror, that is also updated.
  *
+ * We only write the mft record if the ntfs inode @ni is dirty and the buffers
+ * belonging to its mft record are dirty, too.
+ *
  * On success, clean the mft record and return 0.  On error, leave the mft
  * record dirty and return -errno.  The caller should call make_bad_inode() on
  * the base inode to ensure no more access happens to this inode.  We do not do
@@ -707,6 +710,7 @@
 	struct buffer_head *bh, *head;
 	unsigned int block_start, block_end, m_start, m_end;
 	int i_bhs, nr_bhs, err = 0;
+	BOOL rec_is_dirty = TRUE;
 
 	ntfs_debug("Entering for inode 0x%lx.", ni->mft_no);
 	BUG_ON(NInoAttr(ni));
@@ -739,29 +743,34 @@
 	do {
 		block_end = block_start + blocksize;
 		/* If the buffer is outside the mft record, skip it. */
-		if ((block_end <= m_start) || (block_start >= m_end))
-			continue;
-		if (!buffer_mapped(bh)) {
-			ntfs_error(vol->sb, "Writing mft records without "
-					"existing mapped buffers is not "
-					"implemented yet.  %s",
-					ntfs_please_email);
-			err = -EOPNOTSUPP;
+		if (block_end <= m_start)
 			continue;
-		}
-		if (!buffer_uptodate(bh)) {
-			ntfs_error(vol->sb, "Writing mft records without "
-					"existing uptodate buffers is not "
-					"implemented yet.  %s",
-					ntfs_please_email);
-			err = -EOPNOTSUPP;
+		if (unlikely(block_start >= m_end))
+			break;
+		/*
+		 * If the buffer is clean and it is the first buffer of the mft
+		 * record, it was written out by other means already so we are
+		 * done.  For safety we make sure all the other buffers are
+		 * clean also.  If it is clean but not the first buffer and the
+		 * first buffer was dirty it is a bug.
+		 */
+		if (!buffer_dirty(bh)) {
+			if (block_start == m_start)
+				rec_is_dirty = FALSE;
+			else
+				BUG_ON(rec_is_dirty);
 			continue;
 		}
+		BUG_ON(!rec_is_dirty);
+		BUG_ON(!buffer_mapped(bh));
+		BUG_ON(!buffer_uptodate(bh));
 		BUG_ON(!nr_bhs && (m_start != block_start));
 		BUG_ON(nr_bhs >= max_bhs);
 		bhs[nr_bhs++] = bh;
 		BUG_ON((nr_bhs >= max_bhs) && (m_end != block_end));
 	} while (block_start = block_end, (bh = bh->b_this_page) != head);
+	if (!rec_is_dirty)
+		goto done;
 	if (unlikely(err))
 		goto cleanup_out;
 	/* Apply the mst protection fixups. */
@@ -778,8 +787,7 @@
 		if (unlikely(test_set_buffer_locked(tbh)))
 			BUG();
 		BUG_ON(!buffer_uptodate(tbh));
-		if (buffer_dirty(tbh))
-			clear_buffer_dirty(tbh);
+		clear_buffer_dirty(tbh);
 		get_bh(tbh);
 		tbh->b_end_io = end_buffer_write_sync;
 		submit_bh(WRITE, tbh);
@@ -795,8 +803,8 @@
 		if (unlikely(!buffer_uptodate(tbh))) {
 			err = -EIO;
 			/*
-			 * Set the buffer uptodate so the page & buffer states
-			 * don't become out of sync.
+			 * Set the buffer uptodate so the page and buffer
+			 * states do not become out of sync.
 			 */
 			if (PageUptodate(page))
 				set_buffer_uptodate(tbh);

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 24/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:44                                             ` [PATCH 23/37] " Anton Altaparmakov
@ 2004-10-19  9:44                                               ` Anton Altaparmakov
  2004-10-19  9:44                                                 ` [PATCH 25/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:44 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 24/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/10/12 1.2041.1.2)
   NTFS: Attempting to write outside initialized size is _not_ a bug so remove
         the bug check from fs/ntfs/aops.c::ntfs_write_mst_block().  It is in
         fact required to write outside initialized size when preparing to
         extend the initialized size.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:14:33 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:14:33 +01:00
@@ -76,6 +76,10 @@
 	- Modify fs/ntfs/mft.c::write_mft_record_nolock() so that it only
 	  writes the mft record if the buffers belonging to it are dirty.
 	  Otherwise we assume that it was written out by other means already.
+	- Attempting to write outside initialized size is _not_ a bug so remove
+	  the bug check from fs/ntfs/aops.c::ntfs_write_mst_block().  It is in
+	  fact required to write outside initialized size when preparing to
+	  extend the initialized size.
 
 2.1.20 - Fix two stupid bugs introduced in 2.1.18 release.
 
diff -Nru a/fs/ntfs/aops.c b/fs/ntfs/aops.c
--- a/fs/ntfs/aops.c	2004-10-19 10:14:33 +01:00
+++ b/fs/ntfs/aops.c	2004-10-19 10:14:33 +01:00
@@ -892,8 +892,6 @@
 			}
 			BUG_ON(!rec_is_dirty);
 		}
-		/* Attempting to write outside the initialized size is a bug. */
-		BUG_ON(((block + 1) << bh_size_bits) > ni->initialized_size);
 		if (!buffer_mapped(bh)) {
 			ntfs_error(vol->sb, "Writing ntfs records without "
 					"existing mapped buffers is not "

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 25/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:44                                               ` [PATCH 24/37] " Anton Altaparmakov
@ 2004-10-19  9:44                                                 ` Anton Altaparmakov
  2004-10-19  9:45                                                   ` [PATCH 26/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:44 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 25/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/10/12 1.2041.1.3)
   NTFS: Map the page instead of using page_address() before writing to it in
         fs/ntfs/aops.c::ntfs_mft_writepage().
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:14:36 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:14:36 +01:00
@@ -80,6 +80,8 @@
 	  the bug check from fs/ntfs/aops.c::ntfs_write_mst_block().  It is in
 	  fact required to write outside initialized size when preparing to
 	  extend the initialized size.
+	- Map the page instead of using page_address() before writing to it in
+	  fs/ntfs/aops.c::ntfs_mft_writepage().
 
 2.1.20 - Fix two stupid bugs introduced in 2.1.18 release.
 
diff -Nru a/fs/ntfs/aops.c b/fs/ntfs/aops.c
--- a/fs/ntfs/aops.c	2004-10-19 10:14:36 +01:00
+++ b/fs/ntfs/aops.c	2004-10-19 10:14:36 +01:00
@@ -917,7 +917,7 @@
 	if (!nr_bhs)
 		goto done;
 	/* Apply the mst protection fixups. */
-	kaddr = page_address(page);
+	kaddr = kmap(page);
 	for (i = 0; i < nr_bhs; i++) {
 		if (!(i % bhs_per_rec)) {
 			err = pre_write_mst_fixup((NTFS_RECORD*)(kaddr +
@@ -974,6 +974,7 @@
 					bh_offset(bhs[i])));
 	}
 	flush_dcache_page(page);
+	kunmap(page);
 	if (unlikely(err)) {
 		/* I/O error during writing.  This is really bad! */
 		ntfs_error(vol->sb, "I/O error while writing ntfs record "
@@ -996,6 +997,7 @@
 			post_write_mst_fixup((NTFS_RECORD*)(kaddr +
 					bh_offset(bhs[i])));
 	}
+	kunmap(page);
 cleanup_out:
 	/* Clean the buffers. */
 	for (i = 0; i < nr_bhs; i++)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 26/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:44                                                 ` [PATCH 25/37] " Anton Altaparmakov
@ 2004-10-19  9:45                                                   ` Anton Altaparmakov
  2004-10-19  9:45                                                     ` [PATCH 27/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 26/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/10/12 1.2041.1.4)
   NTFS: Provide exclusion between opening an inode / mapping an mft record
         and accessing the mft record in fs/ntfs/mft.c::ntfs_mft_writepage()
         by setting the page not uptodate throughout ntfs_mft_writepage().
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:14:40 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:14:40 +01:00
@@ -82,6 +82,9 @@
 	  extend the initialized size.
 	- Map the page instead of using page_address() before writing to it in
 	  fs/ntfs/aops.c::ntfs_mft_writepage().
+	- Provide exclusion between opening an inode / mapping an mft record
+	  and accessing the mft record in fs/ntfs/mft.c::ntfs_mft_writepage()
+	  by setting the page not uptodate throughout ntfs_mft_writepage().
 
 2.1.20 - Fix two stupid bugs introduced in 2.1.18 release.
 
diff -Nru a/fs/ntfs/mft.c b/fs/ntfs/mft.c
--- a/fs/ntfs/mft.c	2004-10-19 10:14:40 +01:00
+++ b/fs/ntfs/mft.c	2004-10-19 10:14:40 +01:00
@@ -724,18 +724,9 @@
 	 */
 	if (!NInoTestClearDirty(ni))
 		goto done;
-	/* Make sure we have mapped buffers. */
-	if (!page_has_buffers(page)) {
-no_buffers_err_out:
-		ntfs_error(vol->sb, "Writing mft records without existing "
-				"buffers is not implemented yet.  %s",
-				ntfs_please_email);
-		err = -EOPNOTSUPP;
-		goto err_out;
-	}
+	BUG_ON(!page_has_buffers(page));
 	bh = head = page_buffers(page);
-	if (!bh)
-		goto no_buffers_err_out;
+	BUG_ON(!bh);
 	nr_bhs = 0;
 	block_start = 0;
 	m_start = ni->page_ofs;
@@ -892,6 +883,12 @@
 	ntfs_debug("Entering for %i inodes starting at 0x%lx.", nr, mft_no);
 	/* Iterate over the mft records in the page looking for a dirty one. */
 	maddr = (u8*)kmap(page);
+	/*
+	 * Clear the page uptodate flag.  This will cause anyone trying to get
+	 * hold of the page to block on the page lock in read_cache_page().
+	 */
+	BUG_ON(!PageUptodate(page));
+	ClearPageUptodate(page);
 	for (i = 0; i < nr; ++i, ++mft_no, maddr += vol->mft_record_size) {
 		struct inode *vi;
 		ntfs_inode *ni, *eni;
@@ -1034,6 +1031,7 @@
 		up(&ni->extent_lock);
 		iput(vi);
 	}
+	SetPageUptodate(page);
 	kunmap(page);
 	/* If a dirty mft record was found, redirty the page. */
 	if (is_dirty) {

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 27/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:45                                                   ` [PATCH 26/37] " Anton Altaparmakov
@ 2004-10-19  9:45                                                     ` Anton Altaparmakov
  2004-10-19  9:45                                                       ` [PATCH 28/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 27/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/10/14 1.2046)
   NTFS: Big cleanup of mft record writing code.
   
   - Clear the page uptodate flag in fs/ntfs/aops.c::ntfs_write_mst_block()
     to ensure noone can see the page whilst the mst fixups are applied.
   - Add the helper fs/ntfs/mft.c::ntfs_may_write_mft_record() which
     checks if an mft record may be written out safely obtaining any
     necessary locks in the process.  This is used by
     fs/ntfs/aops.c::ntfs_write_mst_block().
   - Modify fs/ntfs/aops.c::ntfs_write_mst_block() to also work for
     writing mft records and improve its error handling in the process.
     Now if any of the records in the page fail to be written out, all
     other records will be written out instead of aborting completely.
   - Remove ntfs_mft_aops and update all users to use ntfs_mst_aops.
   - Modify fs/ntfs/inode.c::ntfs_read_locked_inode() to set the
     ntfs_mst_aops for all inodes which are NInoMstProtected() and
     ntfs_aops for all other inodes.
   - Rename fs/ntfs/mft.c::sync_mft_mirror{,_umount}() to
     ntfs_sync_mft_mirror{,_umount}() and change their parameters so they
     no longer require an ntfs inode to be present.  Update all callers.
   - Cleanup the error handling in fs/ntfs/mft.c::ntfs_sync_mft_mirror().
   - Clear the page uptodate flag in fs/ntfs/mft.c::ntfs_sync_mft_mirror()
     to ensure noone can see the page whilst the mst fixups are applied.
   - Remove the no longer needed fs/ntfs/mft.c::ntfs_mft_writepage() and
     fs/ntfs/mft.c::try_map_mft_record().
   - Fix callers of fs/ntfs/aops.c::mark_ntfs_record_dirty() to call it
     with the ntfs inode which contains the page rather than the ntfs
     inode the mft record of which is in the page.
   
   Ooops.  Yes, I know, I should have split this up into smaller changes...
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:14:44 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:14:44 +01:00
@@ -85,6 +85,31 @@
 	- Provide exclusion between opening an inode / mapping an mft record
 	  and accessing the mft record in fs/ntfs/mft.c::ntfs_mft_writepage()
 	  by setting the page not uptodate throughout ntfs_mft_writepage().
+	- Clear the page uptodate flag in fs/ntfs/aops.c::ntfs_write_mst_block()
+	  to ensure noone can see the page whilst the mst fixups are applied.
+	- Add the helper fs/ntfs/mft.c::ntfs_may_write_mft_record() which
+	  checks if an mft record may be written out safely obtaining any
+	  necessary locks in the process.  This is used by
+	  fs/ntfs/aops.c::ntfs_write_mst_block().
+	- Modify fs/ntfs/aops.c::ntfs_write_mst_block() to also work for
+	  writing mft records and improve its error handling in the process.
+	  Now if any of the records in the page fail to be written out, all
+	  other records will be written out instead of aborting completely.
+	- Remove ntfs_mft_aops and update all users to use ntfs_mst_aops.
+	- Modify fs/ntfs/inode.c::ntfs_read_locked_inode() to set the
+	  ntfs_mst_aops for all inodes which are NInoMstProtected() and
+	  ntfs_aops for all other inodes.
+	- Rename fs/ntfs/mft.c::sync_mft_mirror{,_umount}() to
+	  ntfs_sync_mft_mirror{,_umount}() and change their parameters so they
+	  no longer require an ntfs inode to be present.  Update all callers.
+	- Cleanup the error handling in fs/ntfs/mft.c::ntfs_sync_mft_mirror().
+	- Clear the page uptodate flag in fs/ntfs/mft.c::ntfs_sync_mft_mirror()
+	  to ensure noone can see the page whilst the mst fixups are applied.
+	- Remove the no longer needed fs/ntfs/mft.c::ntfs_mft_writepage() and
+	  fs/ntfs/mft.c::try_map_mft_record().
+	- Fix callers of fs/ntfs/aops.c::mark_ntfs_record_dirty() to call it
+	  with the ntfs inode which contains the page rather than the ntfs
+	  inode the mft record of which is in the page.
 
 2.1.20 - Fix two stupid bugs introduced in 2.1.18 release.
 
diff -Nru a/fs/ntfs/aops.c b/fs/ntfs/aops.c
--- a/fs/ntfs/aops.c	2004-10-19 10:14:44 +01:00
+++ b/fs/ntfs/aops.c	2004-10-19 10:14:44 +01:00
@@ -26,6 +26,7 @@
 #include <linux/pagemap.h>
 #include <linux/swap.h>
 #include <linux/buffer_head.h>
+#include <linux/writeback.h>
 
 #include "aops.h"
 #include "debug.h"
@@ -777,25 +778,25 @@
 	return err;
 }
 
-static const char *ntfs_please_email = "Please email "
-		"linux-ntfs-dev@lists.sourceforge.net and say that you saw "
-		"this message.  Thank you.";
-
 /**
  * ntfs_write_mst_block - write a @page to the backing store
  * @wbc:	writeback control structure
  * @page:	page cache page to write out
  *
  * This function is for writing pages belonging to non-resident, mst protected
- * attributes to their backing store.  The only supported attribute is the
- * index allocation attribute.  Both directory inodes and index inodes are
- * supported.
+ * attributes to their backing store.  The only supported attributes are index
+ * allocation and $MFT/$DATA.  Both directory inodes and index inodes are
+ * supported for the index allocation case.
  *
  * The page must remain locked for the duration of the write because we apply
  * the mst fixups, write, and then undo the fixups, so if we were to unlock the
  * page before undoing the fixups, any other user of the page will see the
  * page contents as corrupt.
  *
+ * We clear the page uptodate flag for the duration of the function to ensure
+ * exclusion for the $MFT/$DATA case against someone mapping an mft record we
+ * are about to apply the mst fixups to.
+ *
  * Return 0 on success and -errno on error.
  *
  * Based on ntfs_write_block(), ntfs_mft_writepage(), and
@@ -810,60 +811,53 @@
 	ntfs_volume *vol = ni->vol;
 	u8 *kaddr;
 	unsigned int bh_size = 1 << vi->i_blkbits;
-	unsigned int rec_size;
-	struct buffer_head *bh, *head;
+	unsigned int rec_size = ni->itype.index.block_size;
+	ntfs_inode *locked_nis[PAGE_CACHE_SIZE / rec_size];
+	struct buffer_head *bh, *head, *tbh;
 	int max_bhs = PAGE_CACHE_SIZE / bh_size;
 	struct buffer_head *bhs[max_bhs];
-	int i, nr_recs, nr_bhs, bhs_per_rec, err;
-	unsigned char bh_size_bits;
-	BOOL rec_is_dirty;
+	int i, nr_locked_nis, nr_recs, nr_bhs, bhs_per_rec, err;
+	unsigned char bh_size_bits, rec_size_bits;
+	BOOL sync, is_mft, page_is_dirty, rec_is_dirty;
 
 	ntfs_debug("Entering for inode 0x%lx, attribute type 0x%x, page index "
 			"0x%lx.", vi->i_ino, ni->type, page->index);
 	BUG_ON(!NInoNonResident(ni));
 	BUG_ON(!NInoMstProtected(ni));
-	BUG_ON(!(S_ISDIR(vi->i_mode) ||
+	is_mft = (S_ISREG(vi->i_mode) && !vi->i_ino);
+	BUG_ON(!(is_mft || S_ISDIR(vi->i_mode) ||
 			(NInoAttr(ni) && ni->type == AT_INDEX_ALLOCATION)));
-	BUG_ON(PageWriteback(page));
-	BUG_ON(!PageUptodate(page));
 	BUG_ON(!max_bhs);
 
+	/* Were we called for sync purposes? */
+	sync = (wbc->sync_mode == WB_SYNC_ALL);
+
 	/* Make sure we have mapped buffers. */
-	if (unlikely(!page_has_buffers(page))) {
-no_buffers_err_out:
-		ntfs_error(vol->sb, "Writing ntfs records without existing "
-				"buffers is not implemented yet.  %s",
-				ntfs_please_email);
-		err = -EOPNOTSUPP;
-		goto err_out;
-	}
+	BUG_ON(!page_has_buffers(page));
 	bh = head = page_buffers(page);
-	if (unlikely(!bh))
-		goto no_buffers_err_out;
+	BUG_ON(!bh);
 
 	bh_size_bits = vi->i_blkbits;
-	rec_size = ni->itype.index.block_size;
-	nr_recs = PAGE_CACHE_SIZE / rec_size;
-	BUG_ON(!nr_recs);
+	rec_size_bits = ni->itype.index.block_size_bits;
+	BUG_ON(!(PAGE_CACHE_SIZE >> rec_size_bits));
 	bhs_per_rec = rec_size >> bh_size_bits;
 	BUG_ON(!bhs_per_rec);
 
 	/* The first block in the page. */
-	rec_block = block = (s64)page->index <<
+	rec_block = block = (sector_t)page->index <<
 			(PAGE_CACHE_SHIFT - bh_size_bits);
 
 	/* The first out of bounds block for the data size. */
 	dblock = (vi->i_size + bh_size - 1) >> bh_size_bits;
 
-	err = nr_bhs = 0;
-	/* Need this to silence a stupid gcc warning. */
-	rec_is_dirty = FALSE;
+	err = nr_bhs = nr_recs = nr_locked_nis = 0;
+	page_is_dirty = rec_is_dirty = FALSE;
 	do {
 		if (unlikely(block >= dblock)) {
 			/*
 			 * Mapped buffers outside i_size will occur, because
 			 * this page can be outside i_size when there is a
-			 * truncate in progress. The contents of such buffers
+			 * truncate in progress.  The contents of such buffers
 			 * were zeroed by ntfs_writepage().
 			 *
 			 * FIXME: What about the small race window where
@@ -876,7 +870,7 @@
 		}
 		if (rec_block == block) {
 			/* This block is the first one in the record. */
-			rec_block += rec_size >> bh_size_bits;
+			rec_block += bhs_per_rec;
 			if (!buffer_dirty(bh)) {
 				/* Clean buffers are not written out. */
 				rec_is_dirty = FALSE;
@@ -892,54 +886,91 @@
 			}
 			BUG_ON(!rec_is_dirty);
 		}
-		if (!buffer_mapped(bh)) {
-			ntfs_error(vol->sb, "Writing ntfs records without "
-					"existing mapped buffers is not "
-					"implemented yet.  %s",
-					ntfs_please_email);
-			clear_buffer_dirty(bh);
-			err = -EOPNOTSUPP;
-			goto cleanup_out;
-		}
-		if (!buffer_uptodate(bh)) {
-			ntfs_error(vol->sb, "Writing ntfs records without "
-					"existing uptodate buffers is not "
-					"implemented yet.  %s",
-					ntfs_please_email);
-			clear_buffer_dirty(bh);
-			err = -EOPNOTSUPP;
-			goto cleanup_out;
-		}
+		BUG_ON(!buffer_mapped(bh));
+		BUG_ON(!buffer_uptodate(bh));
 		bhs[nr_bhs++] = bh;
 		BUG_ON(nr_bhs > max_bhs);
 	} while (block++, (bh = bh->b_this_page) != head);
 	/* If there were no dirty buffers, we are done. */
 	if (!nr_bhs)
 		goto done;
-	/* Apply the mst protection fixups. */
+	/* Map the page so we can access its contents. */
 	kaddr = kmap(page);
+	/* Clear the page uptodate flag whilst the mst fixups are applied. */
+	BUG_ON(!PageUptodate(page));
+	ClearPageUptodate(page);
 	for (i = 0; i < nr_bhs; i++) {
-		if (!(i % bhs_per_rec)) {
-			err = pre_write_mst_fixup((NTFS_RECORD*)(kaddr +
-					bh_offset(bhs[i])), rec_size);
-			if (err) {
-				ntfs_error(vol->sb, "Failed to apply mst "
-						"fixups (inode 0x%lx, "
-						"attribute type 0x%x, page "
-						"index 0x%lx)!  Umount and "
-						"run chkdsk.", vi->i_ino,
-						ni->type,
-				page->index);
-				nr_bhs = i;
-				goto mst_cleanup_out;
+		unsigned int ofs;
+
+		/* Skip buffers which are not at the beginning of records. */
+		if (i % bhs_per_rec)
+			continue;
+		tbh = bhs[i];
+		ofs = bh_offset(tbh);
+		if (is_mft) {
+			ntfs_inode *tni;
+			unsigned long mft_no;
+
+			/* Get the mft record number. */
+			mft_no = (((s64)page->index << PAGE_CACHE_SHIFT) + ofs)
+					>> rec_size_bits;
+			/* Check whether to write this mft record. */
+			tni = NULL;
+			if (!ntfs_may_write_mft_record(vol, mft_no,
+					(MFT_RECORD*)(kaddr + ofs), &tni)) {
+				/*
+				 * The record should not be written.  This
+				 * means we need to redirty the page before
+				 * returning.
+				 */
+				page_is_dirty = TRUE;
+				/*
+				 * Remove the buffers in this mft record from
+				 * the list of buffers to write.
+				 */
+				do {
+					bhs[i] = NULL;
+				} while (++i % bhs_per_rec);
+				continue;
 			}
+			/*
+			 * The record should be written.  If a locked ntfs
+			 * inode was returned, add it to the array of locked
+			 * ntfs inodes.
+			 */
+			if (tni)
+				locked_nis[nr_locked_nis++] = tni;
+		}
+		/* Apply the mst protection fixups. */
+		err = pre_write_mst_fixup((NTFS_RECORD*)(kaddr + ofs),
+				rec_size);
+		if (unlikely(err)) {
+			ntfs_error(vol->sb, "Failed to apply mst fixups "
+					"(inode 0x%lx, attribute type 0x%x, "
+					"page index 0x%lx, page offset 0x%x)!"
+					"  Unmount and run chkdsk.", vi->i_ino,
+					ni->type, page->index, ofs);
+			/*
+			 * Mark all the buffers in this record clean as we do
+			 * not want to write corrupt data to disk.
+			 */
+			do {
+				clear_buffer_dirty(bhs[i]);
+				bhs[i] = NULL;
+			} while (++i % bhs_per_rec);
+			continue;
 		}
+		nr_recs++;
 	}
+	/* If no records are to be written out, we are done. */
+	if (!nr_recs)
+		goto unm_done;
 	flush_dcache_page(page);
 	/* Lock buffers and start synchronous write i/o on them. */
 	for (i = 0; i < nr_bhs; i++) {
-		struct buffer_head *tbh = bhs[i];
-
+		tbh = bhs[i];
+		if (!tbh)
+			continue;
 		if (unlikely(test_set_buffer_locked(tbh)))
 			BUG();
 		if (unlikely(!test_clear_buffer_dirty(tbh))) {
@@ -952,59 +983,121 @@
 		tbh->b_end_io = end_buffer_write_sync;
 		submit_bh(WRITE, tbh);
 	}
+	/* Synchronize the mft mirror now if not @sync. */
+	if (is_mft && !sync)
+		goto do_mirror;
+do_wait:
 	/* Wait on i/o completion of buffers. */
 	for (i = 0; i < nr_bhs; i++) {
-		struct buffer_head *tbh = bhs[i];
-
+		tbh = bhs[i];
+		if (!tbh)
+			continue;
 		wait_on_buffer(tbh);
 		if (unlikely(!buffer_uptodate(tbh))) {
+			ntfs_error(vol->sb, "I/O error while writing ntfs "
+					"record buffer (inode 0x%lx, "
+					"attribute type 0x%x, page index "
+					"0x%lx, page offset 0x%lx)!  Unmount "
+					"and run chkdsk.", vi->i_ino, ni->type,
+					page->index, bh_offset(tbh));
 			err = -EIO;
 			/*
-			 * Set the buffer uptodate so the page & buffer states
-			 * don't become out of sync.
+			 * Set the buffer uptodate so the page and buffer
+			 * states do not become out of sync.
 			 */
-			if (PageUptodate(page))
-				set_buffer_uptodate(tbh);
+			set_buffer_uptodate(tbh);
 		}
 	}
+	/* If @sync, now synchronize the mft mirror. */
+	if (is_mft && sync) {
+do_mirror:
+		for (i = 0; i < nr_bhs; i++) {
+			unsigned long mft_no;
+			unsigned int ofs;
+
+			/*
+			 * Skip buffers which are not at the beginning of
+			 * records.
+			 */
+			if (i % bhs_per_rec)
+				continue;
+			tbh = bhs[i];
+			/* Skip removed buffers (and hence records). */
+			if (!tbh)
+				continue;
+			ofs = bh_offset(tbh);
+			/* Get the mft record number. */
+			mft_no = (((s64)page->index << PAGE_CACHE_SHIFT) + ofs)
+					>> rec_size_bits;
+			if (mft_no < vol->mftmirr_size)
+				ntfs_sync_mft_mirror(vol, mft_no,
+						(MFT_RECORD*)(kaddr + ofs),
+						sync);
+		}
+		if (!sync)
+			goto do_wait;
+	}
 	/* Remove the mst protection fixups again. */
 	for (i = 0; i < nr_bhs; i++) {
-		if (!(i % bhs_per_rec))
+		if (!(i % bhs_per_rec)) {
+			tbh = bhs[i];
+			if (!tbh)
+				continue;
 			post_write_mst_fixup((NTFS_RECORD*)(kaddr +
-					bh_offset(bhs[i])));
+					bh_offset(tbh)));
+		}
 	}
 	flush_dcache_page(page);
-	kunmap(page);
+unm_done:
+	/* Unlock any locked inodes. */
+	while (nr_locked_nis-- > 0) {
+		ntfs_inode *tni, *base_tni;
+		
+		tni = locked_nis[nr_locked_nis];
+		/* Get the base inode. */
+		down(&tni->extent_lock);
+		if (tni->nr_extents >= 0)
+			base_tni = tni;
+		else {
+			base_tni = tni->ext.base_ntfs_ino;
+			BUG_ON(!base_tni);
+		}
+		up(&tni->extent_lock);
+		ntfs_debug("Unlocking %s inode 0x%lx.",
+				tni == base_tni ? "base" : "extent",
+				tni->mft_no);
+		up(&tni->mrec_lock);
+		atomic_dec(&tni->count);
+		iput(VFS_I(base_tni));
+	}
 	if (unlikely(err)) {
-		/* I/O error during writing.  This is really bad! */
-		ntfs_error(vol->sb, "I/O error while writing ntfs record "
-				"(inode 0x%lx, attribute type 0x%x, page "
-				"index 0x%lx)!  Umount and run chkdsk.",
-				vi->i_ino, ni->type, page->index);
-		goto err_out;
+		SetPageError(page);
+		NVolSetErrors(vol);
 	}
+	SetPageUptodate(page);
+	kunmap(page);
 done:
-	set_page_writeback(page);
-	unlock_page(page);
-	end_page_writeback(page);
-	if (!err)
+	if (page_is_dirty) {
+		ntfs_debug("Page still contains one or more dirty ntfs "
+				"records.  Redirtying the page starting at "
+				"record 0x%lx.", page->index <<
+				(PAGE_CACHE_SHIFT - rec_size_bits));
+		redirty_page_for_writepage(wbc, page);
+		unlock_page(page);
+	} else {
+		/*
+		 * Keep the VM happy.  This must be done otherwise the
+		 * radix-tree tag PAGECACHE_TAG_DIRTY remains set even though
+		 * the page is clean.
+		 */
+		BUG_ON(PageWriteback(page));
+		set_page_writeback(page);
+		unlock_page(page);
+		end_page_writeback(page);
+	}
+	if (likely(!err))
 		ntfs_debug("Done.");
 	return err;
-mst_cleanup_out:
-	/* Remove the mst protection fixups again. */
-	for (i = 0; i < nr_bhs; i++) {
-		if (!(i % bhs_per_rec))
-			post_write_mst_fixup((NTFS_RECORD*)(kaddr +
-					bh_offset(bhs[i])));
-	}
-	kunmap(page);
-cleanup_out:
-	/* Clean the buffers. */
-	for (i = 0; i < nr_bhs; i++)
-		clear_buffer_dirty(bhs[i]);
-err_out:
-	SetPageError(page);
-	goto done;
 }
 
 /**
@@ -1012,6 +1105,9 @@
  * @page:	page cache page to write out
  * @wbc:	writeback control structure
  *
+ * This is called from the VM when it wants to have a dirty ntfs page cache
+ * page cleaned.  The VM has already locked the page and marked it clean.
+ *
  * For non-resident attributes, ntfs_writepage() writes the @page by calling
  * the ntfs version of the generic block_write_full_page() function,
  * ntfs_write_block(), which in turn if necessary creates and writes the
@@ -1022,8 +1118,6 @@
  * The mft record is then marked dirty and written out asynchronously via the
  * vfs inode dirty code path.
  *
- * Note the caller clears the page dirty flag before calling ntfs_writepage().
- *
  * Based on ntfs_readpage() and fs/buffer.c::block_write_full_page().
  *
  * Return 0 on success and -errno on error.
@@ -2038,7 +2132,7 @@
 
 /**
  * mark_ntfs_record_dirty - mark an ntfs record dirty
- * @ni:		ntfs inode to which the ntfs record to be marked dirty belongs
+ * @ni:		ntfs inode containing the ntfs record to be marked dirty
  * @page:	page containing the ntfs record to mark dirty
  * @rec_start:	byte offset within @page at which the ntfs record begins
  *
diff -Nru a/fs/ntfs/inode.c b/fs/ntfs/inode.c
--- a/fs/ntfs/inode.c	2004-10-19 10:14:44 +01:00
+++ b/fs/ntfs/inode.c	2004-10-19 10:14:44 +01:00
@@ -968,7 +968,6 @@
 		/* Setup the operations for this inode. */
 		vi->i_op = &ntfs_dir_inode_ops;
 		vi->i_fop = &ntfs_dir_ops;
-		vi->i_mapping->a_ops = &ntfs_mst_aops;
 	} else {
 		/* It is a file. */
 		ntfs_attr_reinit_search_ctx(ctx);
@@ -1112,8 +1111,11 @@
 		/* Setup the operations for this inode. */
 		vi->i_op = &ntfs_file_inode_ops;
 		vi->i_fop = &ntfs_file_ops;
-		vi->i_mapping->a_ops = &ntfs_aops;
 	}
+	if (NInoMstProtected(ni))
+		vi->i_mapping->a_ops = &ntfs_mst_aops;
+	else
+		vi->i_mapping->a_ops = &ntfs_aops;
 	/*
 	 * The number of 512-byte blocks used on disk (for stat). This is in so
 	 * far inaccurate as it doesn't account for any named streams or other
@@ -1766,7 +1768,7 @@
 	vi->i_generation = ni->seq_no = le16_to_cpu(m->sequence_number);
 
 	/* Provides readpage() and sync_page() for map_mft_record(). */
-	vi->i_mapping->a_ops = &ntfs_mft_aops;
+	vi->i_mapping->a_ops = &ntfs_mst_aops;
 
 	ctx = ntfs_attr_get_search_ctx(ni, m);
 	if (!ctx) {
@@ -2028,8 +2030,6 @@
 			/* No VFS initiated operations allowed for $MFT. */
 			vi->i_op = &ntfs_empty_inode_ops;
 			vi->i_fop = &ntfs_empty_file_ops;
-			/* Put back our special address space operations. */
-			vi->i_mapping->a_ops = &ntfs_mft_aops;
 		}
 
 		/* Get the lowest vcn for the next extent. */
@@ -2514,8 +2514,8 @@
 	 * this function returns.
 	 */
 	if (modified && !NInoTestSetDirty(ctx->ntfs_ino))
-		mark_ntfs_record_dirty(ctx->ntfs_ino, ctx->ntfs_ino->page,
-				ctx->ntfs_ino->page_ofs);
+		mark_ntfs_record_dirty(NTFS_I(ni->vol->mft_ino),
+				ctx->ntfs_ino->page, ctx->ntfs_ino->page_ofs);
 	ntfs_attr_put_search_ctx(ctx);
 	/* Now the access times are updated, write the base mft record. */
 	if (NInoDirty(ni))
diff -Nru a/fs/ntfs/mft.c b/fs/ntfs/mft.c
--- a/fs/ntfs/mft.c	2004-10-19 10:14:44 +01:00
+++ b/fs/ntfs/mft.c	2004-10-19 10:14:44 +01:00
@@ -32,37 +32,6 @@
 #include "ntfs.h"
 
 /**
- * ntfs_readpage - external declaration, function is in fs/ntfs/aops.c
- */
-extern int ntfs_readpage(struct file *, struct page *);
-
-#ifdef NTFS_RW
-/**
- * ntfs_mft_writepage - forward declaration, function is further below
- */
-static int ntfs_mft_writepage(struct page *page, struct writeback_control *wbc);
-#endif /* NTFS_RW */
-
-/**
- * ntfs_mft_aops - address space operations for access to $MFT
- *
- * Address space operations for access to $MFT. This allows us to simply use
- * ntfs_map_page() in map_mft_record_page().
- */
-struct address_space_operations ntfs_mft_aops = {
-	.readpage	= ntfs_readpage,	/* Fill page with data. */
-	.sync_page	= block_sync_page,	/* Currently, just unplugs the
-						   disk request queue. */
-#ifdef NTFS_RW
-	.writepage	= ntfs_mft_writepage,	/* Write out the dirty mft
-						   records in a page. */
-	.set_page_dirty	= __set_page_dirty_nobuffers,	/* Set the page dirty
-						   without touching the buffers
-						   belonging to the page. */
-#endif /* NTFS_RW */
-};
-
-/**
  * map_mft_record_page - map the page in which a specific mft record resides
  * @ni:		ntfs inode whose mft record page to map
  *
@@ -115,57 +84,6 @@
 }
 
 /**
- * try_map_mft_record - attempt to map, pin and lock an mft record
- * @ni:		ntfs inode whose MFT record to map
- *
- * First, attempt to take the mrec_lock semaphore.  If the semaphore is already
- * taken by someone else, return the error code -EALREADY.  Otherwise continue
- * as described below.
- *
- * The page of the record is mapped using map_mft_record_page() before being
- * returned to the caller.
- *
- * This in turn uses ntfs_map_page() to get the page containing the wanted mft
- * record (it in turn calls read_cache_page() which reads it in from disk if
- * necessary, increments the use count on the page so that it cannot disappear
- * under us and returns a reference to the page cache page).
- *
- * The mft record is now ours and we return a pointer to it.  You need to check
- * the returned pointer with IS_ERR() and if that is true, PTR_ERR() will return
- * the error code.
- *
- * For further details see the description of map_mft_record() below.
- */
-MFT_RECORD *try_map_mft_record(ntfs_inode *ni)
-{
-	MFT_RECORD *m;
-
-	ntfs_debug("Entering for mft_no 0x%lx.", ni->mft_no);
-
-	/* Make sure the ntfs inode doesn't go away. */
-	atomic_inc(&ni->count);
-
-	/*
-	 * Serialize access to this mft record.  If someone else is already
-	 * holding the lock, abort instead of waiting for the lock.
-	 */
-	if (unlikely(down_trylock(&ni->mrec_lock))) {
-		ntfs_debug("Mft record is already locked, aborting.");
-		atomic_dec(&ni->count);
-		return ERR_PTR(-EALREADY);
-	}
-
-	m = map_mft_record_page(ni);
-	if (likely(!IS_ERR(m)))
-		return m;
-
-	up(&ni->mrec_lock);
-	atomic_dec(&ni->count);
-	ntfs_error(ni->vol->sb, "Failed with error code %lu.", -PTR_ERR(m));
-	return m;
-}
-
-/**
  * map_mft_record - map, pin and lock an mft record
  * @ni:		ntfs inode whose MFT record to map
  *
@@ -462,7 +380,8 @@
 
 	ntfs_debug("Entering for inode 0x%lx.", ni->mft_no);
 	BUG_ON(NInoAttr(ni));
-	mark_ntfs_record_dirty(ni, ni->page, ni->page_ofs);
+	mark_ntfs_record_dirty(NTFS_I(ni->vol->mft_ino), ni->page,
+			ni->page_ofs);
 	/* Determine the base vfs inode and mark it dirty, too. */
 	down(&ni->extent_lock);
 	if (likely(ni->nr_extents >= 0))
@@ -478,13 +397,14 @@
 		"this message.  Thank you.";
 
 /**
- * sync_mft_mirror_umount - synchronise an mft record to the mft mirror
- * @ni:		ntfs inode whose mft record to synchronize
+ * ntfs_sync_mft_mirror_umount - synchronise an mft record to the mft mirror
+ * @vol:	ntfs volume on which the mft record to synchronize resides
+ * @mft_no:	mft record number of mft record to synchronize
  * @m:		mapped, mst protected (extent) mft record to synchronize
  *
- * Write the mapped, mst protected (extent) mft record @m described by the
- * (regular or extent) ntfs inode @ni to the mft mirror ($MFTMirr) bypassing
- * the page cache and the $MFTMirr inode itself.
+ * Write the mapped, mst protected (extent) mft record @m with mft record
+ * number @mft_no to the mft mirror ($MFTMirr) of the ntfs volume @vol,
+ * bypassing the page cache and the $MFTMirr inode itself.
  *
  * This function is only for use at umount time when the mft mirror inode has
  * already been disposed off.  We BUG() if we are called while the mft mirror
@@ -498,10 +418,9 @@
  * alternative would be either to BUG() or to get a NULL pointer dereference
  * and Oops.
  */
-static int sync_mft_mirror_umount(ntfs_inode *ni, MFT_RECORD *m)
+static int ntfs_sync_mft_mirror_umount(ntfs_volume *vol,
+		const unsigned long mft_no, MFT_RECORD *m)
 {
-	ntfs_volume *vol = ni->vol;
-
 	BUG_ON(vol->mftmirr_ino);
 	ntfs_error(vol->sb, "Umount time mft mirror syncing is not "
 			"implemented yet.  %s", ntfs_please_email);
@@ -509,25 +428,26 @@
 }
 
 /**
- * sync_mft_mirror - synchronize an mft record to the mft mirror
- * @ni:		ntfs inode whose mft record to synchronize
+ * ntfs_sync_mft_mirror - synchronize an mft record to the mft mirror
+ * @vol:	ntfs volume on which the mft record to synchronize resides
+ * @mft_no:	mft record number of mft record to synchronize
  * @m:		mapped, mst protected (extent) mft record to synchronize
  * @sync:	if true, wait for i/o completion
  *
- * Write the mapped, mst protected (extent) mft record @m described by the
- * (regular or extent) ntfs inode @ni to the mft mirror ($MFTMirr).
+ * Write the mapped, mst protected (extent) mft record @m with mft record
+ * number @mft_no to the mft mirror ($MFTMirr) of the ntfs volume @vol.
  *
  * On success return 0.  On error return -errno and set the volume errors flag
- * in the ntfs_volume to which @ni belongs.
+ * in the ntfs volume @vol.
  *
  * NOTE:  We always perform synchronous i/o and ignore the @sync parameter.
  *
  * TODO:  If @sync is false, want to do truly asynchronous i/o, i.e. just
  * schedule i/o via ->writepage or do it via kntfsd or whatever.
  */
-static int sync_mft_mirror(ntfs_inode *ni, MFT_RECORD *m, int sync)
+int ntfs_sync_mft_mirror(ntfs_volume *vol, const unsigned long mft_no,
+		MFT_RECORD *m, int sync)
 {
-	ntfs_volume *vol = ni->vol;
 	struct page *page;
 	unsigned int blocksize = vol->sb->s_blocksize;
 	int max_bhs = vol->mft_record_size / blocksize;
@@ -537,17 +457,17 @@
 	unsigned int block_start, block_end, m_start, m_end;
 	int i_bhs, nr_bhs, err = 0;
 
-	ntfs_debug("Entering for inode 0x%lx.", ni->mft_no);
+	ntfs_debug("Entering for inode 0x%lx.", mft_no);
 	BUG_ON(!max_bhs);
 	if (unlikely(!vol->mftmirr_ino)) {
 		/* This could happen during umount... */
-		err = sync_mft_mirror_umount(ni, m);
+		err = ntfs_sync_mft_mirror_umount(vol, mft_no, m);
 		if (likely(!err))
 			return err;
 		goto err_out;
 	}
 	/* Get the page containing the mirror copy of the mft record @m. */
-	page = ntfs_map_page(vol->mftmirr_ino->i_mapping, ni->mft_no >>
+	page = ntfs_map_page(vol->mftmirr_ino->i_mapping, mft_no >>
 			(PAGE_CACHE_SHIFT - vol->mft_record_size_bits));
 	if (IS_ERR(page)) {
 		ntfs_error(vol->sb, "Failed to map mft mirror page.");
@@ -561,23 +481,17 @@
 	 * make sure no one is writing from elsewhere.
 	 */
 	lock_page(page);
+	BUG_ON(!PageUptodate(page));
+	ClearPageUptodate(page);
 	/* The address in the page of the mirror copy of the mft record @m. */
-	kmirr = page_address(page) + ((ni->mft_no << vol->mft_record_size_bits)
-			& ~PAGE_CACHE_MASK);
+	kmirr = page_address(page) + ((mft_no << vol->mft_record_size_bits) &
+			~PAGE_CACHE_MASK);
 	/* Copy the mst protected mft record to the mirror. */
 	memcpy(kmirr, m, vol->mft_record_size);
 	/* Make sure we have mapped buffers. */
-	if (!page_has_buffers(page)) {
-no_buffers_err_out:
-		ntfs_error(vol->sb, "Writing mft mirror records without "
-				"existing buffers is not implemented yet.  %s",
-				ntfs_please_email);
-		err = -EOPNOTSUPP;
-		goto unlock_err_out;
-	}
+	BUG_ON(!page_has_buffers(page));
 	bh = head = page_buffers(page);
-	if (!bh)
-		goto no_buffers_err_out;
+	BUG_ON(!bh);
 	nr_bhs = 0;
 	block_start = 0;
 	m_start = kmirr - (u8*)page_address(page);
@@ -587,22 +501,8 @@
 		/* If the buffer is outside the mft record, skip it. */
 		if ((block_end <= m_start) || (block_start >= m_end))
 			continue;
-		if (!buffer_mapped(bh)) {
-			ntfs_error(vol->sb, "Writing mft mirror records "
-					"without existing mapped buffers is "
-					"not implemented yet.  %s",
-					ntfs_please_email);
-			err = -EOPNOTSUPP;
-			continue;
-		}
-		if (!buffer_uptodate(bh)) {
-			ntfs_error(vol->sb, "Writing mft mirror records "
-					"without existing uptodate buffers is "
-					"not implemented yet.  %s",
-					ntfs_please_email);
-			err = -EOPNOTSUPP;
-			continue;
-		}
+		BUG_ON(!buffer_mapped(bh));
+		BUG_ON(!buffer_uptodate(bh));
 		BUG_ON(!nr_bhs && (m_start != block_start));
 		BUG_ON(nr_bhs >= max_bhs);
 		bhs[nr_bhs++] = bh;
@@ -630,11 +530,10 @@
 			if (unlikely(!buffer_uptodate(tbh))) {
 				err = -EIO;
 				/*
-				 * Set the buffer uptodate so the page & buffer
-				 * states don't become out of sync.
+				 * Set the buffer uptodate so the page and
+				 * buffer states do not become out of sync.
 				 */
-				if (PageUptodate(page))
-					set_buffer_uptodate(tbh);
+				set_buffer_uptodate(tbh);
 			}
 		}
 	} else /* if (unlikely(err)) */ {
@@ -642,29 +541,25 @@
 		for (i_bhs = 0; i_bhs < nr_bhs; i_bhs++)
 			clear_buffer_dirty(bhs[i_bhs]);
 	}
-unlock_err_out:
 	/* Current state: all buffers are clean, unlocked, and uptodate. */
 	/* Remove the mst protection fixups again. */
 	post_write_mst_fixup((NTFS_RECORD*)kmirr);
 	flush_dcache_page(page);
+	SetPageUptodate(page);
 	unlock_page(page);
 	ntfs_unmap_page(page);
-	if (unlikely(err)) {
-		/* I/O error during writing.  This is really bad! */
+	if (likely(!err)) {
+		ntfs_debug("Done.");
+	} else {
 		ntfs_error(vol->sb, "I/O error while writing mft mirror "
-				"record 0x%lx!  You should unmount the volume "
-				"and run chkdsk or ntfsfix.", ni->mft_no);
-		goto err_out;
-	}
-	ntfs_debug("Done.");
-	return 0;
+				"record 0x%lx!", mft_no);
 err_out:
-	ntfs_error(vol->sb, "Failed to synchronize $MFTMirr (error code %i).  "
-			"Volume will be left marked dirty on umount.  Run "
-			"ntfsfix on the partition after umounting to correct "
-			"this.", -err);
-	/* We don't want to clear the dirty bit on umount. */
-	NVolSetErrors(vol);
+		ntfs_error(vol->sb, "Failed to synchronize $MFTMirr (error "
+				"code %i).  Volume will be left marked dirty "
+				"on umount.  Run ntfsfix on the partition "
+				"after umounting to correct this.", -err);
+		NVolSetErrors(vol);
+	}
 	return err;
 }
 
@@ -785,7 +680,7 @@
 	}
 	/* Synchronize the mft mirror now if not @sync. */
 	if (!sync && ni->mft_no < vol->mftmirr_size)
-		sync_mft_mirror(ni, m, sync);
+		ntfs_sync_mft_mirror(vol, ni->mft_no, m, sync);
 	/* Wait on i/o completion of buffers. */
 	for (i_bhs = 0; i_bhs < nr_bhs; i_bhs++) {
 		struct buffer_head *tbh = bhs[i_bhs];
@@ -803,7 +698,7 @@
 	}
 	/* If @sync, now synchronize the mft mirror. */
 	if (sync && ni->mft_no < vol->mftmirr_size)
-		sync_mft_mirror(ni, m, sync);
+		ntfs_sync_mft_mirror(vol, ni->mft_no, m, sync);
 	/* Remove the mst protection fixups again. */
 	post_write_mst_fixup((NTFS_RECORD*)m);
 	flush_dcache_mft_record_page(ni);
@@ -839,221 +734,257 @@
 }
 
 /**
- * ntfs_mft_writepage - check if a metadata page contains dirty mft records
- * @page:	metadata page possibly containing dirty mft records
- * @wbc:	writeback control structure
- *
- * This is called from the VM when it wants to have a dirty $MFT/$DATA metadata
- * page cache page cleaned.  The VM has already locked the page and marked it
- * clean.  Instead of writing the page as a conventional ->writepage function
- * would do, we check if the page still contains any dirty mft records (it must
- * have done at some point in the past since the page was marked dirty) and if
- * none are found, i.e. all mft records are clean, we unlock the page and
- * return.  The VM is then free to do with the page as it pleases.  If on the
- * other hand we do find any dirty mft records in the page, we redirty the page
- * before unlocking it and returning so the VM knows that the page is still
- * busy and cannot be thrown out.
- *
- * Note, we do not actually write any dirty mft records here because they are
- * dirty inodes and hence will be written by the VFS inode dirty code paths.
- * There is no need to write them from the VM page dirty code paths, too and in
- * fact once we implement journalling it would be a complete nightmare having
- * two code paths leading to mft record writeout.
+ * ntfs_may_write_mft_record - check if an mft record may be written out
+ * @vol:	[IN]  ntfs volume on which the mft record to check resides
+ * @mft_no:	[IN]  mft record number of the mft record to check
+ * @m:		[IN]  mapped mft record to check
+ * @locked_ni:	[OUT] caller has to unlock this ntfs inode if one is returned
+ *
+ * Check if the mapped (base or extent) mft record @m with mft record number
+ * @mft_no belonging to the ntfs volume @vol may be written out.  If necessary
+ * and possible the ntfs inode of the mft record is locked and the base vfs
+ * inode is pinned.  The locked ntfs inode is then returned in @locked_ni.  The
+ * caller is responsible for unlocking the ntfs inode and unpinning the base
+ * vfs inode.
+ *
+ * Return TRUE if the mft record may be written out and FALSE if not.
+ *
+ * The caller has locked the page and cleared the uptodate flag on it which
+ * means that we can safely write out any dirty mft records that do not have
+ * their inodes in icache as determined by ilookup5() as anyone
+ * opening/creating such an inode would block when attempting to map the mft
+ * record in read_cache_page() until we are finished with the write out.
+ *
+ * Here is a description of the tests we perform:
+ *
+ * If the inode is found in icache we know the mft record must be a base mft
+ * record.  If it is dirty, we do not write it and return FALSE as the vfs
+ * inode write paths will result in the access times being updated which would
+ * cause the base mft record to be redirtied and written out again.  (We know
+ * the access time update will modify the base mft record because Windows
+ * chkdsk complains if the standard information attribute is not in the base
+ * mft record.)
+ *
+ * If the inode is in icache and not dirty, we attempt to lock the mft record
+ * and if we find the lock was already taken, it is not safe to write the mft
+ * record and we return FALSE.
+ *
+ * If we manage to obtain the lock we have exclusive access to the mft record,
+ * which also allows us safe writeout of the mft record.  We then set
+ * @locked_ni to the locked ntfs inode and return TRUE.
+ *
+ * Note we cannot just lock the mft record and sleep while waiting for the lock
+ * because this would deadlock due to lock reversal (normally the mft record is
+ * locked before the page is locked but we already have the page locked here
+ * when we try to lock the mft record).
+ *
+ * If the inode is not in icache we need to perform further checks.
+ *
+ * If the mft record is not a FILE record or it is a base mft record, we can
+ * safely write it and return TRUE.
+ *
+ * We now know the mft record is an extent mft record.  We check if the inode
+ * corresponding to its base mft record is in icache and obtain a reference to
+ * it if it is.  If it is not, we can safely write it and return TRUE.
+ *
+ * We now have the base inode for the extent mft record.  We check if it has an
+ * ntfs inode for the extent mft record attached and if not it is safe to write
+ * the extent mft record and we return TRUE.
+ *
+ * The ntfs inode for the extent mft record is attached to the base inode so we
+ * attempt to lock the extent mft record and if we find the lock was already
+ * taken, it is not safe to write the extent mft record and we return FALSE.
+ *
+ * If we manage to obtain the lock we have exclusive access to the extent mft
+ * record, which also allows us safe writeout of the extent mft record.  We
+ * set the ntfs inode of the extent mft record clean and then set @locked_ni to
+ * the now locked ntfs inode and return TRUE.
+ *
+ * Note, the reason for actually writing dirty mft records here and not just
+ * relying on the vfs inode dirty code paths is that we can have mft records
+ * modified without them ever having actual inodes in memory.  Also we can have
+ * dirty mft records with clean ntfs inodes in memory.  None of the described
+ * cases would result in the dirty mft records being written out if we only
+ * relied on the vfs inode dirty code paths.  And these cases can really occur
+ * during allocation of new mft records and in particular when the
+ * initialized_size of the $MFT/$DATA attribute is extended and the new space
+ * is initialized using ntfs_mft_record_format().  The clean inode can then
+ * appear if the mft record is reused for a new inode before it got written
+ * out.
  */
-static int ntfs_mft_writepage(struct page *page, struct writeback_control *wbc)
+BOOL ntfs_may_write_mft_record(ntfs_volume *vol, const unsigned long mft_no,
+		const MFT_RECORD *m, ntfs_inode **locked_ni)
 {
-	struct inode *mft_vi = page->mapping->host;
-	struct super_block *sb = mft_vi->i_sb;
-	ntfs_volume *vol = NTFS_SB(sb);
-	u8 *maddr;
-	MFT_RECORD *m;
-	ntfs_inode **extent_nis;
-	unsigned long mft_no;
-	int nr, i, j;
-	BOOL is_dirty = FALSE;
+	struct super_block *sb = vol->sb;
+	struct inode *mft_vi = vol->mft_ino;
+	struct inode *vi;
+	ntfs_inode *ni, *eni, **extent_nis;
+	int i;
+	ntfs_attr na;
 
-	BUG_ON(!PageLocked(page));
-	BUG_ON(PageWriteback(page));
-	BUG_ON(mft_vi != vol->mft_ino);
-	/* The first mft record number in the page. */
-	mft_no = page->index << (PAGE_CACHE_SHIFT - vol->mft_record_size_bits);
-	/* Number of mft records in the page. */
-	nr = PAGE_CACHE_SIZE >> vol->mft_record_size_bits;
-	BUG_ON(!nr);
-	ntfs_debug("Entering for %i inodes starting at 0x%lx.", nr, mft_no);
-	/* Iterate over the mft records in the page looking for a dirty one. */
-	maddr = (u8*)kmap(page);
+	ntfs_debug("Entering for inode 0x%lx.", mft_no);
 	/*
-	 * Clear the page uptodate flag.  This will cause anyone trying to get
-	 * hold of the page to block on the page lock in read_cache_page().
+	 * Normally we do not return a locked inode so set @locked_ni to NULL.
 	 */
-	BUG_ON(!PageUptodate(page));
-	ClearPageUptodate(page);
-	for (i = 0; i < nr; ++i, ++mft_no, maddr += vol->mft_record_size) {
-		struct inode *vi;
-		ntfs_inode *ni, *eni;
-		ntfs_attr na;
-
-		na.mft_no = mft_no;
-		na.name = NULL;
-		na.name_len = 0;
-		na.type = AT_UNUSED;
-		/*
-		 * Check if the inode corresponding to this mft record is in
-		 * the VFS inode cache and obtain a reference to it if it is.
-		 */
-		ntfs_debug("Looking for inode 0x%lx in icache.", mft_no);
-		/*
-		 * For inode 0, i.e. $MFT itself, we cannot use ilookup5() from
-		 * here or we deadlock because the inode is already locked by
-		 * the kernel (fs/fs-writeback.c::__sync_single_inode()) and
-		 * ilookup5() waits until the inode is unlocked before
-		 * returning it and it never gets unlocked because
-		 * ntfs_mft_writepage() never returns.  )-:  Fortunately, we
-		 * have inode 0 pinned in icache for the duration of the mount
-		 * so we can access it directly.
-		 */
-		if (!mft_no) {
-			/* Balance the below iput(). */
-			vi = igrab(mft_vi);
-			BUG_ON(vi != mft_vi);
-		} else
-			vi = ilookup5(sb, mft_no, (test_t)ntfs_test_inode, &na);
-		if (vi) {
-			ntfs_debug("Inode 0x%lx is in icache.", mft_no);
-			/* The inode is in icache.  Check if it is dirty. */
-			ni = NTFS_I(vi);
-			if (!NInoDirty(ni)) {
-				/* The inode is not dirty, skip this record. */
-				ntfs_debug("Inode 0x%lx is not dirty, "
-						"continuing search.", mft_no);
-				iput(vi);
-				continue;
-			}
-			ntfs_debug("Inode 0x%lx is dirty, aborting search.",
+	BUG_ON(!locked_ni);
+	*locked_ni = NULL;
+	/*
+	 * Check if the inode corresponding to this mft record is in the VFS
+	 * inode cache and obtain a reference to it if it is.
+	 */
+	ntfs_debug("Looking for inode 0x%lx in icache.", mft_no);
+	na.mft_no = mft_no;
+	na.name = NULL;
+	na.name_len = 0;
+	na.type = AT_UNUSED;
+	/*
+	 * For inode 0, i.e. $MFT itself, we cannot use ilookup5() from here or
+	 * we deadlock because the inode is already locked by the kernel
+	 * (fs/fs-writeback.c::__sync_single_inode()) and ilookup5() waits
+	 * until the inode is unlocked before returning it and it never gets
+	 * unlocked because ntfs_should_write_mft_record() never returns.  )-:
+	 * Fortunately, we have inode 0 pinned in icache for the duration of
+	 * the mount so we can access it directly.
+	 */
+	if (!mft_no) {
+		/* Balance the below iput(). */
+		vi = igrab(mft_vi);
+		BUG_ON(vi != mft_vi);
+	} else
+		vi = ilookup5(sb, mft_no, (test_t)ntfs_test_inode, &na);
+	if (vi) {
+		ntfs_debug("Base inode 0x%lx is in icache.", mft_no);
+		/* The inode is in icache. */
+		ni = NTFS_I(vi);
+		/* Take a reference to the ntfs inode. */
+		atomic_inc(&ni->count);
+		/* If the inode is dirty, do not write this record. */
+		if (NInoDirty(ni)) {
+			ntfs_debug("Inode 0x%lx is dirty, do not write it.",
 					mft_no);
-			/* The inode is dirty, no need to search further. */
+			atomic_dec(&ni->count);
 			iput(vi);
-			is_dirty = TRUE;
-			break;
+			return FALSE;
 		}
-		ntfs_debug("Inode 0x%lx is not in icache.", mft_no);
-		/* The inode is not in icache. */
-		/* Skip the record if it is not a mft record (type "FILE"). */
-		if (!ntfs_is_mft_recordp((le32*)maddr)) {
-			ntfs_debug("Mft record 0x%lx is not a FILE record, "
-					"continuing search.", mft_no);
-			continue;
+		ntfs_debug("Inode 0x%lx is not dirty.", mft_no);
+		/* The inode is not dirty, try to take the mft record lock. */
+		if (unlikely(down_trylock(&ni->mrec_lock))) {
+			ntfs_debug("Mft record 0x%lx is already locked, do "
+					"not write it.", mft_no);
+			atomic_dec(&ni->count);
+			iput(vi);
+			return FALSE;
 		}
-		m = (MFT_RECORD*)maddr;
+		ntfs_debug("Managed to lock mft record 0x%lx, write it.",
+				mft_no);
 		/*
-		 * Skip the mft record if it is not in use.  FIXME:  What about
-		 * deleted/deallocated (extent) inodes?  (AIA)
+		 * The write has to occur while we hold the mft record lock so
+		 * return the locked ntfs inode.
 		 */
-		if (!(m->flags & MFT_RECORD_IN_USE)) {
-			ntfs_debug("Mft record 0x%lx is not in use, "
-					"continuing search.", mft_no);
-			continue;
-		}
-		/* Skip the mft record if it is a base inode. */
-		if (!m->base_mft_record) {
-			ntfs_debug("Mft record 0x%lx is a base record, "
-					"continuing search.", mft_no);
-			continue;
-		}
+		*locked_ni = ni;
+		return TRUE;
+	}
+	ntfs_debug("Inode 0x%lx is not in icache.", mft_no);
+	/* The inode is not in icache. */
+	/* Write the record if it is not a mft record (type "FILE"). */
+	if (!ntfs_is_mft_record(m->magic)) {
+		ntfs_debug("Mft record 0x%lx is not a FILE record, write it.",
+				mft_no);
+		return TRUE;
+	}
+	/* Write the mft record if it is a base inode. */
+	if (!m->base_mft_record) {
+		ntfs_debug("Mft record 0x%lx is a base record, write it.",
+				mft_no);
+		return TRUE;
+	}
+	/*
+	 * This is an extent mft record.  Check if the inode corresponding to
+	 * its base mft record is in icache and obtain a reference to it if it
+	 * is.
+	 */
+	na.mft_no = MREF_LE(m->base_mft_record);
+	ntfs_debug("Mft record 0x%lx is an extent record.  Looking for base "
+			"inode 0x%lx in icache.", mft_no, na.mft_no);
+	vi = ilookup5(sb, na.mft_no, (test_t)ntfs_test_inode, &na);
+	if (!vi) {
 		/*
-		 * This is an extent mft record.  Check if the inode
-		 * corresponding to its base mft record is in icache.
+		 * The base inode is not in icache, write this extent mft
+		 * record.
 		 */
-		na.mft_no = MREF_LE(m->base_mft_record);
-		ntfs_debug("Mft record 0x%lx is an extent record.  Looking "
-				"for base inode 0x%lx in icache.", mft_no,
-				na.mft_no);
-		vi = ilookup5(sb, na.mft_no, (test_t)ntfs_test_inode,
-				&na);
-		if (!vi) {
-			/*
-			 * The base inode is not in icache.  Skip this extent
-			 * mft record.
-			 */
-			ntfs_debug("Base inode 0x%lx is not in icache, "
-					"continuing search.", na.mft_no);
-			continue;
-		}
-		ntfs_debug("Base inode 0x%lx is in icache.", na.mft_no);
+		ntfs_debug("Base inode 0x%lx is not in icache, write the "
+				"extent record.", na.mft_no);
+		return TRUE;
+	}
+	ntfs_debug("Base inode 0x%lx is in icache.", na.mft_no);
+	/*
+	 * The base inode is in icache.  Check if it has the extent inode
+	 * corresponding to this extent mft record attached.
+	 */
+	ni = NTFS_I(vi);
+	down(&ni->extent_lock);
+	if (ni->nr_extents <= 0) {
 		/*
-		 * The base inode is in icache.  Check if it has the extent
-		 * inode corresponding to this extent mft record attached.
+		 * The base inode has no attached extent inodes, write this
+		 * extent mft record.
 		 */
-		ni = NTFS_I(vi);
-		down(&ni->extent_lock);
-		if (ni->nr_extents <= 0) {
+		up(&ni->extent_lock);
+		iput(vi);
+		ntfs_debug("Base inode 0x%lx has no attached extent inodes, "
+				"write the extent record.", na.mft_no);
+		return TRUE;
+	}
+	/* Iterate over the attached extent inodes. */
+	extent_nis = ni->ext.extent_ntfs_inos;
+	for (eni = NULL, i = 0; i < ni->nr_extents; ++i) {
+		if (mft_no == extent_nis[i]->mft_no) {
 			/*
-			 * The base inode has no attached extent inodes.  Skip
-			 * this extent mft record.
+			 * Found the extent inode corresponding to this extent
+			 * mft record.
 			 */
-			up(&ni->extent_lock);
-			iput(vi);
-			continue;
-		}
-		/* Iterate over the attached extent inodes. */
-		extent_nis = ni->ext.extent_ntfs_inos;
-		for (eni = NULL, j = 0; j < ni->nr_extents; ++j) {
-			if (mft_no == extent_nis[j]->mft_no) {
-				/*
-				 * Found the extent inode corresponding to this
-				 * extent mft record.
-				 */
-				eni = extent_nis[j];
-				break;
-			}
-		}
-		/*
-		 * If the extent inode was not attached to the base inode, skip
-		 * this extent mft record.
-		 */
-		if (!eni) {
-			up(&ni->extent_lock);
-			iput(vi);
-			continue;
-		}
-		/*
-		 * Found the extent inode corrsponding to this extent mft
-		 * record.  If it is dirty, no need to search further.
-		 */
-		if (NInoDirty(eni)) {
-			up(&ni->extent_lock);
-			iput(vi);
-			is_dirty = TRUE;
+			eni = extent_nis[i];
 			break;
 		}
-		/* The extent inode is not dirty, so do the next record. */
+	}
+	/*
+	 * If the extent inode was not attached to the base inode, write this
+	 * extent mft record.
+	 */
+	if (!eni) {
 		up(&ni->extent_lock);
 		iput(vi);
-	}
-	SetPageUptodate(page);
-	kunmap(page);
-	/* If a dirty mft record was found, redirty the page. */
-	if (is_dirty) {
-		ntfs_debug("Inode 0x%lx is dirty.  Redirtying the page "
-				"starting at inode 0x%lx.", mft_no,
-				page->index << (PAGE_CACHE_SHIFT -
-				vol->mft_record_size_bits));
-		redirty_page_for_writepage(wbc, page);
-		unlock_page(page);
-	} else {
-		/*
-		 * Keep the VM happy.  This must be done otherwise the
-		 * radix-tree tag PAGECACHE_TAG_DIRTY remains set even though
-		 * the page is clean.
-		 */
-		BUG_ON(PageWriteback(page));
-		set_page_writeback(page);
-		unlock_page(page);
-		end_page_writeback(page);
-	}
-	ntfs_debug("Done.");
-	return 0;
+		ntfs_debug("Extent inode 0x%lx is not attached to its base "
+				"inode 0x%lx, write the extent record.",
+				mft_no, na.mft_no);
+		return TRUE;
+	}
+	ntfs_debug("Extent inode 0x%lx is attached to its base inode 0x%lx.",
+			mft_no, na.mft_no);
+	/* Take a reference to the extent ntfs inode. */
+	atomic_inc(&eni->count);
+	up(&ni->extent_lock);
+	/*
+	 * Found the extent inode coresponding to this extent mft record.
+	 * Try to take the mft record lock.
+	 */
+	if (unlikely(down_trylock(&eni->mrec_lock))) {
+		atomic_dec(&eni->count);
+		iput(vi);
+		ntfs_debug("Extent mft record 0x%lx is already locked, do "
+				"not write it.", mft_no);
+		return FALSE;
+	}
+	ntfs_debug("Managed to lock extent mft record 0x%lx, write it.",
+			mft_no);
+	if (NInoTestClearDirty(eni))
+		ntfs_debug("Extent inode 0x%lx is dirty, marking it clean.",
+				mft_no);
+	/*
+	 * The write has to occur while we hold the mft record lock so return
+	 * the locked extent ntfs inode.
+	 */
+	*locked_ni = eni;
+	return TRUE;
 }
 
 static const char *es = "  Leaving inconsistent metadata.  Unmount and run "
diff -Nru a/fs/ntfs/mft.h b/fs/ntfs/mft.h
--- a/fs/ntfs/mft.h	2004-10-19 10:14:44 +01:00
+++ b/fs/ntfs/mft.h	2004-10-19 10:14:44 +01:00
@@ -29,7 +29,6 @@
 
 #include "inode.h"
 
-extern MFT_RECORD *try_map_mft_record(ntfs_inode *ni);
 extern MFT_RECORD *map_mft_record(ntfs_inode *ni);
 extern void unmap_mft_record(ntfs_inode *ni);
 
@@ -77,6 +76,9 @@
 		__mark_mft_record_dirty(ni);
 }
 
+extern int ntfs_sync_mft_mirror(ntfs_volume *vol, const unsigned long mft_no,
+		MFT_RECORD *m, int sync);
+
 extern int write_mft_record_nolock(ntfs_inode *ni, MFT_RECORD *m, int sync);
 
 /**
@@ -111,6 +113,10 @@
 	unlock_page(page);
 	return err;
 }
+
+extern BOOL ntfs_may_write_mft_record(ntfs_volume *vol,
+		const unsigned long mft_no, const MFT_RECORD *m,
+		ntfs_inode **locked_ni);
 
 extern int ntfs_extent_mft_record_free(ntfs_inode *ni, MFT_RECORD *m);
 
diff -Nru a/fs/ntfs/ntfs.h b/fs/ntfs/ntfs.h
--- a/fs/ntfs/ntfs.h	2004-10-19 10:14:44 +01:00
+++ b/fs/ntfs/ntfs.h	2004-10-19 10:14:44 +01:00
@@ -56,7 +56,6 @@
 extern struct super_operations ntfs_sops;
 extern struct address_space_operations ntfs_aops;
 extern struct address_space_operations ntfs_mst_aops;
-extern struct address_space_operations ntfs_mft_aops;
 
 extern struct  file_operations ntfs_file_ops;
 extern struct inode_operations ntfs_file_inode_ops;
diff -Nru a/fs/ntfs/super.c b/fs/ntfs/super.c
--- a/fs/ntfs/super.c	2004-10-19 10:14:44 +01:00
+++ b/fs/ntfs/super.c	2004-10-19 10:14:44 +01:00
@@ -946,8 +946,8 @@
 	/* No VFS initiated operations allowed for $MFTMirr. */
 	tmp_ino->i_op = &ntfs_empty_inode_ops;
 	tmp_ino->i_fop = &ntfs_empty_file_ops;
-	/* Put back our special address space operations. */
-	tmp_ino->i_mapping->a_ops = &ntfs_mft_aops;
+	/* Put in our special address space operations. */
+	tmp_ino->i_mapping->a_ops = &ntfs_mst_aops;
 	tmp_ni = NTFS_I(tmp_ino);
 	/* The $MFTMirr, like the $MFT is multi sector transfer protected. */
 	NInoSetMstProtected(tmp_ni);

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 28/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:45                                                     ` [PATCH 27/37] " Anton Altaparmakov
@ 2004-10-19  9:45                                                       ` Anton Altaparmakov
  2004-10-19  9:45                                                         ` [PATCH 29/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 28/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/10/14 1.2047)
   NTFS: - Fix two race conditions in fs/ntfs/inode.c::ntfs_put_inode().
   
   - Fix race condition in fs/ntfs/inode.c::ntfs_put_inode() by moving the
     index inode bitmap inode release code from there to
     fs/ntfs/inode.c::ntfs_clear_big_inode().  (Thanks to Christoph
     Hellwig for spotting this.)
   - Fix race condition in fs/ntfs/inode.c::ntfs_put_inode() by taking the
     inode semaphore around the code thst sets ni->itype.index.bmp_ino to
     NULL and reorganize the code to optimize it a bit.  (Thanks to
     Christoph Hellwig for spotting this.)
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:14:47 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:14:47 +01:00
@@ -110,6 +110,14 @@
 	- Fix callers of fs/ntfs/aops.c::mark_ntfs_record_dirty() to call it
 	  with the ntfs inode which contains the page rather than the ntfs
 	  inode the mft record of which is in the page.
+	- Fix race condition in fs/ntfs/inode.c::ntfs_put_inode() by moving the
+	  index inode bitmap inode release code from there to
+	  fs/ntfs/inode.c::ntfs_clear_big_inode().  (Thanks to Christoph
+	  Hellwig for spotting this.)
+	- Fix race condition in fs/ntfs/inode.c::ntfs_put_inode() by taking the
+	  inode semaphore around the code thst sets ni->itype.index.bmp_ino to
+	  NULL and reorganize the code to optimize it a bit.  (Thanks to
+	  Christoph Hellwig for spotting this.)
 
 2.1.20 - Fix two stupid bugs introduced in 2.1.18 release.
 
diff -Nru a/fs/ntfs/inode.c b/fs/ntfs/inode.c
--- a/fs/ntfs/inode.c	2004-10-19 10:14:47 +01:00
+++ b/fs/ntfs/inode.c	2004-10-19 10:14:47 +01:00
@@ -2095,37 +2095,24 @@
  * dropped, we need to put the attribute inode for the directory index bitmap,
  * if it is present, otherwise the directory inode would remain pinned for
  * ever.
- *
- * If the inode @vi is an index inode with only one reference which is being
- * dropped, we need to put the attribute inode for the index bitmap, if it is
- * present, otherwise the index inode would disappear and the attribute inode
- * for the index bitmap would no longer be referenced from anywhere and thus it
- * would remain pinned for ever.
  */
 void ntfs_put_inode(struct inode *vi)
 {
-	ntfs_inode *ni;
-
-	if (S_ISDIR(vi->i_mode)) {
-		if (atomic_read(&vi->i_count) == 2) {
-			ni = NTFS_I(vi);
-			if (NInoIndexAllocPresent(ni) &&
-					ni->itype.index.bmp_ino) {
-				iput(ni->itype.index.bmp_ino);
-				ni->itype.index.bmp_ino = NULL;
+	if (S_ISDIR(vi->i_mode) && atomic_read(&vi->i_count) == 2) {
+		ntfs_inode *ni = NTFS_I(vi);
+		if (NInoIndexAllocPresent(ni)) {
+			struct inode *bvi = NULL;
+			down(&vi->i_sem);
+			if (atomic_read(&vi->i_count) == 2) {
+				bvi = ni->itype.index.bmp_ino;
+				if (bvi)
+					ni->itype.index.bmp_ino = NULL;
 			}
+			up(&vi->i_sem);
+			if (bvi)
+				iput(bvi);
 		}
-		return;
-	}
-	if (atomic_read(&vi->i_count) != 1)
-		return;
-	ni = NTFS_I(vi);
-	if (NInoAttr(ni) && (ni->type == AT_INDEX_ALLOCATION) &&
-			NInoIndexAllocPresent(ni) && ni->itype.index.bmp_ino) {
-		iput(ni->itype.index.bmp_ino);
-		ni->itype.index.bmp_ino = NULL;
 	}
-	return;
 }
 
 void __ntfs_clear_inode(ntfs_inode *ni)
@@ -2193,6 +2180,18 @@
 {
 	ntfs_inode *ni = NTFS_I(vi);
 
+	/*
+	 * If the inode @vi is an index inode we need to put the attribute
+	 * inode for the index bitmap, if it is present, otherwise the index
+	 * inode would disappear and the attribute inode for the index bitmap
+	 * would no longer be referenced from anywhere and thus it would remain
+	 * pinned for ever.
+	 */
+	if (NInoAttr(ni) && (ni->type == AT_INDEX_ALLOCATION) &&
+			NInoIndexAllocPresent(ni) && ni->itype.index.bmp_ino) {
+		iput(ni->itype.index.bmp_ino);
+		ni->itype.index.bmp_ino = NULL;
+	}
 #ifdef NTFS_RW
 	if (NInoDirty(ni)) {
 		BOOL was_bad = (is_bad_inode(vi));

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 29/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:45                                                       ` [PATCH 28/37] " Anton Altaparmakov
@ 2004-10-19  9:45                                                         ` Anton Altaparmakov
  2004-10-19  9:46                                                           ` [PATCH 30/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 29/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/10/15 1.2047.1.1)
   NTFS: Modify fs/ntfs/aops.c::mark_ntfs_record_dirty() to no longer take the
         ntfs inode as a parameter as this is confusing and misleading and the
         ntfs inode is available via NTFS_I(page->mapping->host).
         Adapt all callers to this change.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:14:51 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:14:51 +01:00
@@ -118,6 +118,10 @@
 	  inode semaphore around the code thst sets ni->itype.index.bmp_ino to
 	  NULL and reorganize the code to optimize it a bit.  (Thanks to
 	  Christoph Hellwig for spotting this.)
+	- Modify fs/ntfs/aops.c::mark_ntfs_record_dirty() to no longer take the
+	  ntfs inode as a parameter as this is confusing and misleading and the
+	  needed ntfs inode is available via NTFS_I(page->mapping->host).
+	  Adapt all callers to this change.
 
 2.1.20 - Fix two stupid bugs introduced in 2.1.18 release.
 
diff -Nru a/fs/ntfs/aops.c b/fs/ntfs/aops.c
--- a/fs/ntfs/aops.c	2004-10-19 10:14:51 +01:00
+++ b/fs/ntfs/aops.c	2004-10-19 10:14:51 +01:00
@@ -2132,9 +2132,8 @@
 
 /**
  * mark_ntfs_record_dirty - mark an ntfs record dirty
- * @ni:		ntfs inode containing the ntfs record to be marked dirty
  * @page:	page containing the ntfs record to mark dirty
- * @rec_start:	byte offset within @page at which the ntfs record begins
+ * @ofs:	byte offset within @page at which the ntfs record begins
  *
  * If the ntfs record is the same size as the page cache page @page, set all
  * buffers in the page dirty.  Otherwise, set only the buffers in which the
@@ -2143,26 +2142,29 @@
  * Also, set the page containing the ntfs record dirty, which also marks the
  * vfs inode the ntfs record belongs to dirty (I_DIRTY_PAGES).
  */
-void mark_ntfs_record_dirty(ntfs_inode *ni, struct page *page,
-		unsigned int rec_start) {
+void mark_ntfs_record_dirty(struct page *page, const unsigned int ofs) {
+	ntfs_inode *ni;
 	struct buffer_head *bh, *head;
-	unsigned int rec_end, bh_size, bh_start, bh_end;
+	unsigned int end, bh_size, bh_ofs;
 
 	BUG_ON(!page);
 	BUG_ON(!page_has_buffers(page));
+	ni = NTFS_I(page->mapping->host);
+	BUG_ON(!ni);
 	if (ni->itype.index.block_size == PAGE_CACHE_SIZE) {
 		__set_page_dirty_buffers(page);
 		return;
 	}
-	rec_end = rec_start + ni->itype.index.block_size;
+	end = ofs + ni->itype.index.block_size;
 	bh_size = ni->vol->sb->s_blocksize;
-	bh_start = 0;
 	bh = head = page_buffers(page);
 	do {
-		bh_end = bh_start + bh_size;
-		if ((bh_start >= rec_start) && (bh_end <= rec_end))
-			set_buffer_dirty(bh);
-		bh_start = bh_end;
+		bh_ofs = bh_offset(bh);
+		if (bh_ofs + bh_size <= ofs)
+			continue;
+		if (unlikely(bh_ofs >= end))
+			break;
+		set_buffer_dirty(bh);
 	} while ((bh = bh->b_this_page) != head);
 	__set_page_dirty_nobuffers(page);
 }
diff -Nru a/fs/ntfs/aops.h b/fs/ntfs/aops.h
--- a/fs/ntfs/aops.h	2004-10-19 10:14:51 +01:00
+++ b/fs/ntfs/aops.h	2004-10-19 10:14:51 +01:00
@@ -95,8 +95,7 @@
 
 #ifdef NTFS_RW
 
-extern void mark_ntfs_record_dirty(ntfs_inode *ni, struct page *page,
-		unsigned int rec_start);
+extern void mark_ntfs_record_dirty(struct page *page, const unsigned int ofs);
 
 #endif /* NTFS_RW */
 
diff -Nru a/fs/ntfs/index.h b/fs/ntfs/index.h
--- a/fs/ntfs/index.h	2004-10-19 10:14:51 +01:00
+++ b/fs/ntfs/index.h	2004-10-19 10:14:51 +01:00
@@ -139,8 +139,8 @@
 	if (ictx->is_in_root)
 		mark_mft_record_dirty(ictx->actx->ntfs_ino);
 	else
-		mark_ntfs_record_dirty(ictx->idx_ni, ictx->page,
-			(u8*)ictx->ia - (u8*)page_address(ictx->page));
+		mark_ntfs_record_dirty(ictx->page,
+				(u8*)ictx->ia - (u8*)page_address(ictx->page));
 }
 
 #endif /* NTFS_RW */
diff -Nru a/fs/ntfs/inode.c b/fs/ntfs/inode.c
--- a/fs/ntfs/inode.c	2004-10-19 10:14:51 +01:00
+++ b/fs/ntfs/inode.c	2004-10-19 10:14:51 +01:00
@@ -2513,8 +2513,8 @@
 	 * this function returns.
 	 */
 	if (modified && !NInoTestSetDirty(ctx->ntfs_ino))
-		mark_ntfs_record_dirty(NTFS_I(ni->vol->mft_ino),
-				ctx->ntfs_ino->page, ctx->ntfs_ino->page_ofs);
+		mark_ntfs_record_dirty(ctx->ntfs_ino->page,
+				ctx->ntfs_ino->page_ofs);
 	ntfs_attr_put_search_ctx(ctx);
 	/* Now the access times are updated, write the base mft record. */
 	if (NInoDirty(ni))
diff -Nru a/fs/ntfs/mft.c b/fs/ntfs/mft.c
--- a/fs/ntfs/mft.c	2004-10-19 10:14:51 +01:00
+++ b/fs/ntfs/mft.c	2004-10-19 10:14:51 +01:00
@@ -380,8 +380,7 @@
 
 	ntfs_debug("Entering for inode 0x%lx.", ni->mft_no);
 	BUG_ON(NInoAttr(ni));
-	mark_ntfs_record_dirty(NTFS_I(ni->vol->mft_ino), ni->page,
-			ni->page_ofs);
+	mark_ntfs_record_dirty(ni->page, ni->page_ofs);
 	/* Determine the base vfs inode and mark it dirty, too. */
 	down(&ni->extent_lock);
 	if (likely(ni->nr_extents >= 0))

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 30/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:45                                                         ` [PATCH 29/37] " Anton Altaparmakov
@ 2004-10-19  9:46                                                           ` Anton Altaparmakov
  2004-10-19  9:46                                                             ` [PATCH 31/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 30/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/10/15 1.2047.1.2)
   NTFS: Modify fs/ntfs/mft.c::write_mft_record_nolock() and
         fs/ntfs/aops.c::ntfs_write_mst_block() to only check the dirty state
         of the first buffer in a record and to take this as the ntfs record
         dirty state.  We cannot look at the dirty state for subsequent
         buffers because we might be racing with
         fs/ntfs/aops.c::mark_ntfs_record_dirty().
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:14:55 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:14:55 +01:00
@@ -122,6 +122,12 @@
 	  ntfs inode as a parameter as this is confusing and misleading and the
 	  needed ntfs inode is available via NTFS_I(page->mapping->host).
 	  Adapt all callers to this change.
+	- Modify fs/ntfs/mft.c::write_mft_record_nolock() and
+	  fs/ntfs/aops.c::ntfs_write_mst_block() to only check the dirty state
+	  of the first buffer in a record and to take this as the ntfs record
+	  dirty state.  We cannot look at the dirty state for subsequent
+	  buffers because we might be racing with
+	  fs/ntfs/aops.c::mark_ntfs_record_dirty().
 
 2.1.20 - Fix two stupid bugs introduced in 2.1.18 release.
 
diff -Nru a/fs/ntfs/aops.c b/fs/ntfs/aops.c
--- a/fs/ntfs/aops.c	2004-10-19 10:14:55 +01:00
+++ b/fs/ntfs/aops.c	2004-10-19 10:14:55 +01:00
@@ -868,23 +868,24 @@
 			clear_buffer_dirty(bh);
 			continue;
 		}
-		if (rec_block == block) {
+		if (likely(block < rec_block)) {
+			/*
+			 * This block is not the first one in the record.  We
+			 * ignore the buffer's dirty state because we could
+			 * have raced with a parallel mark_ntfs_record_dirty().
+			 */
+			if (!rec_is_dirty)
+				continue;
+		} else /* if (block == rec_block) */ {
+			BUG_ON(block > rec_block);
 			/* This block is the first one in the record. */
 			rec_block += bhs_per_rec;
 			if (!buffer_dirty(bh)) {
-				/* Clean buffers are not written out. */
+				/* Clean records are not written out. */
 				rec_is_dirty = FALSE;
 				continue;
 			}
 			rec_is_dirty = TRUE;
-		} else {
-			/* This block is not the first one in the record. */
-			if (!buffer_dirty(bh)) {
-				/* Clean buffers are not written out. */
-				BUG_ON(rec_is_dirty);
-				continue;
-			}
-			BUG_ON(!rec_is_dirty);
 		}
 		BUG_ON(!buffer_mapped(bh));
 		BUG_ON(!buffer_uptodate(bh));
@@ -973,10 +974,8 @@
 			continue;
 		if (unlikely(test_set_buffer_locked(tbh)))
 			BUG();
-		if (unlikely(!test_clear_buffer_dirty(tbh))) {
-			unlock_buffer(tbh);
-			continue;
-		}
+		/* The buffer dirty state is now irrelevant, just clean it. */
+		clear_buffer_dirty(tbh);
 		BUG_ON(!buffer_uptodate(tbh));
 		BUG_ON(!buffer_mapped(tbh));
 		get_bh(tbh);
diff -Nru a/fs/ntfs/mft.c b/fs/ntfs/mft.c
--- a/fs/ntfs/mft.c	2004-10-19 10:14:55 +01:00
+++ b/fs/ntfs/mft.c	2004-10-19 10:14:55 +01:00
@@ -572,8 +572,10 @@
  * ntfs inode @ni to backing store.  If the mft record @m has a counterpart in
  * the mft mirror, that is also updated.
  *
- * We only write the mft record if the ntfs inode @ni is dirty and the buffers
- * belonging to its mft record are dirty, too.
+ * We only write the mft record if the ntfs inode @ni is dirty and the first
+ * buffer belonging to its mft record is dirty, too.  We ignore the dirty state
+ * of subsequent buffers because we could have raced with
+ * fs/ntfs/aops.c::mark_ntfs_record_dirty().
  *
  * On success, clean the mft record and return 0.  On error, leave the mft
  * record dirty and return -errno.  The caller should call make_bad_inode() on
@@ -632,21 +634,23 @@
 			continue;
 		if (unlikely(block_start >= m_end))
 			break;
-		/*
-		 * If the buffer is clean and it is the first buffer of the mft
-		 * record, it was written out by other means already so we are
-		 * done.  For safety we make sure all the other buffers are
-		 * clean also.  If it is clean but not the first buffer and the
-		 * first buffer was dirty it is a bug.
-		 */
-		if (!buffer_dirty(bh)) {
-			if (block_start == m_start)
+		if (block_start == m_start) {
+			/* This block is the first one in the record. */
+			if (!buffer_dirty(bh)) {
+				/* Clean records are not written out. */
 				rec_is_dirty = FALSE;
-			else
-				BUG_ON(rec_is_dirty);
-			continue;
+				continue;
+			}
+			rec_is_dirty = TRUE;
+		} else {
+			/*
+			 * This block is not the first one in the record.  We
+			 * ignore the buffer's dirty state because we could
+			 * have raced with a parallel mark_ntfs_record_dirty().
+			 */
+			if (!rec_is_dirty)
+				continue;
 		}
-		BUG_ON(!rec_is_dirty);
 		BUG_ON(!buffer_mapped(bh));
 		BUG_ON(!buffer_uptodate(bh));
 		BUG_ON(!nr_bhs && (m_start != block_start));

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 31/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:46                                                           ` [PATCH 30/37] " Anton Altaparmakov
@ 2004-10-19  9:46                                                             ` Anton Altaparmakov
  2004-10-19  9:46                                                               ` [PATCH 32/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 31/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/10/15 1.2051)
   NTFS: Move the static inline ntfs_init_big_inode() from fs/ntfs/inode.c to
         inode.h and make fs/ntfs/inode.c::__ntfs_init_inode() non-static and
         add a declaration for it to inode.h.  Fix some compilation issues
         that resulted due to #includes and header file interdependencies.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:14:59 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:14:59 +01:00
@@ -128,6 +128,10 @@
 	  dirty state.  We cannot look at the dirty state for subsequent
 	  buffers because we might be racing with
 	  fs/ntfs/aops.c::mark_ntfs_record_dirty().
+	- Move the static inline ntfs_init_big_inode() from fs/ntfs/inode.c to
+	  inode.h and make fs/ntfs/inode.c::__ntfs_init_inode() non-static and
+	  add a declaration for it to inode.h.  Fix some compilation issues
+	  that resulted due to #includes and header file interdependencies.
 
 2.1.20 - Fix two stupid bugs introduced in 2.1.18 release.
 
diff -Nru a/fs/ntfs/aops.c b/fs/ntfs/aops.c
--- a/fs/ntfs/aops.c	2004-10-19 10:14:59 +01:00
+++ b/fs/ntfs/aops.c	2004-10-19 10:14:59 +01:00
@@ -29,9 +29,11 @@
 #include <linux/writeback.h>
 
 #include "aops.h"
+#include "attrib.h"
 #include "debug.h"
 #include "inode.h"
 #include "mft.h"
+#include "runlist.h"
 #include "types.h"
 #include "ntfs.h"
 
diff -Nru a/fs/ntfs/compress.c b/fs/ntfs/compress.c
--- a/fs/ntfs/compress.c	2004-10-19 10:14:59 +01:00
+++ b/fs/ntfs/compress.c	2004-10-19 10:14:59 +01:00
@@ -25,6 +25,7 @@
 #include <linux/buffer_head.h>
 #include <linux/blkdev.h>
 
+#include "attrib.h"
 #include "inode.h"
 #include "debug.h"
 #include "ntfs.h"
diff -Nru a/fs/ntfs/debug.h b/fs/ntfs/debug.h
--- a/fs/ntfs/debug.h	2004-10-19 10:14:59 +01:00
+++ b/fs/ntfs/debug.h	2004-10-19 10:14:59 +01:00
@@ -22,13 +22,9 @@
 #ifndef _LINUX_NTFS_DEBUG_H
 #define _LINUX_NTFS_DEBUG_H
 
-#include <linux/kernel.h>
-#include <linux/string.h>
-#include <linux/spinlock.h>
 #include <linux/fs.h>
 
-#include "inode.h"
-#include "attrib.h"
+#include "runlist.h"
 
 #ifdef DEBUG
 
diff -Nru a/fs/ntfs/file.c b/fs/ntfs/file.c
--- a/fs/ntfs/file.c	2004-10-19 10:14:59 +01:00
+++ b/fs/ntfs/file.c	2004-10-19 10:14:59 +01:00
@@ -22,6 +22,7 @@
 #include <linux/pagemap.h>
 #include <linux/buffer_head.h>
 
+#include "inode.h"
 #include "debug.h"
 #include "ntfs.h"
 
diff -Nru a/fs/ntfs/inode.c b/fs/ntfs/inode.c
--- a/fs/ntfs/inode.c	2004-10-19 10:14:59 +01:00
+++ b/fs/ntfs/inode.c	2004-10-19 10:14:59 +01:00
@@ -373,7 +373,7 @@
  *
  * Return zero on success and -ENOMEM on error.
  */
-static void __ntfs_init_inode(struct super_block *sb, ntfs_inode *ni)
+void __ntfs_init_inode(struct super_block *sb, ntfs_inode *ni)
 {
 	ntfs_debug("Entering.");
 	ni->initialized_size = ni->allocated_size = 0;
@@ -396,17 +396,6 @@
 	init_MUTEX(&ni->extent_lock);
 	ni->nr_extents = 0;
 	ni->ext.base_ntfs_ino = NULL;
-	return;
-}
-
-static inline void ntfs_init_big_inode(struct inode *vi)
-{
-	ntfs_inode *ni = NTFS_I(vi);
-
-	ntfs_debug("Entering.");
-	__ntfs_init_inode(vi->i_sb, ni);
-	ni->mft_no = vi->i_ino;
-	return;
 }
 
 inline ntfs_inode *ntfs_new_extent_inode(struct super_block *sb,
diff -Nru a/fs/ntfs/inode.h b/fs/ntfs/inode.h
--- a/fs/ntfs/inode.h	2004-10-19 10:14:59 +01:00
+++ b/fs/ntfs/inode.h	2004-10-19 10:14:59 +01:00
@@ -35,6 +35,7 @@
 #include "volume.h"
 #include "types.h"
 #include "runlist.h"
+#include "debug.h"
 
 typedef struct _ntfs_inode ntfs_inode;
 
@@ -275,6 +276,17 @@
 extern struct inode *ntfs_alloc_big_inode(struct super_block *sb);
 extern void ntfs_destroy_big_inode(struct inode *inode);
 extern void ntfs_clear_big_inode(struct inode *vi);
+
+extern void __ntfs_init_inode(struct super_block *sb, ntfs_inode *ni);
+
+static inline void ntfs_init_big_inode(struct inode *vi)
+{
+	ntfs_inode *ni = NTFS_I(vi);
+
+	ntfs_debug("Entering.");
+	__ntfs_init_inode(vi->i_sb, ni);
+	ni->mft_no = vi->i_ino;
+}
 
 extern ntfs_inode *ntfs_new_extent_inode(struct super_block *sb,
 		unsigned long mft_no);
diff -Nru a/fs/ntfs/logfile.c b/fs/ntfs/logfile.c
--- a/fs/ntfs/logfile.c	2004-10-19 10:14:59 +01:00
+++ b/fs/ntfs/logfile.c	2004-10-19 10:14:59 +01:00
@@ -27,11 +27,12 @@
 #include <linux/buffer_head.h>
 #include <linux/bitops.h>
 
-#include "logfile.h"
-#include "volume.h"
+#include "attrib.h"
 #include "aops.h"
 #include "debug.h"
+#include "logfile.h"
 #include "malloc.h"
+#include "volume.h"
 #include "ntfs.h"
 
 /**
diff -Nru a/fs/ntfs/mft.c b/fs/ntfs/mft.c
--- a/fs/ntfs/mft.c	2004-10-19 10:14:59 +01:00
+++ b/fs/ntfs/mft.c	2004-10-19 10:14:59 +01:00
@@ -23,12 +23,14 @@
 #include <linux/buffer_head.h>
 #include <linux/swap.h>
 
-#include "bitmap.h"
-#include "lcnalloc.h"
+#include "attrib.h"
 #include "aops.h"
+#include "bitmap.h"
 #include "debug.h"
-#include "mft.h"
+#include "dir.h"
+#include "lcnalloc.h"
 #include "malloc.h"
+#include "mft.h"
 #include "ntfs.h"
 
 /**
diff -Nru a/fs/ntfs/namei.c b/fs/ntfs/namei.c
--- a/fs/ntfs/namei.c	2004-10-19 10:14:59 +01:00
+++ b/fs/ntfs/namei.c	2004-10-19 10:14:59 +01:00
@@ -23,8 +23,9 @@
 #include <linux/dcache.h>
 #include <linux/security.h>
 
-#include "dir.h"
+#include "attrib.h"
 #include "debug.h"
+#include "dir.h"
 #include "mft.h"
 #include "ntfs.h"
 
diff -Nru a/fs/ntfs/runlist.c b/fs/ntfs/runlist.c
--- a/fs/ntfs/runlist.c	2004-10-19 10:14:59 +01:00
+++ b/fs/ntfs/runlist.c	2004-10-19 10:14:59 +01:00
@@ -20,8 +20,9 @@
  * Foundation,Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
  */
 
-#include "dir.h"
 #include "debug.h"
+#include "dir.h"
+#include "endian.h"
 #include "malloc.h"
 #include "ntfs.h"
 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 32/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:46                                                             ` [PATCH 31/37] " Anton Altaparmakov
@ 2004-10-19  9:46                                                               ` Anton Altaparmakov
  2004-10-19  9:46                                                                 ` [PATCH 33/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 32/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/10/15 1.2052)
   NTFS: Simplify setup of i_mode in fs/ntfs/inode.c::ntfs_read_locked_inode().
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:15:02 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:15:02 +01:00
@@ -132,6 +132,7 @@
 	  inode.h and make fs/ntfs/inode.c::__ntfs_init_inode() non-static and
 	  add a declaration for it to inode.h.  Fix some compilation issues
 	  that resulted due to #includes and header file interdependencies.
+	- Simplify setup of i_mode in fs/ntfs/inode.c::ntfs_read_locked_inode().
 
 2.1.20 - Fix two stupid bugs introduced in 2.1.18 release.
 
diff -Nru a/fs/ntfs/inode.c b/fs/ntfs/inode.c
--- a/fs/ntfs/inode.c	2004-10-19 10:15:02 +01:00
+++ b/fs/ntfs/inode.c	2004-10-19 10:15:02 +01:00
@@ -594,14 +594,26 @@
 	 * Also if not a directory, it could be something else, rather than
 	 * a regular file. But again, will do for now.
 	 */
+	/* Everyone gets all permissions. */
+	vi->i_mode |= S_IRWXUGO;
+	/* If read-only, noone gets write permissions. */
+	if (IS_RDONLY(vi))
+		vi->i_mode &= ~S_IWUGO;
 	if (m->flags & MFT_RECORD_IS_DIRECTORY) {
 		vi->i_mode |= S_IFDIR;
+		/*
+		 * Apply the directory permissions mask set in the mount
+		 * options.
+		 */
+		vi->i_mode &= ~vol->dmask;
 		/* Things break without this kludge! */
 		if (vi->i_nlink > 1)
 			vi->i_nlink = 1;
-	} else
+	} else {
 		vi->i_mode |= S_IFREG;
-
+		/* Apply the file permissions mask set in the mount options. */
+		vi->i_mode &= ~vol->fmask;
+	}
 	/*
 	 * Find the standard information attribute in the mft record. At this
 	 * stage we haven't setup the attribute list stuff yet, so this could
@@ -944,16 +956,6 @@
 			goto unm_err_out;
 		}
 skip_large_dir_stuff:
-		/* Everyone gets read and scan permissions. */
-		vi->i_mode |= S_IRUGO | S_IXUGO;
-		/* If not read-only, set write permissions. */
-		if (!IS_RDONLY(vi))
-			vi->i_mode |= S_IWUGO;
-		/*
-		 * Apply the directory permissions mask set in the mount
-		 * options.
-		 */
-		vi->i_mode &= ~vol->dmask;
 		/* Setup the operations for this inode. */
 		vi->i_op = &ntfs_dir_inode_ops;
 		vi->i_fop = &ntfs_dir_ops;
@@ -1090,13 +1092,6 @@
 		unmap_mft_record(ni);
 		m = NULL;
 		ctx = NULL;
-		/* Everyone gets all permissions. */
-		vi->i_mode |= S_IRWXUGO;
-		/* If read-only, noone gets write permissions. */
-		if (IS_RDONLY(vi))
-			vi->i_mode &= ~S_IWUGO;
-		/* Apply the file permissions mask set in the mount options. */
-		vi->i_mode &= ~vol->fmask;
 		/* Setup the operations for this inode. */
 		vi->i_op = &ntfs_file_inode_ops;
 		vi->i_fop = &ntfs_file_ops;

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 33/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:46                                                               ` [PATCH 32/37] " Anton Altaparmakov
@ 2004-10-19  9:46                                                                 ` Anton Altaparmakov
  2004-10-19  9:47                                                                   ` [PATCH 34/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 33/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/10/15 1.2053)
   NTFS: Add helpers fs/ntfs/layout.h::MK_MREF() and MK_LE_MREF().
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:15:06 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:15:06 +01:00
@@ -133,6 +133,7 @@
 	  add a declaration for it to inode.h.  Fix some compilation issues
 	  that resulted due to #includes and header file interdependencies.
 	- Simplify setup of i_mode in fs/ntfs/inode.c::ntfs_read_locked_inode().
+	- Add helpers fs/ntfs/layout.h::MK_MREF() and MK_LE_MREF().
 
 2.1.20 - Fix two stupid bugs introduced in 2.1.18 release.
 
diff -Nru a/fs/ntfs/layout.h b/fs/ntfs/layout.h
--- a/fs/ntfs/layout.h	2004-10-19 10:15:06 +01:00
+++ b/fs/ntfs/layout.h	2004-10-19 10:15:06 +01:00
@@ -316,6 +316,10 @@
 typedef u64 MFT_REF;
 typedef le64 leMFT_REF;
 
+#define MK_MREF(m, s)	((MFT_REF)(((MFT_REF)(s) << 48) |		\
+					((MFT_REF)(m) & MFT_REF_MASK_CPU)))
+#define MK_LE_MREF(m, s) cpu_to_le64(MK_MREF(m, s))
+
 #define MREF(x)		((unsigned long)((x) & MFT_REF_MASK_CPU))
 #define MSEQNO(x)	((u16)(((x) >> 48) & 0xffff))
 #define MREF_LE(x)	((unsigned long)(le64_to_cpu(x) & MFT_REF_MASK_CPU))

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 34/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:46                                                                 ` [PATCH 33/37] " Anton Altaparmakov
@ 2004-10-19  9:47                                                                   ` Anton Altaparmakov
  2004-10-19  9:47                                                                     ` [PATCH 35/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:47 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 34/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/10/15 1.2054)
   NTFS: Modify fs/ntfs/mft.c::map_extent_mft_record() to only verify the mft
         record sequence number if it is specified (i.e. not zero).
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:15:10 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:15:10 +01:00
@@ -134,6 +134,8 @@
 	  that resulted due to #includes and header file interdependencies.
 	- Simplify setup of i_mode in fs/ntfs/inode.c::ntfs_read_locked_inode().
 	- Add helpers fs/ntfs/layout.h::MK_MREF() and MK_LE_MREF().
+	- Modify fs/ntfs/mft.c::map_extent_mft_record() to only verify the mft
+	  record sequence number if it is specified (i.e. not zero).
 
 2.1.20 - Fix two stupid bugs introduced in 2.1.18 release.
 
diff -Nru a/fs/ntfs/mft.c b/fs/ntfs/mft.c
--- a/fs/ntfs/mft.c	2004-10-19 10:15:10 +01:00
+++ b/fs/ntfs/mft.c	2004-10-19 10:15:10 +01:00
@@ -302,8 +302,8 @@
 		ntfs_clear_extent_inode(ni);
 		goto map_err_out;
 	}
-	/* Verify the sequence number. */
-	if (unlikely(le16_to_cpu(m->sequence_number) != seq_no)) {
+	/* Verify the sequence number if it is present. */
+	if (seq_no && (le16_to_cpu(m->sequence_number) != seq_no)) {
 		ntfs_error(base_ni->vol->sb, "Found stale extent mft "
 				"reference! Corrupt file system. Run chkdsk.");
 		destroy_ni = TRUE;

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 35/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:47                                                                   ` [PATCH 34/37] " Anton Altaparmakov
@ 2004-10-19  9:47                                                                     ` Anton Altaparmakov
  2004-10-19  9:47                                                                       ` [PATCH 36/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:47 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 35/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/10/16 1.2055)
   NTFS: Add fs/ntfs/mft.[hc]::ntfs_mft_record_alloc() and various helper
         functions used by it.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:15:13 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:15:13 +01:00
@@ -136,6 +136,8 @@
 	- Add helpers fs/ntfs/layout.h::MK_MREF() and MK_LE_MREF().
 	- Modify fs/ntfs/mft.c::map_extent_mft_record() to only verify the mft
 	  record sequence number if it is specified (i.e. not zero).
+	- Add fs/ntfs/mft.[hc]::ntfs_mft_record_alloc() and various helper
+	  functions used by it.
 
 2.1.20 - Fix two stupid bugs introduced in 2.1.18 release.
 
diff -Nru a/fs/ntfs/mft.c b/fs/ntfs/mft.c
--- a/fs/ntfs/mft.c	2004-10-19 10:15:13 +01:00
+++ b/fs/ntfs/mft.c	2004-10-19 10:15:13 +01:00
@@ -996,6 +996,1579 @@
 		"chkdsk.";
 
 /**
+ * ntfs_mft_bitmap_find_and_alloc_free_rec_nolock - see name
+ * @vol:	volume on which to search for a free mft record
+ * @base_ni:	open base inode if allocating an extent mft record or NULL
+ *
+ * Search for a free mft record in the mft bitmap attribute on the ntfs volume
+ * @vol.
+ *
+ * If @base_ni is NULL start the search at the default allocator position.
+ *
+ * If @base_ni is not NULL start the search at the mft record after the base
+ * mft record @base_ni.
+ *
+ * Return the free mft record on success and -errno on error.  An error code of
+ * -ENOSPC means that there are no free mft records in the currently
+ * initialized mft bitmap.
+ *
+ * Locking: Caller must hold vol->mftbmp_lock for writing.
+ */
+static int ntfs_mft_bitmap_find_and_alloc_free_rec_nolock(ntfs_volume *vol,
+		ntfs_inode *base_ni)
+{
+	s64 pass_end, ll, data_pos, pass_start, ofs, bit;
+	struct address_space *mftbmp_mapping;
+	u8 *buf, *byte;
+	struct page *page;
+	unsigned int page_ofs, size;
+	u8 pass, b;
+
+	ntfs_debug("Searching for free mft record in the currently "
+			"initialized mft bitmap.");
+	mftbmp_mapping = vol->mftbmp_ino->i_mapping;
+	/*
+	 * Set the end of the pass making sure we do not overflow the mft
+	 * bitmap.
+	 */
+	pass_end = NTFS_I(vol->mft_ino)->allocated_size >>
+			vol->mft_record_size_bits;
+	ll = NTFS_I(vol->mftbmp_ino)->initialized_size << 3;
+	if (pass_end > ll)
+		pass_end = ll;
+	pass = 1;
+	if (!base_ni)
+		data_pos = vol->mft_data_pos;
+	else
+		data_pos = base_ni->mft_no + 1;
+	if (data_pos < 24)
+		data_pos = 24;
+	if (data_pos >= pass_end) {
+		data_pos = 24;
+		pass = 2;
+		/* This happens on a freshly formatted volume. */
+		if (data_pos >= pass_end)
+			return -ENOSPC;
+	}
+	pass_start = data_pos;
+	ntfs_debug("Starting bitmap search: pass %u, pass_start 0x%llx, "
+			"pass_end 0x%llx, data_pos 0x%llx.", pass,
+			(long long)pass_start, (long long)pass_end,
+			(long long)data_pos);
+	/* Loop until a free mft record is found. */
+	for (; pass <= 2;) {
+		/* Cap size to pass_end. */
+		ofs = data_pos >> 3;
+		page_ofs = ofs & ~PAGE_CACHE_MASK;
+		size = PAGE_CACHE_SIZE - page_ofs;
+		ll = ((pass_end + 7) >> 3) - ofs;
+		if (size > ll)
+			size = ll;
+		size <<= 3;
+		/*
+		 * If we are still within the active pass, search the next page
+		 * for a zero bit.
+		 */
+		if (size) {
+			page = ntfs_map_page(mftbmp_mapping,
+					ofs >> PAGE_CACHE_SHIFT);
+			if (unlikely(IS_ERR(page))) {
+				ntfs_error(vol->sb, "Failed to read mft "
+						"bitmap, aborting.");
+				return PTR_ERR(page);
+			}
+			buf = (u8*)page_address(page) + page_ofs;
+			bit = data_pos & 7;
+			data_pos &= ~7ull;
+			ntfs_debug("Before inner for loop: size 0x%x, "
+					"data_pos 0x%llx, bit 0x%llx", size,
+					(long long)data_pos, (long long)bit);
+			for (; bit < size && data_pos + bit < pass_end;
+					bit &= ~7ull, bit += 8) {
+				byte = buf + (bit >> 3);
+				if (*byte == 0xff)
+					continue;
+				b = ffz((unsigned long)*byte);
+				if (b < 8 && b >= (bit & 7)) {
+					ll = data_pos + (bit & ~7ull) + b;
+					if (unlikely(ll > (1ll << 32))) {
+						ntfs_unmap_page(page);
+						return -ENOSPC;
+					}
+					*byte |= 1 << b;
+					flush_dcache_page(page);
+					set_page_dirty(page);
+					ntfs_unmap_page(page);
+					ntfs_debug("Done.  (Found and "
+							"allocated mft record "
+							"0x%llx.)",
+							(long long)ll);
+					return ll;
+				}
+			}
+			ntfs_debug("After inner for loop: size 0x%x, "
+					"data_pos 0x%llx, bit 0x%llx", size,
+					(long long)data_pos, (long long)bit);
+			data_pos += size;
+			ntfs_unmap_page(page);
+			/*
+			 * If the end of the pass has not been reached yet,
+			 * continue searching the mft bitmap for a zero bit.
+			 */
+			if (data_pos < pass_end)
+				continue;
+		}
+		/* Do the next pass. */
+		if (++pass == 2) {
+			/*
+			 * Starting the second pass, in which we scan the first
+			 * part of the zone which we omitted earlier.
+			 */
+			pass_end = pass_start;
+			data_pos = pass_start = 24;
+			ntfs_debug("pass %i, pass_start 0x%llx, pass_end "
+					"0x%llx.", pass, (long long)pass_start,
+					(long long)pass_end);
+			if (data_pos >= pass_end)
+				break;
+		}
+	}
+	/* No free mft records in currently initialized mft bitmap. */
+	ntfs_debug("Done.  (No free mft records left in currently initialized "
+			"mft bitmap.)");
+	return -ENOSPC;
+}
+
+/**
+ * ntfs_mft_bitmap_extend_allocation_nolock - extend mft bitmap by a cluster
+ * @vol:	volume on which to extend the mft bitmap attribute
+ *
+ * Extend the mft bitmap attribute on the ntfs volume @vol by one cluster.
+ *
+ * Note: Only changes allocated_size, i.e. does not touch initialized_size or
+ * data_size.
+ *
+ * Return 0 on success and -errno on error.
+ *
+ * Locking: - Caller must hold vol->mftbmp_lock for writing.
+ *	    - This function takes NTFS_I(vol->mftbmp_ino)->runlist.lock for
+ *	      writing and releases it before returning.
+ *	    - This function takes vol->lcnbmp_lock for writing and releases it
+ *	      before returning.
+ */
+static int ntfs_mft_bitmap_extend_allocation_nolock(ntfs_volume *vol)
+{
+	LCN lcn;
+	s64 ll;
+	struct page *page;
+	ntfs_inode *mft_ni, *mftbmp_ni;
+	runlist_element *rl, *rl2 = NULL;
+	ntfs_attr_search_ctx *ctx = NULL;
+	MFT_RECORD *mrec;
+	ATTR_RECORD *a = NULL;
+	int ret, mp_size;
+	u32 old_alen = 0;
+	u8 *b, tb;
+	struct {
+		u8 added_cluster:1;
+		u8 added_run:1;
+		u8 mp_rebuilt:1;
+	} status = { 0, 0, 0 };
+
+	ntfs_debug("Extending mft bitmap allocation.");
+	mft_ni = NTFS_I(vol->mft_ino);
+	mftbmp_ni = NTFS_I(vol->mftbmp_ino);
+	/*
+	 * Determine the last lcn of the mft bitmap.  The allocated size of the
+	 * mft bitmap cannot be zero so we are ok to do this.
+	 * ntfs_find_vcn() returns the runlist locked on success.
+	 */
+	rl = ntfs_find_vcn(mftbmp_ni, (mftbmp_ni->allocated_size - 1) >>
+			vol->cluster_size_bits, TRUE);
+	if (unlikely(IS_ERR(rl) || !rl->length || rl->lcn < 0)) {
+		ntfs_error(vol->sb, "Failed to determine last allocated "
+				"cluster of mft bitmap attribute.");
+		if (!IS_ERR(rl)) {
+			up_write(&mftbmp_ni->runlist.lock);
+			ret = -EIO;
+		} else
+			ret = PTR_ERR(rl);
+		return ret;
+	}
+	lcn = rl->lcn + rl->length;
+	ntfs_debug("Last lcn of mft bitmap attribute is 0x%llx.",
+			(long long)lcn);
+	/*
+	 * Attempt to get the cluster following the last allocated cluster by
+	 * hand as it may be in the MFT zone so the allocator would not give it
+	 * to us.
+	 */
+	ll = lcn >> 3;
+	page = ntfs_map_page(vol->lcnbmp_ino->i_mapping,
+			ll >> PAGE_CACHE_SHIFT);
+	if (IS_ERR(page)) {
+		up_write(&mftbmp_ni->runlist.lock);
+		ntfs_error(vol->sb, "Failed to read from lcn bitmap.");
+		return PTR_ERR(page);
+	}
+	b = (u8*)page_address(page) + (ll & ~PAGE_CACHE_MASK);
+	tb = 1 << (lcn & 7ull);
+	down_write(&vol->lcnbmp_lock);
+	if (*b != 0xff && !(*b & tb)) {
+		/* Next cluster is free, allocate it. */
+		*b |= tb;
+		flush_dcache_page(page);
+		set_page_dirty(page);
+		up_write(&vol->lcnbmp_lock);
+		ntfs_unmap_page(page);
+		/* Update the mft bitmap runlist. */
+		rl->length++;
+		rl[1].vcn++;
+		status.added_cluster = 1;
+		ntfs_debug("Appending one cluster to mft bitmap.");
+	} else {
+		up_write(&vol->lcnbmp_lock);
+		ntfs_unmap_page(page);
+		/* Allocate a cluster from the DATA_ZONE. */
+		rl2 = ntfs_cluster_alloc(vol, rl[1].vcn, 1, lcn, DATA_ZONE);
+		if (IS_ERR(rl2)) {
+			up_write(&mftbmp_ni->runlist.lock);
+			ntfs_error(vol->sb, "Failed to allocate a cluster for "
+					"the mft bitmap.");
+			return PTR_ERR(rl2);
+		}
+		rl = ntfs_runlists_merge(mftbmp_ni->runlist.rl, rl2);
+		if (IS_ERR(rl)) {
+			up_write(&mftbmp_ni->runlist.lock);
+			ntfs_error(vol->sb, "Failed to merge runlists for mft "
+					"bitmap.");
+			if (ntfs_cluster_free_from_rl(vol, rl2)) {
+				ntfs_error(vol->sb, "Failed to dealocate "
+						"allocated cluster.%s", es);
+				NVolSetErrors(vol);
+			}
+			ntfs_free(rl2);
+			return PTR_ERR(rl);
+		}
+		mftbmp_ni->runlist.rl = rl;
+		status.added_run = 1;
+		ntfs_debug("Adding one run to mft bitmap.");
+		/* Find the last run in the new runlist. */
+		for (; rl[1].length; rl++)
+			;
+	}
+	/*
+	 * Update the attribute record as well.  Note: @rl is the last
+	 * (non-terminator) runlist element of mft bitmap.
+	 */
+	mrec = map_mft_record(mft_ni);
+	if (IS_ERR(mrec)) {
+		ntfs_error(vol->sb, "Failed to map mft record.");
+		ret = PTR_ERR(mrec);
+		goto undo_alloc;
+	}
+	ctx = ntfs_attr_get_search_ctx(mft_ni, mrec);
+	if (unlikely(!ctx)) {
+		ntfs_error(vol->sb, "Failed to get search context.");
+		ret = -ENOMEM;
+		goto undo_alloc;
+	}
+	ret = ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name,
+			mftbmp_ni->name_len, CASE_SENSITIVE, rl[1].vcn, NULL,
+			0, ctx);
+	if (unlikely(ret)) {
+		ntfs_error(vol->sb, "Failed to find last attribute extent of "
+				"mft bitmap attribute.");
+		if (ret == -ENOENT)
+			ret = -EIO;
+		goto undo_alloc;
+	}
+	a = ctx->attr;
+	ll = sle64_to_cpu(a->data.non_resident.lowest_vcn);
+	/* Search back for the previous last allocated cluster of mft bitmap. */
+	for (rl2 = rl; rl2 > mftbmp_ni->runlist.rl; rl2--) {
+		if (ll >= rl2->vcn)
+			break;
+	}
+	BUG_ON(ll < rl2->vcn);
+	BUG_ON(ll >= rl2->vcn + rl2->length);
+	/* Get the size for the new mapping pairs array for this extent. */
+	mp_size = ntfs_get_size_for_mapping_pairs(vol, rl2, ll);
+	if (unlikely(mp_size <= 0)) {
+		ntfs_error(vol->sb, "Get size for mapping pairs failed for "
+				"mft bitmap attribute extent.");
+		ret = mp_size;
+		if (!ret)
+			ret = -EIO;
+		goto undo_alloc;
+	}
+	/* Expand the attribute record if necessary. */
+	old_alen = le32_to_cpu(a->length);
+	ret = ntfs_attr_record_resize(ctx->mrec, a, mp_size +
+			le16_to_cpu(a->data.non_resident.mapping_pairs_offset));
+	if (unlikely(ret)) {
+		if (ret != -ENOSPC) {
+			ntfs_error(vol->sb, "Failed to resize attribute "
+					"record for mft bitmap attribute.");
+			goto undo_alloc;
+		}
+		// TODO: Deal with this by moving this extent to a new mft
+		// record or by starting a new extent in a new mft record or by
+		// moving other attributes out of this mft record.
+		ntfs_error(vol->sb, "Not enough space in this mft record to "
+				"accomodate extended mft bitmap attribute "
+				"extent.  Cannot handle this yet.");
+		ret = -EOPNOTSUPP;
+		goto undo_alloc;
+	}
+	status.mp_rebuilt = 1;
+	/* Generate the mapping pairs array directly into the attr record. */
+	ret = ntfs_mapping_pairs_build(vol, (u8*)a +
+			le16_to_cpu(a->data.non_resident.mapping_pairs_offset),
+			mp_size, rl2, ll, NULL);
+	if (unlikely(ret)) {
+		ntfs_error(vol->sb, "Failed to build mapping pairs array for "
+				"mft bitmap attribute.");
+		goto undo_alloc;
+	}
+	/* Update the highest_vcn. */
+	a->data.non_resident.highest_vcn = cpu_to_sle64(rl[1].vcn - 1);
+	/*
+	 * We now have extended the mft bitmap allocated_size by one cluster.
+	 * Reflect this in the ntfs_inode structure and the attribute record.
+	 */
+	if (a->data.non_resident.lowest_vcn) {
+		/*
+		 * We are not in the first attribute extent, switch to it, but
+		 * first ensure the changes will make it to disk later.
+		 */
+		flush_dcache_mft_record_page(ctx->ntfs_ino);
+		mark_mft_record_dirty(ctx->ntfs_ino);
+		ntfs_attr_reinit_search_ctx(ctx);
+		ret = ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name,
+				mftbmp_ni->name_len, CASE_SENSITIVE, 0, NULL,
+				0, ctx);
+		if (unlikely(ret)) {
+			ntfs_error(vol->sb, "Failed to find first attribute "
+					"extent of mft bitmap attribute.");
+			goto restore_undo_alloc;
+		}
+		a = ctx->attr;
+	}
+	mftbmp_ni->allocated_size += vol->cluster_size;
+	a->data.non_resident.allocated_size =
+			cpu_to_sle64(mftbmp_ni->allocated_size);
+	/* Ensure the changes make it to disk. */
+	flush_dcache_mft_record_page(ctx->ntfs_ino);
+	mark_mft_record_dirty(ctx->ntfs_ino);
+	ntfs_attr_put_search_ctx(ctx);
+	unmap_mft_record(mft_ni);
+	up_write(&mftbmp_ni->runlist.lock);
+	ntfs_debug("Done.");
+	return 0;
+restore_undo_alloc:
+	ntfs_attr_reinit_search_ctx(ctx);
+	if (ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name,
+			mftbmp_ni->name_len, CASE_SENSITIVE, rl[1].vcn, NULL,
+			0, ctx)) {
+		ntfs_error(vol->sb, "Failed to find last attribute extent of "
+				"mft bitmap attribute.%s", es);
+		mftbmp_ni->allocated_size += vol->cluster_size;
+		ntfs_attr_put_search_ctx(ctx);
+		unmap_mft_record(mft_ni);
+		up_write(&mftbmp_ni->runlist.lock);
+		/*
+		 * The only thing that is now wrong is ->allocated_size of the
+		 * base attribute extent which chkdsk should be able to fix.
+		 */
+		NVolSetErrors(vol);
+		return ret;
+	}
+	a = ctx->attr;
+	a->data.non_resident.highest_vcn = cpu_to_sle64(rl[1].vcn - 2);
+undo_alloc:
+	if (status.added_cluster) {
+		/* Truncate the last run in the runlist by one cluster. */
+		rl->length--;
+		rl[1].vcn--;
+	} else if (status.added_run) {
+		lcn = rl->lcn;
+		/* Remove the last run from the runlist. */
+		rl->lcn = rl[1].lcn;
+		rl->length = 0;
+	}
+	/* Deallocate the cluster. */
+	down_write(&vol->lcnbmp_lock);
+	if (ntfs_bitmap_clear_bit(vol->lcnbmp_ino, lcn)) {
+		ntfs_error(vol->sb, "Failed to free allocated cluster.%s", es);
+		NVolSetErrors(vol);
+	}
+	up_write(&vol->lcnbmp_lock);
+	if (status.mp_rebuilt) {
+		if (ntfs_mapping_pairs_build(vol, (u8*)a + le16_to_cpu(
+				a->data.non_resident.mapping_pairs_offset),
+				old_alen - le16_to_cpu(
+				a->data.non_resident.mapping_pairs_offset),
+				rl2, ll, NULL)) {
+			ntfs_error(vol->sb, "Failed to restore mapping pairs "
+					"array.%s", es);
+			NVolSetErrors(vol);
+		}
+		if (ntfs_attr_record_resize(ctx->mrec, a, old_alen)) {
+			ntfs_error(vol->sb, "Failed to restore attribute "
+					"record.%s", es);
+			NVolSetErrors(vol);
+		}
+		flush_dcache_mft_record_page(ctx->ntfs_ino);
+		mark_mft_record_dirty(ctx->ntfs_ino);
+	}
+	if (ctx)
+		ntfs_attr_put_search_ctx(ctx);
+	if (!IS_ERR(mrec))
+		unmap_mft_record(mft_ni);
+	up_write(&mftbmp_ni->runlist.lock);
+	return ret;
+}
+
+/**
+ * ntfs_mft_bitmap_extend_initialized_nolock - extend mftbmp initialized data
+ * @vol:	volume on which to extend the mft bitmap attribute
+ *
+ * Extend the initialized portion of the mft bitmap attribute on the ntfs
+ * volume @vol by 8 bytes.
+ *
+ * Note:  Only changes initialized_size and data_size, i.e. requires that
+ * allocated_size is big enough to fit the new initialized_size.
+ *
+ * Return 0 on success and -error on error.
+ *
+ * Locking: Caller must hold vol->mftbmp_lock for writing.
+ */
+static int ntfs_mft_bitmap_extend_initialized_nolock(ntfs_volume *vol)
+{
+	s64 old_data_size, old_initialized_size;
+	struct inode *mftbmp_vi;
+	ntfs_inode *mft_ni, *mftbmp_ni;
+	ntfs_attr_search_ctx *ctx;
+	MFT_RECORD *mrec;
+	ATTR_RECORD *a;
+	int ret;
+
+	ntfs_debug("Extending mft bitmap initiailized (and data) size.");
+	mft_ni = NTFS_I(vol->mft_ino);
+	mftbmp_vi = vol->mftbmp_ino;
+	mftbmp_ni = NTFS_I(mftbmp_vi);
+	/* Get the attribute record. */
+	mrec = map_mft_record(mft_ni);
+	if (IS_ERR(mrec)) {
+		ntfs_error(vol->sb, "Failed to map mft record.");
+		return PTR_ERR(mrec);
+	}
+	ctx = ntfs_attr_get_search_ctx(mft_ni, mrec);
+	if (unlikely(!ctx)) {
+		ntfs_error(vol->sb, "Failed to get search context.");
+		ret = -ENOMEM;
+		goto unm_err_out;
+	}
+	ret = ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name,
+			mftbmp_ni->name_len, CASE_SENSITIVE, 0, NULL, 0, ctx);
+	if (unlikely(ret)) {
+		ntfs_error(vol->sb, "Failed to find first attribute extent of "
+				"mft bitmap attribute.");
+		if (ret == -ENOENT)
+			ret = -EIO;
+		goto put_err_out;
+	}
+	a = ctx->attr;
+	old_data_size = mftbmp_vi->i_size;
+	old_initialized_size = mftbmp_ni->initialized_size;
+	/*
+	 * We can simply update the initialized_size before filling the space
+	 * with zeroes because the caller is holding the mft bitmap lock for
+	 * writing which ensures that no one else is trying to access the data.
+	 */
+	mftbmp_ni->initialized_size += 8;
+	a->data.non_resident.initialized_size =
+			cpu_to_sle64(mftbmp_ni->initialized_size);
+	if (mftbmp_ni->initialized_size > mftbmp_vi->i_size) {
+		mftbmp_vi->i_size = mftbmp_ni->initialized_size;
+		a->data.non_resident.data_size =
+				cpu_to_sle64(mftbmp_vi->i_size);
+	}
+	/* Ensure the changes make it to disk. */
+	flush_dcache_mft_record_page(ctx->ntfs_ino);
+	mark_mft_record_dirty(ctx->ntfs_ino);
+	ntfs_attr_put_search_ctx(ctx);
+	unmap_mft_record(mft_ni);
+	/* Initialize the mft bitmap attribute value with zeroes. */
+	ret = ntfs_attr_set(mftbmp_ni, old_initialized_size, 8, 0);
+	if (likely(!ret)) {
+		ntfs_debug("Done.  (Wrote eight initialized bytes to mft "
+				"bitmap.");
+		return 0;
+	}
+	ntfs_error(vol->sb, "Failed to write to mft bitmap.");
+	/* Try to recover from the error. */
+	mrec = map_mft_record(mft_ni);
+	if (IS_ERR(mrec)) {
+		ntfs_error(vol->sb, "Failed to map mft record.%s", es);
+		NVolSetErrors(vol);
+		return ret;
+	}
+	ctx = ntfs_attr_get_search_ctx(mft_ni, mrec);
+	if (unlikely(!ctx)) {
+		ntfs_error(vol->sb, "Failed to get search context.%s", es);
+		NVolSetErrors(vol);
+		goto unm_err_out;
+	}
+	if (ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name,
+			mftbmp_ni->name_len, CASE_SENSITIVE, 0, NULL, 0, ctx)) {
+		ntfs_error(vol->sb, "Failed to find first attribute extent of "
+				"mft bitmap attribute.%s", es);
+		NVolSetErrors(vol);
+put_err_out:
+		ntfs_attr_put_search_ctx(ctx);
+unm_err_out:
+		unmap_mft_record(mft_ni);
+		goto err_out;
+	}
+	a = ctx->attr;
+	mftbmp_ni->initialized_size = old_initialized_size;
+	a->data.non_resident.initialized_size =
+			cpu_to_sle64(old_initialized_size);
+	if (mftbmp_vi->i_size != old_data_size) {
+		mftbmp_vi->i_size = old_data_size;
+		a->data.non_resident.data_size = cpu_to_sle64(old_data_size);
+	}
+	flush_dcache_mft_record_page(ctx->ntfs_ino);
+	mark_mft_record_dirty(ctx->ntfs_ino);
+	ntfs_attr_put_search_ctx(ctx);
+	unmap_mft_record(mft_ni);
+	ntfs_debug("Restored status of mftbmp: allocated_size 0x%llx, "
+			"data_size 0x%llx, initialized_size 0x%llx.",
+			(long long)mftbmp_ni->allocated_size,
+			(long long)mftbmp_vi->i_size,
+			(long long)mftbmp_ni->initialized_size);
+err_out:
+	return ret;
+}
+
+/**
+ * ntfs_mft_data_extend_allocation_nolock - extend mft data attribute
+ * @vol:	volume on which to extend the mft data attribute
+ *
+ * Extend the mft data attribute on the ntfs volume @vol by 16 mft records
+ * worth of clusters or if not enough space for this by one mft record worth
+ * of clusters.
+ *
+ * Note:  Only changes allocated_size, i.e. does not touch initialized_size or
+ * data_size.
+ *
+ * Return 0 on success and -errno on error.
+ *
+ * Locking: - Caller must hold vol->mftbmp_lock for writing.
+ *	    - This function takes NTFS_I(vol->mft_ino)->runlist.lock for
+ *	      writing and releases it before returning.
+ *	    - This function calls functions which take vol->lcnbmp_lock for
+ *	      writing and release it before returning.
+ */
+static int ntfs_mft_data_extend_allocation_nolock(ntfs_volume *vol)
+{
+	LCN lcn;
+	VCN old_last_vcn;
+	s64 min_nr, nr, ll = 0;
+	ntfs_inode *mft_ni;
+	runlist_element *rl, *rl2;
+	ntfs_attr_search_ctx *ctx = NULL;
+	MFT_RECORD *mrec;
+	ATTR_RECORD *a = NULL;
+	int ret, mp_size;
+	u32 old_alen = 0;
+	BOOL mp_rebuilt = FALSE;
+
+	ntfs_debug("Extending mft data allocation.");
+	mft_ni = NTFS_I(vol->mft_ino);
+	/*
+	 * Determine the preferred allocation location, i.e. the last lcn of
+	 * the mft data attribute.  The allocated size of the mft data
+	 * attribute cannot be zero so we are ok to do this.
+	 * ntfs_find_vcn() returns the runlist locked on success.
+	 */
+	rl = ntfs_find_vcn(mft_ni, (mft_ni->allocated_size - 1) >>
+			vol->cluster_size_bits, TRUE);
+	if (unlikely(IS_ERR(rl) || !rl->length || rl->lcn < 0)) {
+		ntfs_error(vol->sb, "Failed to determine last allocated "
+				"cluster of mft data attribute.");
+		if (!IS_ERR(rl)) {
+			up_write(&mft_ni->runlist.lock);
+			ret = -EIO;
+		} else
+			ret = PTR_ERR(rl);
+		return ret;
+	}
+	lcn = rl->lcn + rl->length;
+	ntfs_debug("Last lcn of mft data attribute is 0x%llx.",
+			(long long)lcn);
+	/* Minimum allocation is one mft record worth of clusters. */
+	min_nr = vol->mft_record_size >> vol->cluster_size_bits;
+	if (!min_nr)
+		min_nr = 1;
+	/* Want to allocate 16 mft records worth of clusters. */
+	nr = vol->mft_record_size << 4 >> vol->cluster_size_bits;
+	if (!nr)
+		nr = min_nr;
+	/* Ensure we do not go above 2^32-1 mft records. */
+	if (unlikely((mft_ni->allocated_size +
+			(nr << vol->cluster_size_bits)) >>
+			vol->mft_record_size_bits >= (1ll << 32))) {
+		nr = min_nr;
+		if (unlikely((mft_ni->allocated_size +
+				(nr << vol->cluster_size_bits)) >>
+				vol->mft_record_size_bits >= (1ll << 32))) {
+			ntfs_warning(vol->sb, "Cannot allocate mft record "
+					"because the maximum number of inodes "
+					"(2^32) has already been reached.");
+			up_write(&mft_ni->runlist.lock);
+			return -ENOSPC;
+		}
+	}
+	ntfs_debug("Trying mft data allocation with %s cluster count %lli.",
+			nr > min_nr ? "default" : "minimal", (long long)nr);
+	old_last_vcn = rl[1].vcn;
+	do {
+		rl2 = ntfs_cluster_alloc(vol, old_last_vcn, nr, lcn, MFT_ZONE);
+		if (likely(!IS_ERR(rl2)))
+			break;
+		if (PTR_ERR(rl2) != -ENOSPC || nr == min_nr) {
+			ntfs_error(vol->sb, "Failed to allocate the minimal "
+					"number of clusters (%lli) for the "
+					"mft data attribute.", (long long)nr);
+			up_write(&mft_ni->runlist.lock);
+			return PTR_ERR(rl2);
+		}
+		/*
+		 * There is not enough space to do the allocation, but there
+		 * might be enough space to do a minimal allocation so try that
+		 * before failing.
+		 */
+		nr = min_nr;
+		ntfs_debug("Retrying mft data allocation with minimal cluster "
+				"count %lli.", (long long)nr);
+	} while (1);
+	rl = ntfs_runlists_merge(mft_ni->runlist.rl, rl2);
+	if (IS_ERR(rl)) {
+		up_write(&mft_ni->runlist.lock);
+		ntfs_error(vol->sb, "Failed to merge runlists for mft data "
+				"attribute.");
+		if (ntfs_cluster_free_from_rl(vol, rl2)) {
+			ntfs_error(vol->sb, "Failed to dealocate clusters "
+					"from the mft data attribute.%s", es);
+			NVolSetErrors(vol);
+		}
+		ntfs_free(rl2);
+		return PTR_ERR(rl);
+	}
+	mft_ni->runlist.rl = rl;
+	ntfs_debug("Allocated %lli clusters.", nr);
+	/* Find the last run in the new runlist. */
+	for (; rl[1].length; rl++)
+		;
+	/* Update the attribute record as well. */
+	mrec = map_mft_record(mft_ni);
+	if (IS_ERR(mrec)) {
+		ntfs_error(vol->sb, "Failed to map mft record.");
+		ret = PTR_ERR(mrec);
+		goto undo_alloc;
+	}
+	ctx = ntfs_attr_get_search_ctx(mft_ni, mrec);
+	if (unlikely(!ctx)) {
+		ntfs_error(vol->sb, "Failed to get search context.");
+		ret = -ENOMEM;
+		goto undo_alloc;
+	}
+	ret = ntfs_attr_lookup(mft_ni->type, mft_ni->name, mft_ni->name_len,
+			CASE_SENSITIVE, rl[1].vcn, NULL, 0, ctx);
+	if (unlikely(ret)) {
+		ntfs_error(vol->sb, "Failed to find last attribute extent of "
+				"mft data attribute.");
+		if (ret == -ENOENT)
+			ret = -EIO;
+		goto undo_alloc;
+	}
+	a = ctx->attr;
+	ll = sle64_to_cpu(a->data.non_resident.lowest_vcn);
+	/* Search back for the previous last allocated cluster of mft bitmap. */
+	for (rl2 = rl; rl2 > mft_ni->runlist.rl; rl2--) {
+		if (ll >= rl2->vcn)
+			break;
+	}
+	BUG_ON(ll < rl2->vcn);
+	BUG_ON(ll >= rl2->vcn + rl2->length);
+	/* Get the size for the new mapping pairs array for this extent. */
+	mp_size = ntfs_get_size_for_mapping_pairs(vol, rl2, ll);
+	if (unlikely(mp_size <= 0)) {
+		ntfs_error(vol->sb, "Get size for mapping pairs failed for "
+				"mft data attribute extent.");
+		ret = mp_size;
+		if (!ret)
+			ret = -EIO;
+		goto undo_alloc;
+	}
+	/* Expand the attribute record if necessary. */
+	old_alen = le32_to_cpu(a->length);
+	ret = ntfs_attr_record_resize(ctx->mrec, a, mp_size +
+			le16_to_cpu(a->data.non_resident.mapping_pairs_offset));
+	if (unlikely(ret)) {
+		if (ret != -ENOSPC) {
+			ntfs_error(vol->sb, "Failed to resize attribute "
+					"record for mft data attribute.");
+			goto undo_alloc;
+		}
+		// TODO: Deal with this by moving this extent to a new mft
+		// record or by starting a new extent in a new mft record or by
+		// moving other attributes out of this mft record.
+		// Note: Use the special reserved mft records and ensure that
+		// this extent is not required to find the mft record in
+		// question.
+		ntfs_error(vol->sb, "Not enough space in this mft record to "
+				"accomodate extended mft data attribute "
+				"extent.  Cannot handle this yet.");
+		ret = -EOPNOTSUPP;
+		goto undo_alloc;
+	}
+	mp_rebuilt = TRUE;
+	/* Generate the mapping pairs array directly into the attr record. */
+	ret = ntfs_mapping_pairs_build(vol, (u8*)a +
+			le16_to_cpu(a->data.non_resident.mapping_pairs_offset),
+			mp_size, rl2, ll, NULL);
+	if (unlikely(ret)) {
+		ntfs_error(vol->sb, "Failed to build mapping pairs array of "
+				"mft data attribute.");
+		goto undo_alloc;
+	}
+	/* Update the highest_vcn. */
+	a->data.non_resident.highest_vcn = cpu_to_sle64(rl[1].vcn - 1);
+	/*
+	 * We now have extended the mft data allocated_size by nr clusters.
+	 * Reflect this in the ntfs_inode structure and the attribute record.
+	 * @rl is the last (non-terminator) runlist element of mft data
+	 * attribute.
+	 */
+	if (a->data.non_resident.lowest_vcn) {
+		/*
+		 * We are not in the first attribute extent, switch to it, but
+		 * first ensure the changes will make it to disk later.
+		 */
+		flush_dcache_mft_record_page(ctx->ntfs_ino);
+		mark_mft_record_dirty(ctx->ntfs_ino);
+		ntfs_attr_reinit_search_ctx(ctx);
+		ret = ntfs_attr_lookup(mft_ni->type, mft_ni->name,
+				mft_ni->name_len, CASE_SENSITIVE, 0, NULL, 0,
+				ctx);
+		if (unlikely(ret)) {
+			ntfs_error(vol->sb, "Failed to find first attribute "
+					"extent of mft data attribute.");
+			goto restore_undo_alloc;
+		}
+		a = ctx->attr;
+	}
+	mft_ni->allocated_size += nr << vol->cluster_size_bits;
+	a->data.non_resident.allocated_size =
+			cpu_to_sle64(mft_ni->allocated_size);
+	/* Ensure the changes make it to disk. */
+	flush_dcache_mft_record_page(ctx->ntfs_ino);
+	mark_mft_record_dirty(ctx->ntfs_ino);
+	ntfs_attr_put_search_ctx(ctx);
+	unmap_mft_record(mft_ni);
+	up_write(&mft_ni->runlist.lock);
+	ntfs_debug("Done.");
+	return 0;
+restore_undo_alloc:
+	ntfs_attr_reinit_search_ctx(ctx);
+	if (ntfs_attr_lookup(mft_ni->type, mft_ni->name, mft_ni->name_len,
+			CASE_SENSITIVE, rl[1].vcn, NULL, 0, ctx)) {
+		ntfs_error(vol->sb, "Failed to find last attribute extent of "
+				"mft data attribute.%s", es);
+		mft_ni->allocated_size += nr << vol->cluster_size_bits;
+		ntfs_attr_put_search_ctx(ctx);
+		unmap_mft_record(mft_ni);
+		up_write(&mft_ni->runlist.lock);
+		/*
+		 * The only thing that is now wrong is ->allocated_size of the
+		 * base attribute extent which chkdsk should be able to fix.
+		 */
+		NVolSetErrors(vol);
+		return ret;
+	}
+	a = ctx->attr;
+	a->data.non_resident.highest_vcn = cpu_to_sle64(old_last_vcn - 1);
+undo_alloc:
+	if (ntfs_cluster_free(vol->mft_ino, old_last_vcn, -1) < 0) {
+		ntfs_error(vol->sb, "Failed to free clusters from mft data "
+				"attribute.%s", es);
+		NVolSetErrors(vol);
+	}
+	if (ntfs_rl_truncate_nolock(vol, &mft_ni->runlist, old_last_vcn)) {
+		ntfs_error(vol->sb, "Failed to truncate mft data attribute "
+				"runlist.%s", es);
+		NVolSetErrors(vol);
+	}
+	if (mp_rebuilt) {
+		if (ntfs_mapping_pairs_build(vol, (u8*)a + le16_to_cpu(
+				a->data.non_resident.mapping_pairs_offset),
+				old_alen - le16_to_cpu(
+				a->data.non_resident.mapping_pairs_offset),
+				rl2, ll, NULL)) {
+			ntfs_error(vol->sb, "Failed to restore mapping pairs "
+					"array.%s", es);
+			NVolSetErrors(vol);
+		}
+		if (ntfs_attr_record_resize(ctx->mrec, a, old_alen)) {
+			ntfs_error(vol->sb, "Failed to restore attribute "
+					"record.%s", es);
+			NVolSetErrors(vol);
+		}
+		flush_dcache_mft_record_page(ctx->ntfs_ino);
+		mark_mft_record_dirty(ctx->ntfs_ino);
+	}
+	if (ctx)
+		ntfs_attr_put_search_ctx(ctx);
+	if (!IS_ERR(mrec))
+		unmap_mft_record(mft_ni);
+	up_write(&mft_ni->runlist.lock);
+	return ret;
+}
+
+/**
+ * ntfs_mft_record_layout - layout an mft record into a memory buffer
+ * @vol:	volume to which the mft record will belong
+ * @mft_no:	mft reference specifying the mft record number
+ * @m:		destination buffer of size >= @vol->mft_record_size bytes
+ *
+ * Layout an empty, unused mft record with the mft record number @mft_no into
+ * the buffer @m.  The volume @vol is needed because the mft record structure
+ * was modified in NTFS 3.1 so we need to know which volume version this mft
+ * record will be used on.
+ *
+ * Return 0 on success and -errno on error.
+ */
+static int ntfs_mft_record_layout(const ntfs_volume *vol, const s64 mft_no,
+		MFT_RECORD *m)
+{
+	ATTR_RECORD *a;
+
+	ntfs_debug("Entering for mft record 0x%llx.", (long long)mft_no);
+	if (mft_no >= (1ll << 32)) {
+		ntfs_error(vol->sb, "Mft record number 0x%llx exceeds "
+				"maximum of 2^32.", (long long)mft_no);
+		return -ERANGE;
+	}
+	/* Start by clearing the whole mft record to gives us a clean slate. */
+	memset(m, 0, vol->mft_record_size);
+	/* Aligned to 2-byte boundary. */
+	if (vol->major_ver < 3 || (vol->major_ver == 3 && !vol->minor_ver))
+		m->usa_ofs = cpu_to_le16((sizeof(MFT_RECORD_OLD) + 1) & ~1);
+	else {
+		m->usa_ofs = cpu_to_le16((sizeof(MFT_RECORD) + 1) & ~1);
+		/*
+		 * Set the NTFS 3.1+ specific fields while we know that the
+		 * volume version is 3.1+.
+		 */
+		m->reserved = 0;
+		m->mft_record_number = cpu_to_le32((u32)mft_no);
+	}
+	m->magic = magic_FILE;
+	if (vol->mft_record_size >= NTFS_BLOCK_SIZE)
+		m->usa_count = cpu_to_le16(vol->mft_record_size /
+				NTFS_BLOCK_SIZE + 1);
+	else {
+		m->usa_count = cpu_to_le16(1);
+		ntfs_warning(vol->sb, "Sector size is bigger than mft record "
+				"size.  Setting usa_count to 1.  If chkdsk "
+				"reports this as corruption, please email "
+				"linux-ntfs-dev@lists.sourceforge.net stating "
+				"that you saw this message and that the "
+				"modified file system created was corrupt.  "
+				"Thank you.");
+	}
+	/* Set the update sequence number to 1. */
+	*(le16*)((u8*)m + le16_to_cpu(m->usa_ofs)) = cpu_to_le16(1);
+	m->lsn = 0;
+	m->sequence_number = cpu_to_le16(1);
+	m->link_count = 0;
+	/*
+	 * Place the attributes straight after the update sequence array,
+	 * aligned to 8-byte boundary.
+	 */
+	m->attrs_offset = cpu_to_le16((le16_to_cpu(m->usa_ofs) +
+			(le16_to_cpu(m->usa_count) << 1) + 7) & ~7);
+	m->flags = 0;
+	/*
+	 * Using attrs_offset plus eight bytes (for the termination attribute).
+	 * attrs_offset is already aligned to 8-byte boundary, so no need to
+	 * align again.
+	 */
+	m->bytes_in_use = cpu_to_le32(le16_to_cpu(m->attrs_offset) + 8);
+	m->bytes_allocated = cpu_to_le32(vol->mft_record_size);
+	m->base_mft_record = 0;
+	m->next_attr_instance = 0;
+	/* Add the termination attribute. */
+	a = (ATTR_RECORD*)((u8*)m + le16_to_cpu(m->attrs_offset));
+	a->type = AT_END;
+	a->length = 0;
+	ntfs_debug("Done.");
+	return 0;
+}
+
+/**
+ * ntfs_mft_record_format - format an mft record on an ntfs volume
+ * @vol:	volume on which to format the mft record
+ * @mft_no:	mft record number to format
+ *
+ * Format the mft record @mft_no in $MFT/$DATA, i.e. lay out an empty, unused
+ * mft record into the appropriate place of the mft data attribute.  This is
+ * used when extending the mft data attribute.
+ *
+ * Return 0 on success and -errno on error.
+ */
+static int ntfs_mft_record_format(const ntfs_volume *vol, const s64 mft_no)
+{
+	struct inode *mft_vi = vol->mft_ino;
+	struct page *page;
+	MFT_RECORD *m;
+	pgoff_t index, end_index;
+	unsigned int ofs;
+	int err;
+
+	ntfs_debug("Entering for mft record 0x%llx.", (long long)mft_no);
+	/*
+	 * The index into the page cache and the offset within the page cache
+	 * page of the wanted mft record.
+	 */
+	index = mft_no << vol->mft_record_size_bits >> PAGE_CACHE_SHIFT;
+	ofs = (mft_no << vol->mft_record_size_bits) & ~PAGE_CACHE_MASK;
+	/* The maximum valid index into the page cache for $MFT's data. */
+	end_index = mft_vi->i_size >> PAGE_CACHE_SHIFT;
+	if (unlikely(index >= end_index)) {
+		if (unlikely(index > end_index || ofs + vol->mft_record_size >=
+				(mft_vi->i_size & ~PAGE_CACHE_MASK))) {
+			ntfs_error(vol->sb, "Tried to format non-existing mft "
+					"record 0x%llx.", (long long)mft_no);
+			return -ENOENT;
+		}
+	}
+	/* Read, map, and pin the page containing the mft record. */
+	page = ntfs_map_page(mft_vi->i_mapping, index);
+	if (unlikely(IS_ERR(page))) {
+		ntfs_error(vol->sb, "Failed to map page containing mft record "
+				"to format 0x%llx.", (long long)mft_no);
+		return PTR_ERR(page);
+	}
+	lock_page(page);
+	BUG_ON(!PageUptodate(page));
+	ClearPageUptodate(page);
+	m = (MFT_RECORD*)((u8*)page_address(page) + ofs);
+	err = ntfs_mft_record_layout(vol, mft_no, m);
+	if (unlikely(err)) {
+		ntfs_error(vol->sb, "Failed to layout mft record 0x%llx.",
+				(long long)mft_no);
+		SetPageUptodate(page);
+		unlock_page(page);
+		ntfs_unmap_page(page);
+		return err;
+	}
+	flush_dcache_page(page);
+	SetPageUptodate(page);
+	unlock_page(page);
+	/*
+	 * Make sure the mft record is written out to disk.  We could use
+	 * ilookup5() to check if an inode is in icache and so on but this is
+	 * unnecessary as ntfs_writepage() will write the dirty record anyway.
+	 */
+	mark_ntfs_record_dirty(page, ofs);
+	ntfs_unmap_page(page);
+	ntfs_debug("Done.");
+	return 0;
+}
+
+/**
+ * ntfs_mft_record_alloc - allocate an mft record on an ntfs volume
+ * @vol:	[IN]  volume on which to allocate the mft record
+ * @mode:	[IN]  mode if want a file or directory, i.e. base inode or 0
+ * @base_ni:	[IN]  open base inode if allocating an extent mft record or NULL
+ * @mrec:	[OUT] on successful return this is the mapped mft record
+ *
+ * Allocate an mft record in $MFT/$DATA of an open ntfs volume @vol.
+ *
+ * If @base_ni is NULL make the mft record a base mft record, i.e. a file or
+ * direvctory inode, and allocate it at the default allocator position.  In
+ * this case @mode is the file mode as given to us by the caller.  We in
+ * particular use @mode to distinguish whether a file or a directory is being
+ * created (S_IFDIR(mode) and S_IFREG(mode), respectively).
+ *
+ * If @base_ni is not NULL make the allocated mft record an extent record,
+ * allocate it starting at the mft record after the base mft record and attach
+ * the allocated and opened ntfs inode to the base inode @base_ni.  In this
+ * case @mode must be 0 as it is meaningless for extent inodes.
+ *
+ * You need to check the return value with IS_ERR().  If false, the function
+ * was successful and the return value is the now opened ntfs inode of the
+ * allocated mft record.  *@mrec is then set to the allocated, mapped, pinned,
+ * and locked mft record.  If IS_ERR() is true, the function failed and the
+ * error code is obtained from PTR_ERR(return value).  *@mrec is undefined in
+ * this case.
+ *
+ * Allocation strategy:
+ *
+ * To find a free mft record, we scan the mft bitmap for a zero bit.  To
+ * optimize this we start scanning at the place specified by @base_ni or if
+ * @base_ni is NULL we start where we last stopped and we perform wrap around
+ * when we reach the end.  Note, we do not try to allocate mft records below
+ * number 24 because numbers 0 to 15 are the defined system files anyway and 16
+ * to 24 are special in that they are used for storing extension mft records
+ * for the $DATA attribute of $MFT.  This is required to avoid the possibility
+ * of creating a runlist with a circular dependency which once written to disk
+ * can never be read in again.  Windows will only use records 16 to 24 for
+ * normal files if the volume is completely out of space.  We never use them
+ * which means that when the volume is really out of space we cannot create any
+ * more files while Windows can still create up to 8 small files.  We can start
+ * doing this at some later time, it does not matter much for now.
+ *
+ * When scanning the mft bitmap, we only search up to the last allocated mft
+ * record.  If there are no free records left in the range 24 to number of
+ * allocated mft records, then we extend the $MFT/$DATA attribute in order to
+ * create free mft records.  We extend the allocated size of $MFT/$DATA by 16
+ * records at a time or one cluster, if cluster size is above 16kiB.  If there
+ * is not sufficient space to do this, we try to extend by a single mft record
+ * or one cluster, if cluster size is above the mft record size.
+ *
+ * No matter how many mft records we allocate, we initialize only the first
+ * allocated mft record, incrementing mft data size and initialized size
+ * accordingly, open an ntfs_inode for it and return it to the caller, unless
+ * there are less than 24 mft records, in which case we allocate and initialize
+ * mft records until we reach record 24 which we consider as the first free mft
+ * record for use by normal files.
+ *
+ * If during any stage we overflow the initialized data in the mft bitmap, we
+ * extend the initialized size (and data size) by 8 bytes, allocating another
+ * cluster if required.  The bitmap data size has to be at least equal to the
+ * number of mft records in the mft, but it can be bigger, in which case the
+ * superflous bits are padded with zeroes.
+ *
+ * Thus, when we return successfully (IS_ERR() is false), we will have:
+ *	- initialized / extended the mft bitmap if necessary,
+ *	- initialized / extended the mft data if necessary,
+ *	- set the bit corresponding to the mft record being allocated in the
+ *	  mft bitmap,
+ *	- opened an ntfs_inode for the allocated mft record, and we will have
+ *	- returned the ntfs_inode as well as the allocated mapped, pinned, and
+ *	  locked mft record.
+ *
+ * On error, the volume will be left in a consistent state and no record will
+ * be allocated.  If rolling back a partial operation fails, we may leave some
+ * inconsistent metadata in which case we set NVolErrors() so the volume is
+ * left dirty when unmounted.
+ *
+ * Note, this function cannot make use of most of the normal functions, like
+ * for example for attribute resizing, etc, because when the run list overflows
+ * the base mft record and an attribute list is used, it is very important that
+ * the extension mft records used to store the $DATA attribute of $MFT can be
+ * reached without having to read the information contained inside them, as
+ * this would make it impossible to find them in the first place after the
+ * volume is unmounted.  $MFT/$BITMAP probably does not need to follow this
+ * rule because the bitmap is not essential for finding the mft records, but on
+ * the other hand, handling the bitmap in this special way would make life
+ * easier because otherwise there might be circular invocations of functions
+ * when reading the bitmap.
+ */
+ntfs_inode *ntfs_mft_record_alloc(ntfs_volume *vol, const int mode,
+		ntfs_inode *base_ni, MFT_RECORD **mrec)
+{
+	s64 ll, bit, old_data_initialized, old_data_size;
+	struct inode *vi;
+	struct page *page;
+	ntfs_inode *mft_ni, *mftbmp_ni, *ni;
+	ntfs_attr_search_ctx *ctx;
+	MFT_RECORD *m;
+	ATTR_RECORD *a;
+	pgoff_t index;
+	unsigned int ofs;
+	int err;
+	le16 seq_no, usn;
+	BOOL record_formatted = FALSE;
+
+	if (base_ni) {
+		ntfs_debug("Entering (allocating an extent mft record for "
+				"base mft record 0x%llx).",
+				(long long)base_ni->mft_no);
+		/* @mode and @base_ni are mutually exclusive. */
+		BUG_ON(mode);
+	} else
+		ntfs_debug("Entering (allocating a base mft record).");
+	if (mode) {
+		/* @mode and @base_ni are mutually exclusive. */
+		BUG_ON(base_ni);
+		/* We only support creation of normal files and directories. */
+		if (!S_ISREG(mode) && !S_ISDIR(mode))
+			return ERR_PTR(-EOPNOTSUPP);
+	}
+	BUG_ON(!mrec);
+	mft_ni = NTFS_I(vol->mft_ino);
+	mftbmp_ni = NTFS_I(vol->mftbmp_ino);
+	down_write(&vol->mftbmp_lock);
+	bit = ntfs_mft_bitmap_find_and_alloc_free_rec_nolock(vol, base_ni);
+	if (bit >= 0) {
+		ntfs_debug("Found and allocated free record (#1), bit 0x%llx.",
+				(long long)bit);
+		goto have_alloc_rec;
+	}
+	if (bit != -ENOSPC) {
+		up_write(&vol->mftbmp_lock);
+		return ERR_PTR(bit);
+	}
+	/*
+	 * No free mft records left.  If the mft bitmap already covers more
+	 * than the currently used mft records, the next records are all free,
+	 * so we can simply allocate the first unused mft record.
+	 * Note: We also have to make sure that the mft bitmap at least covers
+	 * the first 24 mft records as they are special and whilst they may not
+	 * be in use, we do not allocate from them.
+	 */
+	ll = mft_ni->initialized_size >> vol->mft_record_size_bits;
+	if (mftbmp_ni->initialized_size << 3 > ll &&
+			mftbmp_ni->initialized_size > 3) {
+		bit = ll;
+		if (bit < 24)
+			bit = 24;
+		if (unlikely(bit >= (1ll << 32)))
+			goto max_err_out;
+		ntfs_debug("Found free record (#2), bit 0x%llx.",
+				(long long)bit);
+		goto found_free_rec;
+	}
+	/*
+	 * The mft bitmap needs to be expanded until it covers the first unused
+	 * mft record that we can allocate.
+	 * Note: The smallest mft record we allocate is mft record 24.
+	 */
+	bit = mftbmp_ni->initialized_size << 3;
+	if (unlikely(bit >= (1ll << 32)))
+		goto max_err_out;
+	ntfs_debug("Status of mftbmp before extension: allocated_size 0x%llx, "
+			"data_size 0x%llx, initialized_size 0x%llx.",
+			(long long)mftbmp_ni->allocated_size,
+			(long long)vol->mftbmp_ino->i_size,
+			(long long)mftbmp_ni->initialized_size);
+	if (mftbmp_ni->initialized_size + 8 > mftbmp_ni->allocated_size) {
+		/* Need to extend bitmap by one more cluster. */
+		ntfs_debug("mftbmp: initialized_size + 8 > allocated_size.");
+		err = ntfs_mft_bitmap_extend_allocation_nolock(vol);
+		if (unlikely(err)) {
+			up_write(&vol->mftbmp_lock);
+			goto err_out;
+		}
+		ntfs_debug("Status of mftbmp after allocation extension: "
+				"allocated_size 0x%llx, data_size 0x%llx, "
+				"initialized_size 0x%llx.",
+				(long long)mftbmp_ni->allocated_size,
+				(long long)vol->mftbmp_ino->i_size,
+				(long long)mftbmp_ni->initialized_size);
+	}
+	/*
+	 * We now have sufficient allocated space, extend the initialized_size
+	 * as well as the data_size if necessary and fill the new space with
+	 * zeroes.
+	 */
+	err = ntfs_mft_bitmap_extend_initialized_nolock(vol);
+	if (unlikely(err)) {
+		up_write(&vol->mftbmp_lock);
+		goto err_out;
+	}
+	ntfs_debug("Status of mftbmp after initialized extention: "
+			"allocated_size 0x%llx, data_size 0x%llx, "
+			"initialized_size 0x%llx.",
+			(long long)mftbmp_ni->allocated_size,
+			(long long)vol->mftbmp_ino->i_size,
+			(long long)mftbmp_ni->initialized_size);
+	ntfs_debug("Found free record (#3), bit 0x%llx.", (long long)bit);
+found_free_rec:
+	/* @bit is the found free mft record, allocate it in the mft bitmap. */
+	ntfs_debug("At found_free_rec.");
+	err = ntfs_bitmap_set_bit(vol->mftbmp_ino, bit);
+	if (unlikely(err)) {
+		ntfs_error(vol->sb, "Failed to allocate bit in mft bitmap.");
+		up_write(&vol->mftbmp_lock);
+		goto err_out;
+	}
+	ntfs_debug("Set bit 0x%llx in mft bitmap.", (long long)bit);
+have_alloc_rec:
+	/*
+	 * The mft bitmap is now uptodate.  Deal with mft data attribute now.
+	 * Note, we keep hold of the mft bitmap lock for writing until all
+	 * modifications to the mft data attribute are complete, too, as they
+	 * will impact decisions for mft bitmap and mft record allocation done
+	 * by a parallel allocation and if the lock is not maintained a
+	 * parallel allocation could allocate the same mft record as this one.
+	 */
+	ll = (bit + 1) << vol->mft_record_size_bits;
+	if (ll <= mft_ni->initialized_size) {
+		ntfs_debug("Allocated mft record already initialized.");
+		goto mft_rec_already_initialized;
+	}
+	ntfs_debug("Initializing allocated mft record.");
+	/*
+	 * The mft record is outside the initialized data.  Extend the mft data
+	 * attribute until it covers the allocated record.  The loop is only
+	 * actually traversed more than once when a freshly formatted volume is
+	 * first written to so it optimizes away nicely in the common case.
+	 */
+	ntfs_debug("Status of mft data before extension: "
+			"allocated_size 0x%llx, data_size 0x%llx, "
+			"initialized_size 0x%llx.",
+			(long long)mft_ni->allocated_size,
+			(long long)vol->mft_ino->i_size,
+			(long long)mft_ni->initialized_size);
+	while (ll > mft_ni->allocated_size) {
+		err = ntfs_mft_data_extend_allocation_nolock(vol);
+		if (unlikely(err)) {
+			ntfs_error(vol->sb, "Failed to extend mft data "
+					"allocation.");
+			goto undo_mftbmp_alloc_nolock;
+		}
+		ntfs_debug("Status of mft data after allocation extension: "
+				"allocated_size 0x%llx, data_size 0x%llx, "
+				"initialized_size 0x%llx.",
+				(long long)mft_ni->allocated_size,
+				(long long)vol->mft_ino->i_size,
+				(long long)mft_ni->initialized_size);
+	}
+	/*
+	 * Extend mft data initialized size (and data size of course) to reach
+	 * the allocated mft record, formatting the mft records allong the way.
+	 * Note: We only modify the ntfs_inode structure as that is all that is
+	 * needed by ntfs_mft_record_format().  We will update the attribute
+	 * record itself in one fell swoop later on.
+	 */
+	old_data_initialized = mft_ni->initialized_size;
+	old_data_size = vol->mft_ino->i_size;
+	while (ll > mft_ni->initialized_size) {
+		s64 new_initialized_size, mft_no;
+		
+		new_initialized_size = mft_ni->initialized_size +
+				vol->mft_record_size;
+		mft_no = mft_ni->initialized_size >> vol->mft_record_size_bits;
+		if (new_initialized_size > vol->mft_ino->i_size)
+			vol->mft_ino->i_size = new_initialized_size;
+		ntfs_debug("Initializing mft record 0x%llx.",
+				(long long)mft_no);
+		err = ntfs_mft_record_format(vol, mft_no);
+		if (unlikely(err)) {
+			ntfs_error(vol->sb, "Failed to format mft record.");
+			goto undo_data_init;
+		}
+		mft_ni->initialized_size = new_initialized_size;
+	}
+	record_formatted = TRUE;
+	/* Update the mft data attribute record to reflect the new sizes. */
+	m = map_mft_record(mft_ni);
+	if (IS_ERR(m)) {
+		ntfs_error(vol->sb, "Failed to map mft record.");
+		err = PTR_ERR(m);
+		goto undo_data_init;
+	}
+	ctx = ntfs_attr_get_search_ctx(mft_ni, m);
+	if (unlikely(!ctx)) {
+		ntfs_error(vol->sb, "Failed to get search context.");
+		err = -ENOMEM;
+		unmap_mft_record(mft_ni);
+		goto undo_data_init;
+	}
+	err = ntfs_attr_lookup(mft_ni->type, mft_ni->name, mft_ni->name_len,
+			CASE_SENSITIVE, 0, NULL, 0, ctx);
+	if (unlikely(err)) {
+		ntfs_error(vol->sb, "Failed to find first attribute extent of "
+				"mft data attribute.");
+		ntfs_attr_put_search_ctx(ctx);
+		unmap_mft_record(mft_ni);
+		goto undo_data_init;
+	}
+	a = ctx->attr;
+	a->data.non_resident.initialized_size =
+			cpu_to_sle64(mft_ni->initialized_size);
+	a->data.non_resident.data_size = cpu_to_sle64(vol->mft_ino->i_size);
+	/* Ensure the changes make it to disk. */
+	flush_dcache_mft_record_page(ctx->ntfs_ino);
+	mark_mft_record_dirty(ctx->ntfs_ino);
+	ntfs_attr_put_search_ctx(ctx);
+	unmap_mft_record(mft_ni);
+	ntfs_debug("Status of mft data after mft record initialization: "
+			"allocated_size 0x%llx, data_size 0x%llx, "
+			"initialized_size 0x%llx.",
+			(long long)mft_ni->allocated_size,
+			(long long)vol->mft_ino->i_size,
+			(long long)mft_ni->initialized_size);
+	BUG_ON(vol->mft_ino->i_size > mft_ni->allocated_size);
+	BUG_ON(mft_ni->initialized_size > vol->mft_ino->i_size);
+mft_rec_already_initialized:
+	/*
+	 * We can finally drop the mft bitmap lock as the mft data attribute
+	 * has been fully updated.  The only disparity left is that the
+	 * allocated mft record still needs to be marked as in use to match the
+	 * set bit in the mft bitmap but this is actually not a problem since
+	 * this mft record is not referenced from anywhere yet and the fact
+	 * that it is allocated in the mft bitmap means that no-one will try to
+	 * allocate it either.
+	 */
+	up_write(&vol->mftbmp_lock);
+	/*
+	 * We now have allocated and initialized the mft record.  Calculate the
+	 * index of and the offset within the page cache page the record is in.
+	 */
+	index = bit << vol->mft_record_size_bits >> PAGE_CACHE_SHIFT;
+	ofs = (bit << vol->mft_record_size_bits) & ~PAGE_CACHE_MASK;
+	/* Read, map, and pin the page containing the mft record. */
+	page = ntfs_map_page(vol->mft_ino->i_mapping, index);
+	if (unlikely(IS_ERR(page))) {
+		ntfs_error(vol->sb, "Failed to map page containing allocated "
+				"mft record 0x%llx.", (long long)bit);
+		err = PTR_ERR(page);
+		goto undo_mftbmp_alloc;
+	}
+	lock_page(page);
+	BUG_ON(!PageUptodate(page));
+	ClearPageUptodate(page);
+	m = (MFT_RECORD*)((u8*)page_address(page) + ofs);
+	/* If we just formatted the mft record no need to do it again. */
+	if (!record_formatted) {
+		/* Sanity check that the mft record is really not in use. */
+		if (ntfs_is_file_record(m->magic) &&
+				(m->flags & MFT_RECORD_IN_USE)) {
+			ntfs_error(vol->sb, "Mft record 0x%llx was marked "
+					"free in mft bitmap but is marked "
+					"used itself.  Corrupt filesystem.  "
+					"Unmount and run chkdsk.",
+					(long long)bit);
+			err = -EIO;
+			SetPageUptodate(page);
+			unlock_page(page);
+			ntfs_unmap_page(page);
+			NVolSetErrors(vol);
+			goto undo_mftbmp_alloc;
+		}
+		/*
+		 * We need to (re-)format the mft record, preserving the
+		 * sequence number if it is not zero as well as the update
+		 * sequence number if it is not zero or -1 (0xffff).  This
+		 * means we do not need to care whether or not something went
+		 * wrong with the previous mft record.
+		 */
+		seq_no = m->sequence_number;
+		usn = *(le16*)((u8*)m + le16_to_cpu(m->usa_ofs));
+		err = ntfs_mft_record_layout(vol, bit, m);
+		if (unlikely(err)) {
+			ntfs_error(vol->sb, "Failed to layout allocated mft "
+					"record 0x%llx.", (long long)bit);
+			SetPageUptodate(page);
+			unlock_page(page);
+			ntfs_unmap_page(page);
+			goto undo_mftbmp_alloc;
+		}
+		if (seq_no)
+			m->sequence_number = seq_no;
+		if (usn && le16_to_cpu(usn) != 0xffff)
+			*(le16*)((u8*)m + le16_to_cpu(m->usa_ofs)) = usn;
+	}
+	/* Set the mft record itself in use. */
+	m->flags |= MFT_RECORD_IN_USE;
+	if (S_ISDIR(mode))
+		m->flags |= MFT_RECORD_IS_DIRECTORY;
+	flush_dcache_page(page);
+	SetPageUptodate(page);
+	if (base_ni) {
+		/*
+		 * Setup the base mft record in the extent mft record.  This
+		 * completes initialization of the allocated extent mft record
+		 * and we can simply use it with map_extent_mft_record().
+		 */
+		m->base_mft_record = MK_LE_MREF(base_ni->mft_no,
+				base_ni->seq_no);
+		/*
+		 * Allocate an extent inode structure for the new mft record,
+		 * attach it to the base inode @base_ni and map, pin, and lock
+		 * its, i.e. the allocated, mft record.
+		 */
+		m = map_extent_mft_record(base_ni, bit, &ni);
+		if (IS_ERR(m)) {
+			ntfs_error(vol->sb, "Failed to map allocated extent "
+					"mft record 0x%llx.", (long long)bit);
+			err = PTR_ERR(m);
+			/* Set the mft record itself not in use. */
+			m->flags &= cpu_to_le16(
+					~le16_to_cpu(MFT_RECORD_IN_USE));
+			flush_dcache_page(page);
+			/* Make sure the mft record is written out to disk. */
+			mark_ntfs_record_dirty(page, ofs);
+			unlock_page(page);
+			ntfs_unmap_page(page);
+			goto undo_mftbmp_alloc;
+		}
+		/*
+		 * Make sure the allocated mft record is written out to disk.
+		 * No need to set the inode dirty because the caller is going
+		 * to do that anyway after finishing with the new extent mft
+		 * record (e.g. at a minimum a new attribute will be added to
+		 * the mft record.
+		 */
+		mark_ntfs_record_dirty(page, ofs);
+		unlock_page(page);
+		/*
+		 * Need to unmap the page since map_extent_mft_record() mapped
+		 * it as well so we have it mapped twice at the moment.
+		 */
+		ntfs_unmap_page(page);
+	} else {
+		/*
+		 * Allocate a new VFS inode and set it up.  NOTE: @vi->i_nlink
+		 * is set to 1 but the mft record->link_count is 0.  The caller
+		 * needs to bear this in mind.
+		 */
+		vi = new_inode(vol->sb);
+		if (unlikely(!vi)) {
+			err = -ENOMEM;
+			/* Set the mft record itself not in use. */
+			m->flags &= cpu_to_le16(
+					~le16_to_cpu(MFT_RECORD_IN_USE));
+			flush_dcache_page(page);
+			/* Make sure the mft record is written out to disk. */
+			mark_ntfs_record_dirty(page, ofs);
+			unlock_page(page);
+			ntfs_unmap_page(page);
+			goto undo_mftbmp_alloc;
+		}
+		vi->i_ino = bit;
+		/*
+		 * This is the optimal IO size (for stat), not the fs block
+		 * size.
+		 */
+		vi->i_blksize = PAGE_CACHE_SIZE;
+		/*
+		 * This is for checking whether an inode has changed w.r.t. a
+		 * file so that the file can be updated if necessary (compare
+		 * with f_version).
+		 */
+		vi->i_version = 1;
+
+		/* The owner and group come from the ntfs volume. */
+		vi->i_uid = vol->uid;
+		vi->i_gid = vol->gid;
+
+		/* Initialize the ntfs specific part of @vi. */
+		ntfs_init_big_inode(vi);
+		ni = NTFS_I(vi);
+		/*
+		 * Set the appropriate mode, attribute type, and name.  For
+		 * directories, also setup the index values to the defaults.
+		 */
+		if (S_ISDIR(mode)) {
+			vi->i_mode = S_IFDIR | S_IRWXUGO;
+			vi->i_mode &= ~vol->dmask;
+
+			NInoSetMstProtected(ni);
+			ni->type = AT_INDEX_ALLOCATION;
+			ni->name = I30;
+			ni->name_len = 4;
+
+			ni->itype.index.block_size = 4096;
+			ni->itype.index.block_size_bits = generic_ffs(4096) - 1;
+			ni->itype.index.collation_rule = COLLATION_FILE_NAME;
+			if (vol->cluster_size <= ni->itype.index.block_size) {
+				ni->itype.index.vcn_size = vol->cluster_size;
+				ni->itype.index.vcn_size_bits =
+						vol->cluster_size_bits;
+			} else {
+				ni->itype.index.vcn_size = vol->sector_size;
+				ni->itype.index.vcn_size_bits =
+						vol->sector_size_bits;
+			}
+		} else {
+			vi->i_mode = S_IFREG | S_IRWXUGO;
+			vi->i_mode &= ~vol->fmask;
+
+			ni->type = AT_DATA;
+			ni->name = NULL;
+			ni->name_len = 0;
+		}
+		if (IS_RDONLY(vi))
+			vi->i_mode &= ~S_IWUGO;
+
+		/* Set the inode times to the current time. */
+		vi->i_atime = vi->i_mtime = vi->i_ctime = current_kernel_time();
+		/*
+		 * Set the file size to 0, the ntfs inode sizes are set to 0 by
+		 * the call to ntfs_init_big_inode() below.
+		 */
+		vi->i_size = 0;
+		vi->i_blocks = 0;
+
+		/* Set the sequence number. */
+		vi->i_generation = ni->seq_no = le16_to_cpu(m->sequence_number);
+		/*
+		 * Manually map, pin, and lock the mft record as we already
+		 * have its page mapped and it is very easy to do.
+		 */
+		atomic_inc(&ni->count);
+		down(&ni->mrec_lock);
+		ni->page = page;
+		ni->page_ofs = ofs;
+		/*
+		 * Make sure the allocated mft record is written out to disk.
+		 * NOTE: We do not set the ntfs inode dirty because this would
+		 * fail in ntfs_write_inode() because the inode does not have a
+		 * standard information attribute yet.  Also, there is no need
+		 * to set the inode dirty because the caller is going to do
+		 * that anyway after finishing with the new mft record (e.g. at
+		 * a minimum some new attributes will be added to the mft
+		 * record.
+		 */
+		mark_ntfs_record_dirty(page, ofs);
+		unlock_page(page);
+
+		/* Add the inode to the inode hash for the superblock. */
+		insert_inode_hash(vi);
+
+		/* Update the default mft allocation position. */
+		vol->mft_data_pos = bit + 1;
+	}
+	/*
+	 * Return the opened, allocated inode of the allocated mft record as
+	 * well as the mapped, pinned, and locked mft record.
+	 */
+	ntfs_debug("Returning opened, allocated %sinode 0x%llx.",
+			base_ni ? "extent " : "", (long long)bit);
+	*mrec = m;
+	return ni;
+undo_data_init:
+	mft_ni->initialized_size = old_data_initialized;
+	vol->mft_ino->i_size = old_data_size;
+	goto undo_mftbmp_alloc_nolock;
+undo_mftbmp_alloc:
+	down_write(&vol->mftbmp_lock);
+undo_mftbmp_alloc_nolock:
+	if (ntfs_bitmap_clear_bit(vol->mftbmp_ino, bit)) {
+		ntfs_error(vol->sb, "Failed to clear bit in mft bitmap.%s", es);
+		NVolSetErrors(vol);
+	}
+	up_write(&vol->mftbmp_lock);
+err_out:
+	return ERR_PTR(err);
+max_err_out:
+	ntfs_warning(vol->sb, "Cannot allocate mft record because the maximum "
+			"number of inodes (2^32) has already been reached.");
+	up_write(&vol->mftbmp_lock);
+	return ERR_PTR(-ENOSPC);
+}
+
+/**
  * ntfs_extent_mft_record_free - free an extent mft record on an ntfs volume
  * @ni:		ntfs inode of the mapped extent mft record to free
  * @m:		mapped extent mft record of the ntfs inode @ni
diff -Nru a/fs/ntfs/mft.h b/fs/ntfs/mft.h
--- a/fs/ntfs/mft.h	2004-10-19 10:15:13 +01:00
+++ b/fs/ntfs/mft.h	2004-10-19 10:15:13 +01:00
@@ -118,6 +118,8 @@
 		const unsigned long mft_no, const MFT_RECORD *m,
 		ntfs_inode **locked_ni);
 
+extern ntfs_inode *ntfs_mft_record_alloc(ntfs_volume *vol, const int mode,
+		ntfs_inode *base_ni, MFT_RECORD **mrec);
 extern int ntfs_extent_mft_record_free(ntfs_inode *ni, MFT_RECORD *m);
 
 #endif /* NTFS_RW */

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 36/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:47                                                                     ` [PATCH 35/37] " Anton Altaparmakov
@ 2004-10-19  9:47                                                                       ` Anton Altaparmakov
  2004-10-19  9:48                                                                         ` [PATCH 37/37] " Anton Altaparmakov
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:47 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 36/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/10/17 1.2059)
   NTFS: 2.1.21 release
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/Documentation/filesystems/ntfs.txt b/Documentation/filesystems/ntfs.txt
--- a/Documentation/filesystems/ntfs.txt	2004-10-19 10:15:17 +01:00
+++ b/Documentation/filesystems/ntfs.txt	2004-10-19 10:15:17 +01:00
@@ -277,6 +277,10 @@
 
 Note, a technical ChangeLog aimed at kernel hackers is in fs/ntfs/ChangeLog.
 
+2.1.21:
+	- Fix several race conditions and various other bugs.
+	- Many internal cleanups, code reorganization, optimizations, and mft
+	  and index record writing code rewritten to fit in with the changes.
 2.1.20:
 	- Fix two stupid bugs introduced in 2.1.18 release.
 2.1.19:
diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:15:17 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:15:17 +01:00
@@ -21,7 +21,7 @@
 	- Enable the code for setting the NT4 compatibility flag when we start
 	  making NTFS 1.2 specific modifications.
 
-2.1.21-WIP
+2.1.21 - Fix some races and bugs, rewrite mft write code, add mft allocator.
 
 	- Implement extent mft record deallocation
 	  fs/ntfs/mft.c::ntfs_extent_mft_record_free().
@@ -115,7 +115,7 @@
 	  fs/ntfs/inode.c::ntfs_clear_big_inode().  (Thanks to Christoph
 	  Hellwig for spotting this.)
 	- Fix race condition in fs/ntfs/inode.c::ntfs_put_inode() by taking the
-	  inode semaphore around the code thst sets ni->itype.index.bmp_ino to
+	  inode semaphore around the code that sets ni->itype.index.bmp_ino to
 	  NULL and reorganize the code to optimize it a bit.  (Thanks to
 	  Christoph Hellwig for spotting this.)
 	- Modify fs/ntfs/aops.c::mark_ntfs_record_dirty() to no longer take the
diff -Nru a/fs/ntfs/Makefile b/fs/ntfs/Makefile
--- a/fs/ntfs/Makefile	2004-10-19 10:15:17 +01:00
+++ b/fs/ntfs/Makefile	2004-10-19 10:15:17 +01:00
@@ -6,7 +6,7 @@
 	     index.o inode.o mft.o mst.o namei.o runlist.o super.o sysctl.o \
 	     unistr.o upcase.o
 
-EXTRA_CFLAGS = -DNTFS_VERSION=\"2.1.21-WIP\"
+EXTRA_CFLAGS = -DNTFS_VERSION=\"2.1.21\"
 
 ifeq ($(CONFIG_NTFS_DEBUG),y)
 EXTRA_CFLAGS += -DDEBUG

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 37/37] Re: [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes
  2004-10-19  9:47                                                                       ` [PATCH 36/37] " Anton Altaparmakov
@ 2004-10-19  9:48                                                                         ` Anton Altaparmakov
  0 siblings, 0 replies; 38+ messages in thread
From: Anton Altaparmakov @ 2004-10-19  9:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, linux-ntfs-dev

This is patch 37/37 in the series.  It contains the following ChangeSet:

<aia21@cantab.net> (04/10/18 1.2061)
   NTFS: Update Documentation/filesystems/ntfs.txt with instructions on how to
         use the Device-Mapper driver with NTFS ftdisk/LDM raid.  This removes
         the linear raid problem with the Software RAID / MD driver when one
         or more of the devices has an odd number of sectors.
   
   Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

Best regards,

	Anton, who is starting to really need a patchbombing script...
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

===================================================================

diff -Nru a/Documentation/filesystems/ntfs.txt b/Documentation/filesystems/ntfs.txt
--- a/Documentation/filesystems/ntfs.txt	2004-10-19 10:15:21 +01:00
+++ b/Documentation/filesystems/ntfs.txt	2004-10-19 10:15:21 +01:00
@@ -10,8 +10,10 @@
 - Features
 - Supported mount options
 - Known bugs and (mis-)features
-- Using Software RAID with NTFS
-- Limitiations when using the MD driver
+- Using NTFS volume and stripe sets
+  - The Device-Mapper driver
+  - The Software RAID / MD driver
+  - Limitiations when using the MD driver
 - ChangeLog
 
 
@@ -199,11 +201,161 @@
 list at sourceforge: linux-ntfs-dev@lists.sourceforge.net
 
 
-Using Software RAID with NTFS
-=============================
+Using NTFS volume and stripe sets
+=================================
 
-For support of volume and stripe sets, use the kernel's Software RAID / MD
-driver and set up your /etc/raidtab appropriately (see man 5 raidtab).
+For support of volume and stripe sets, you can either use the kernel's
+Device-Mapper driver or the kernel's Software RAID / MD driver.  The former is
+the recommended one to use for linear raid.  But the latter is required for
+raid level 5.  For striping and mirroring, either driver should work fine.
+
+
+The Device-Mapper driver
+------------------------
+
+You will need to create a table of the components of the volume/stripe set and
+how they fit together and load this into the kernel using the dmsetup utility
+(see man 8 dmsetup).
+
+Linear volume sets, i.e. linear raid, has been tested and works fine.  Even
+though untested, there is no reason why stripe sets, i.e. raid level 0, and
+mirrors, i.e. raid level 1 should not work, too.  Stripes with parity, i.e.
+raid level 5, unfortunately cannot work yet because the current version of the
+Device-Mapper driver does not support raid level 5.  You may be able to use the
+Software RAID / MD driver for raid level 5, see the next section for details.
+
+To create the table describing your volume you will need to know each of its
+components and their sizes in sectors, i.e. multiples of 512-byte blocks.
+
+For NT4 fault tolerant volumes you can obtain the sizes using fdisk.  So for
+example if one of your partitions is /dev/hda2 you would do:
+
+$ fdisk -ul /dev/hda
+
+Disk /dev/hda: 81.9 GB, 81964302336 bytes
+255 heads, 63 sectors/track, 9964 cylinders, total 160086528 sectors
+Units = sectors of 1 * 512 = 512 bytes
+
+   Device Boot      Start         End      Blocks   Id  System
+   /dev/hda1   *          63     4209029     2104483+  83  Linux
+   /dev/hda2         4209030    37768814    16779892+  86  NTFS
+   /dev/hda3        37768815    46170809     4200997+  83  Linux
+
+And you would know that /dev/hda2 has a size of 37768814 - 4209030 + 1 =
+33559785 sectors.
+
+For Win2k and later dynamic disks, you can for example use the ldminfo utility
+which is part of the Linux LDM tools (the latest version at the time of
+writing is linux-ldm-0.0.8.tar.bz2).  You can download it from:
+	http://linux-ntfs.sourceforge.net/downloads.html
+Simply extract the downloaded archive (tar xvjf linux-ldm-0.0.8.tar.bz2), go
+into it (cd linux-ldm-0.0.8) and change to the test directory (cd test).  You
+will find the precompiled (i386) ldminfo utility there.  NOTE: You will not be
+able to compile this yourself easily so use the binary version!
+
+Then you would use ldminfo in dump mode to obtain the necessary information:
+
+$ ./ldminfo --dump /dev/hda
+
+This would dump the LDM database found on /dev/hda which describes all of your
+dinamic disks and all the volumes on them.  At the bottom you will see the
+VOLUME DEFINITIONS section which is all you really need.  You may need to look
+further above to determine which of the disks in the volume definitions is
+which device in Linux.  Hint: Run ldminfo on each of your dinamic disks and
+look at the Disk Id close to the top of the output for each (the PRIVATE HEADER
+section).  You can then find these Disk Ids in the VBLK DATABASE section in the
+<Disk> components where you will get the LDM Name for the disk that is found in
+the VOLUME DEFINITIONS section.
+
+Note you will also need to enable the LDM driver in the Linux kernel.  If your
+distribution did not enable it, you will need to recompile the kernel with it
+enabled.  This will create the LDM partitions on each device at boot time.  You
+would then use those devices (for /dev/hda they would be /dev/hda1, 2, 3, etc)
+in the Device-Mapper table.
+
+You can also bypass using the LDM driver by using the main device (e.g.
+/dev/hda) and then using the offsets of the LDM partitions into this device as
+the "Start sector of device" when creating the table.  Once again ldminfo would
+give you the correct information to do this.
+
+Assuming you know all your devices and their sizes things are easy.
+
+For a linear raid the table would look like this (note all values are in
+512-byte sectors):
+
+--- cut here ---
+# Offset into	Size of this	Raid type	Device		Start sector
+# volume	device						of device
+0		1028161		linear		/dev/hda1	0
+1028161		3903762		linear		/dev/hdb2	0
+4931923		2103211		linear		/dev/hdc1	0
+--- cut here ---
+
+For a striped volume, i.e. raid level 0, you will need to know the chunk size
+you used when creating the volume.  Windows uses 64kiB as the default, so it
+will probably be this unless you changes the defaults when creating the array.
+
+For a raid level 0 the table would look like this (note all values are in
+512-byte sectors):
+
+--- cut here ---
+# Offset   Size	    Raid     Number   Chunk  1st        Start	2nd	  Start
+# into     of the   type     of	      size   Device	in	Device	  in
+# volume   volume	     stripes			device		  device
+0	   2056320  striped  2	      128    /dev/hda1	0	/dev/hdb1 0
+--- cut here ---
+
+If there are more than two devices, just add each of them to the end of the
+line.
+
+Finally, for a mirrored volume, i.e. raid level 1, the table would look like
+this (note all values are in 512-byte sectors):
+
+--- cut here ---
+# Ofs Size   Raid   Log  Number Region Should Number Source  Start Taget  Start
+# in  of the type   type of log size   sync?  of     Device  in    Device in
+# vol volume		 params		     mirrors	     Device	  Device
+0    2056320 mirror core 2	16     nosync 2	   /dev/hda1 0   /dev/hdb1 0
+--- cut here ---
+
+If you are mirroring to multiple devices you can specify further targets at the
+end of the line.
+
+Note the "Should sync?" parameter "nosync" means that the two mirrors are
+already in sync which will be the case on a clean shutdown of Windows.  If the
+mirrors are not clean, you can specify the "sync" option instead of "nosync"
+and the Device-Mapper driver will then copy the entirey of the "Source Device"
+to the "Target Device" or if you specified multipled target devices to all of
+them.
+
+Once you have your table, save it in a file somewhere (e.g. /etc/ntfsvolume1),
+and hand it over to dmsetup to work with, like so:
+
+$ dmsetup create myvolume1 /etc/ntfsvolume1
+
+You can obviously replace "myvolume1" with whatever name you like.
+
+If it all worked, you will now have the device /dev/device-mapper/myvolume1
+which you can then just use as an argument to the mount command as usual to
+mount the ntfs volume.  For example:
+
+$ mount -t ntfs -o ro /dev/device-mapper/myvolume1 /mnt/myvol1
+
+(You need to create the directory /mnt/myvol1 first and of course you can use
+anything you like instead of /mnt/myvol1 as long as it is an existing
+directory.)
+
+It is advisable to do the mount read-only to see if the volume has been setup
+correctly to avoid the possibility of causing damage to the data on the ntfs
+volume.
+
+
+The Software RAID / MD driver
+-----------------------------
+
+An alternative to using the Device-Mapper driver is to use the kernel's
+Software RAID / MD driver.  For which you need to set up your /etc/raidtab
+appropriately (see man 5 raidtab).
 
 Linear volume sets, i.e. linear raid, as well as stripe sets, i.e. raid level
 0, have been tested and work fine (though see section "Limitiations when using
@@ -258,8 +410,8 @@
 ntfs volume.
 
 
-Limitiations when using the MD driver
-=====================================
+Limitiations when using the Software RAID / MD driver
+-----------------------------------------------------
 
 Using the md driver will not work properly if any of your NTFS partitions have
 an odd number of sectors.  This is especially important for linear raid as all
@@ -271,6 +423,9 @@
 So when using linear raid, make sure that all your partitions have an even
 number of sectors BEFORE attempting to use it.  You have been warned!
 
+Even better is to simply use the Device-Mapper for linear raid and then you do
+not have this problem with odd numbers of sectors.
+
 
 ChangeLog
 =========
@@ -281,6 +436,8 @@
 	- Fix several race conditions and various other bugs.
 	- Many internal cleanups, code reorganization, optimizations, and mft
 	  and index record writing code rewritten to fit in with the changes.
+	- Update Documentation/filesystems/ntfs.txt with instructions on how to
+	  use the Device-Mapper driver with NTFS ftdisk/LDM raid.
 2.1.20:
 	- Fix two stupid bugs introduced in 2.1.18 release.
 2.1.19:
diff -Nru a/fs/ntfs/ChangeLog b/fs/ntfs/ChangeLog
--- a/fs/ntfs/ChangeLog	2004-10-19 10:15:21 +01:00
+++ b/fs/ntfs/ChangeLog	2004-10-19 10:15:21 +01:00
@@ -138,6 +138,10 @@
 	  record sequence number if it is specified (i.e. not zero).
 	- Add fs/ntfs/mft.[hc]::ntfs_mft_record_alloc() and various helper
 	  functions used by it.
+	- Update Documentation/filesystems/ntfs.txt with instructions on how to
+	  use the Device-Mapper driver with NTFS ftdisk/LDM raid.  This removes
+	  the linear raid problem with the Software RAID / MD driver when one
+	  or more of the devices has an odd number of sectors.
 
 2.1.20 - Fix two stupid bugs introduced in 2.1.18 release.
 

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2004-10-19 11:25 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-10-19  9:37 [2.6-BK-URL] NTFS: 2.1.21 - Big update with race/bug fixes Anton Altaparmakov
2004-10-19  9:38 ` [PATCH 1/37] " Anton Altaparmakov
2004-10-19  9:39   ` [PATCH 2/37] " Anton Altaparmakov
2004-10-19  9:39     ` [PATCH 3/37] " Anton Altaparmakov
2004-10-19  9:39       ` [PATCH 4/37] " Anton Altaparmakov
2004-10-19  9:40         ` [PATCH 5/37] " Anton Altaparmakov
2004-10-19  9:40           ` [PATCH 6/37] " Anton Altaparmakov
2004-10-19  9:40             ` [PATCH 7/37] " Anton Altaparmakov
2004-10-19  9:40               ` [PATCH 8/37] " Anton Altaparmakov
2004-10-19  9:41                 ` [PATCH 9/37] " Anton Altaparmakov
2004-10-19  9:41                   ` [PATCH 10/37] " Anton Altaparmakov
2004-10-19  9:41                     ` [PATCH 11/37] " Anton Altaparmakov
2004-10-19  9:41                       ` [PATCH 12/37] " Anton Altaparmakov
2004-10-19  9:41                         ` [PATCH 13/37] " Anton Altaparmakov
2004-10-19  9:42                           ` [PATCH 14/37] " Anton Altaparmakov
2004-10-19  9:42                             ` [PATCH 15/37] " Anton Altaparmakov
2004-10-19  9:42                               ` [PATCH 16/37] " Anton Altaparmakov
2004-10-19  9:42                                 ` [PATCH 17/37] " Anton Altaparmakov
2004-10-19  9:43                                   ` [PATCH 18/37] " Anton Altaparmakov
2004-10-19  9:43                                     ` [PATCH 19/37] " Anton Altaparmakov
2004-10-19  9:43                                       ` [PATCH 20/37] " Anton Altaparmakov
2004-10-19  9:43                                         ` [PATCH 21/37] " Anton Altaparmakov
2004-10-19  9:44                                           ` [PATCH 22/37] " Anton Altaparmakov
2004-10-19  9:44                                             ` [PATCH 23/37] " Anton Altaparmakov
2004-10-19  9:44                                               ` [PATCH 24/37] " Anton Altaparmakov
2004-10-19  9:44                                                 ` [PATCH 25/37] " Anton Altaparmakov
2004-10-19  9:45                                                   ` [PATCH 26/37] " Anton Altaparmakov
2004-10-19  9:45                                                     ` [PATCH 27/37] " Anton Altaparmakov
2004-10-19  9:45                                                       ` [PATCH 28/37] " Anton Altaparmakov
2004-10-19  9:45                                                         ` [PATCH 29/37] " Anton Altaparmakov
2004-10-19  9:46                                                           ` [PATCH 30/37] " Anton Altaparmakov
2004-10-19  9:46                                                             ` [PATCH 31/37] " Anton Altaparmakov
2004-10-19  9:46                                                               ` [PATCH 32/37] " Anton Altaparmakov
2004-10-19  9:46                                                                 ` [PATCH 33/37] " Anton Altaparmakov
2004-10-19  9:47                                                                   ` [PATCH 34/37] " Anton Altaparmakov
2004-10-19  9:47                                                                     ` [PATCH 35/37] " Anton Altaparmakov
2004-10-19  9:47                                                                       ` [PATCH 36/37] " Anton Altaparmakov
2004-10-19  9:48                                                                         ` [PATCH 37/37] " Anton Altaparmakov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).