	Freedups searches through the directories you specify.  When it
finds two or more identical files, it hard links them together.  The
files still exist in their respective directories, but only one copy of
the data is stored on disk; all of the directory entries point to the
same data blocks.
	This allows you to reclaim space on your drive.  It's that 
simple.  Run it every night from a cron job.
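	If hard links are new to you, here is a small sketch of the idea
using nothing but standard tools in a scratch directory.  The file names
are made up for the demo, and this only imitates by hand what freedups
does for a pair of identical files.

```shell
# Scratch demo of what a hard link is; the file names are made up.
demo=$(mktemp -d)
cd "$demo" || exit 1
echo "same contents" > copy1
cp -p copy1 copy2

# "ls -i" prints the inode number first; the two copies start out separate.
inode_before=$(ls -i copy2 | awk '{print $1}')

# Replace copy2 with a second directory entry for copy1's data.
ln -f copy1 copy2

inode1=$(ls -i copy1 | awk '{print $1}')
inode2=$(ls -i copy2 | awk '{print $1}')
# Field 2 of "ls -l" is the link count for the inode.
links=$(ls -l copy1 | awk '{print $2}')
echo "copy1 inode $inode1, copy2 inode $inode2, link count $links"
```

	After the ln, both names report the same inode number and a link
count of 2, which is what "only one copy of the data" means in practice.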


Why you'd want to use it:
	- You have multiple copies of a source code tree on your system.
Freedups will link any identical files together and ignore any files
that changed between versions.
	- You have multiple copies of the file COPYING in /usr/doc or
/usr/share/doc.
	- Depending on your system, the following might be good places
to try linking (size in parentheses is amount saved on a very basic
RedHat 7.3 install; you'll probably get even more savings):
freedups /lib/kbd					(463K)
freedups /usr/doc /usr/share/doc
freedups /usr/src/linux*
freedups /usr/src/pcmcia-cs*
freedups /usr/share					(8.6M)
freedups /usr/lib					(97K)
freedups /usr/man /usr/share/man
freedups /usr/share/locale /etc/locale			(652K)
freedups /usr/share/scrollkeeper /var/lib/scrollkeeper	(719K)
	- Directories holding files that are only read are good
candidates.
	You might also find some space savings by deleting the
/usr/share/locale/country_code/LC_MESSAGES/*.mo files for any country
codes you don't need.


Things to watch out for:
	- You'll need to use the _full path_, starting with /, to any
files or directories you want freedups to search.  If you don't, you'll
likely get an error like "cannot stat file".
	- Remember that you now have multiple directory entries pointing
at one block of data.  Depending on what editor you use, when you change
one of the files you may be changing the others as well.  See below for 
a list of applications and whether they automatically handle hardlinks
or not.
	- For the above reason, you probably don't want to create links
to any backup copies on the drive.
	- If the files are on different partitions, it's not possible to
create a hardlink between them.  Freedups handles this gracefully.
	- Directories holding files that might be written to are
generally not good candidates.  Similarly, avoid directories holding
security-related files.  /etc is a bad choice on both counts.
	- If you run freedups without the --datesequal=yes option,
freedups may link files with different modification times together.  If
you later use "rpm -Va" (or the equivalent Debian system verify
command), it may report that the timestamps on some files have changed.
If this is _all_ that has changed, this is a cosmetic problem only.  For
example, the following is cosmetic and not indicative of a modified
file:

.......T   /usr/share/automake/COPYING
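
	The timestamp report above follows from the fact that the
modification time is stored in the inode, so linked files can only have
one.  Here is a rough sketch of that effect in a scratch directory; the
file names are made up, and it links by hand with ln rather than with
freedups.

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1
echo "same contents" > old
cp old new
# Give "old" a modification time well in the past (1 Jan 2000).
touch -t 200001010000 old

if [ new -nt old ]; then before="different"; else before="same"; fi

# Link them: "new" becomes another name for old's inode.
ln -f old new

if [ new -nt old ] || [ old -nt new ]; then after="different"; else after="same"; fi
echo "mtimes before linking: $before, after linking: $after"
```

	Once the two names share an inode, one of the original
timestamps is gone, and that is the change a package verifier notices.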


Can I run this and just see what would have been linked together without
modifying anything?
	Sure.  In fact, unless you put -a on the command line, that's
_all_ freedups will do.  By default, it won't actually do anything,
it'll just tell you what the approximate space savings would be.


Does this really save any space?
	It really depends on whether you have duplicate files on your
filesystem or not.  I've personally recovered ~3G on my main drive from
hardlinking identical files in the various kernel trees I have there.
One user reports saving ~2G simply from hardlinking identical files
downloaded by a p2p file sharing program.


Does this slow down the system like the drive compression programs?
	No.  No files are compressed with this tool.  It only instructs
the filesystem to keep one copy of two or more identical files and have
all their directory entries point at the sole copy of the actual file
data.  In fact, for certain operations (such as using diff between two
freedup'd directory trees), the system runs much, much faster.
	File reads should _not_ become slower.
	Running freedups can take quite a while, but it can certainly
be run off-hours or when the system is generally idle.  It can be run
under nice to give other programs priority.
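	As a sketch of what a nightly, low-priority run might look like,
the snippet below just builds and prints a crontab entry.  Everything in
it is an assumption to adjust for your system: the install path, the
2:30am schedule, the nice level and the directories.

```shell
# Every name below is an assumption for illustration: the install path,
# the 2:30am schedule, and the directories to scan.
freedups_path=/usr/local/bin/freedups
cron_line="30 2 * * * nice -n 19 $freedups_path -a --datesequal=yes /usr/share/doc /usr/share/man"

# Print the entry; paste it into "crontab -e" once you're happy with it.
echo "$cron_line"
```

	Leave off the -a while you're experimenting, and freedups will
only report what it would have linked.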


Do I have to run this as root?
	Not at all.  As long as you own the files, freedups runs just
fine as a normal user.


What has to be true for two files to get linked together?
	- They have to be files (i.e. not character or block devices, no
pipes, no directories, no symlinks).
	- They have to have at least one byte.  I don't want to link
all 0 byte files on the system together.
	- They have to have the same size.
	- They have to have the same user owner, group owner and mode.
Skirting this requirement would raise _serious_ security considerations.
If you want to link two files that currently differ in owner or mode, 
use chown or chmod to make their owners or modes identical and re-run
freedups.
	- They have to be readable by the current user.
	- The contents of the files have to be identical.
	- Optionally (--minsize=1000), the files have to be larger than
the given number of bytes.
	- Optionally (--datesequal=yes), the files have to have identical
modification timestamps.
	- Optionally (--filenamesequal=yes), the filenames have to be 
identical (in different directories, obviously).
	- They have to be on the same partition.
	- That partition must support hardlinks.  Ext2, ext3 and
reiserfs do.  I'm pretty sure fat/vfat/msdos do not.  If you know whether
another Linux filesystem supports hardlinks or not, please let me know.
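
	The required (non-optional) checks in the list above can be
sketched as a small shell function.  This is a hypothetical helper for
checking a pair of files by hand, not code taken from freedups, and it
skips the optional --minsize, --datesequal and --filenamesequal tests.

```shell
# Hypothetical helper, not part of freedups: says which basic criterion,
# if any, keeps two files from being candidates for linking.
would_link() {
    for f in "$1" "$2"; do
        # -f follows symlinks, so rule those out separately with -L.
        if [ ! -f "$f" ] || [ -L "$f" ]; then echo "not a regular file: $f"; return 1; fi
        if [ ! -s "$f" ]; then echo "zero bytes: $f"; return 1; fi
        if [ ! -r "$f" ]; then echo "not readable: $f"; return 1; fi
    done
    # "ls -lnd" fields: mode, link count, numeric uid, numeric gid, size.
    meta1=$(ls -lnd "$1" | awk '{print $1, $3, $4, $5}')
    meta2=$(ls -lnd "$2" | awk '{print $1, $3, $4, $5}')
    if [ "$meta1" != "$meta2" ]; then echo "mode, owner, group or size differ"; return 1; fi
    # Line 2 of "df -P" ends with the mount point holding the file.
    fs1=$(df -P "$1" | awk 'NR==2 {print $6}')
    fs2=$(df -P "$2" | awk 'NR==2 {print $6}')
    if [ "$fs1" != "$fs2" ]; then echo "on different filesystems"; return 1; fi
    # Same size is not enough; the bytes have to match too.
    if ! cmp -s "$1" "$2"; then echo "contents differ"; return 1; fi
    echo "candidates for linking"
}
```

	Call it as "would_link firstfile secondfile"; it prints the
first criterion that fails and returns non-zero, or reports that the
pair looks linkable.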


I think I have a bunch of files that should be linked together, but
freedups doesn't link them.  Why not?
	Walk through the above list of criteria for a given pair of
files in question.  Which one fails?
	To examine a pair of files, look at the output from:

ls -ali firstfile secondfile<Enter>

	which looks like:

2097229 -rw-rw-r--    1 wstearns wstearns        4 Mar 11 16:09 firstfile
2097673 -rw-------    1 nobody   nobody          5 Mar 11 16:10 secondfile

	The columns are: inode number, file mode, number of links to
this inode, user owner, group owner, file size, modification date,
modification time, and filename.  The above two files wouldn't be linked
because their modes are different, they're owned by different users,
they're owned by different groups, and have different sizes (so must
have different contents).  Depending on options, they may also be
disqualified because their modification times and filenames are
different.
	That said, if you do come up with files that legitimately should
be linked but aren't, please email me so I can fix freedups.


Can this be safely run more than once?
	Definitely.  Freedups is smart enough to recognize that two
files are already linked together and just moves on to the next pair.
	For this reason, running it twice on the exact same set of
files won't save any more space.
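	To see what "already linked" looks like from the shell, here is
a sketch of one way to make that check.  It is not necessarily the test
freedups itself uses; the helper name and file names are made up.

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1
echo "same contents" > a
cp -p a b        # identical contents, but a separate inode
ln a c           # a second name for a's inode

# Hypothetical helper: -ef is true when both names are on the same device
# and have the same inode number.
already_linked() {
    [ "$1" -ef "$2" ]
}

for other in b c; do
    if already_linked a "$other"; then
        echo "a and $other: already linked, nothing to do"
    else
        echo "a and $other: separate copies"
    fi
done
```

	Here a and b are separate copies that could still be linked,
while a and c already share an inode, so linking them again would gain
nothing.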

Are there different ways to do this?
	Sure.
	- Rewrite this in a more efficient language.
	- When copying a directory tree, hard link the files during the
copy:

cp -av --link linux-2.1.anything.orig linux-2.1.anything

	Many thanks to the Kernel FAQ and Janos Farkas for that trick.

	- Delete truly unneeded files.
	- Use CVS or BitKeeper; the latter, at least, can save
substantial amounts of space.
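
	The "cp --link" approach from the list above can be tried out on
a scratch tree first.  This sketch assumes GNU cp, and the directory and
file names are made up for the demo.

```shell
# Assumes GNU cp (for --link); directory names are made up for the demo.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir -p tree.orig/sub
echo "unchanged file" > tree.orig/sub/file.c

# Copy the tree, hard linking every file instead of duplicating its data.
cp -a --link tree.orig tree

if [ tree.orig/sub/file.c -ef tree/sub/file.c ]; then
    linked=yes
else
    linked=no
fi
echo "linked: $linked"
```

	Keep in mind that a tool that writes through the link will then
change both trees at once; see the application list below for which
programs unlink before saving.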


How can I test that the program is working?
	Try the following:
[wstearns@sparrow wstearns]$ cd /tmp
[wstearns@sparrow /tmp]$ mkdir duptest
[wstearns@sparrow /tmp]$ cd duptest
[wstearns@sparrow duptest]$ echo Hi there. >test1
[wstearns@sparrow duptest]$ cp -p test1 test2
[wstearns@sparrow duptest]$ ls -ali test1 test2
1885113 -rw-rw-r--    1 wstearns wstearns       10 Feb 28 00:55 test1
1885114 -rw-rw-r--    1 wstearns wstearns       10 Feb 28 00:55 test2

	Note the different inode numbers - the total space used by these
two files is 20 bytes (actually 2 filesystem blocks, but that's a detail).

[wstearns@sparrow duptest]$ freedups ./test1 ./test2
Options chosen: None 
About to check for links in " ./test1 ./test2"
10: Would have linked ./test2 and ./test1
Total space would have saved: 10 (An overestimate if more than two files would have been linked together.)

	By default, it just reports what the savings would have been.

[wstearns@sparrow duptest]$ freedups -a ./test1 ./test2
Options chosen: ActuallyLink 
About to check for links in " ./test1 ./test2"
10 Linked ./test2 and ./test1
Total space saved: 10 (Small risk of overcounting space saved if linked files have different times.)
[wstearns@sparrow duptest]$ ls -ali test1 test2
1885114 -rw-rw-r--    2 wstearns wstearns       10 Feb 28 00:55 test1
1885114 -rw-rw-r--    2 wstearns wstearns       10 Feb 28 00:55 test2

	Now both files share a single inode, so all but one copy is freed
and the free space rises accordingly.
	For more examples, run freedups with the "-h" help option.
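	If you'd like to watch the space come back, du is handy because
it counts a hardlinked file only once per run.  Here is a sketch with
two made-up 1 MB files, linked by hand with ln instead of freedups; the
exact numbers depend on your filesystem.

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1
# Two identical 1 MB files of random data.
dd if=/dev/urandom of=big1 bs=1024 count=1024 2>/dev/null
cp -p big1 big2
sync    # let the filesystem settle its block counts before measuring

before=$(du -sk . | awk '{print $1}')
ln -f big1 big2
after=$(du -sk . | awk '{print $1}')
echo "before: ${before}K  after: ${after}K"
```

	The "after" figure should come out at roughly half the "before"
figure.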


Application list
	This list shows, for each application, whether it unlinks a file
before saving to it or writes through the existing link.  I made an
attempt on each to find an option that allows one to change this
behavior, but may not have found one.
	Contributions and corrections are gratefully accepted.  Here's
how to test:

[wstearns@sparrow wstearns]$ cd /tmp
[wstearns@sparrow /tmp]$ mkdir linktest
[wstearns@sparrow /tmp]$ cd linktest
[wstearns@sparrow linktest]$ echo Hi there >test1
[wstearns@sparrow linktest]$ ln -f test1 test2
[wstearns@sparrow linktest]$ ls -ali test*
1885112 -rw-rw-r--    2 wstearns wstearns        9 Mar  5 12:52 test1
1885112 -rw-rw-r--    2 wstearns wstearns        9 Mar  5 12:52 test2
[wstearns@sparrow linktest]$ myprogram test1

#Replace myprogram with the program under test.
#In this program, add some characters to the file and save your changes.

[wstearns@sparrow linktest]$ ls -ali test*
1885112 -rw-rw-r--    2 wstearns wstearns       19 Mar  5 12:54 test1
1885112 -rw-rw-r--    2 wstearns wstearns       19 Mar  5 12:54 test2

	The fact that the two files still share an inode and both
changed in content means that the link between test1 and test2 was
preserved.  If, instead, you get:

[wstearns@sparrow linktest]$ ls -ali test*
2236994 -rw-rw-r--    1 wstearns wstearns       19 Mar  5 12:54 test1
1885112 -rw-rw-r--    1 wstearns wstearns        9 Mar  5 12:52 test2

	then the program unlinked test1 before saving the changes.
	Note that neither behavior is "correct"; it's just that you
may prefer one over the other while working on a given file.
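	The two outcomes correspond to two common ways of saving a file.
The sketch below imitates both with plain shell commands; it makes no
claim about how any particular editor in the list is implemented, and
the file names are made up.

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1

# Strategy 1: write into the existing file, as the shell's ">>" does.
echo "Hi there" > test1
ln -f test1 test2
echo "more text" >> test1
if [ test1 -ef test2 ]; then inplace="preserves link"; else inplace="unlinks"; fi

# Strategy 2: write a new file, then rename it over the old name.
echo "Hi there" > test3
ln -f test3 test4
{ cat test3; echo "more text"; } > test3.new
mv -f test3.new test3
if [ test3 -ef test4 ]; then renamed="preserves link"; else renamed="unlinks"; fi

echo "write in place:   $inplace"
echo "write and rename: $renamed"
```

	With the first strategy test2 picks up the new text too; with
the second, test4 keeps the old contents and test3 gets a fresh inode.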

Editor			Action on save	Notes
abiword-0.7.11		preserves link
bash-1.14.7's ">"	preserves link
bash-1.14.7's ">>"	preserves link
emacs-20.7		preserves link
gedit-0.9.2		preserves link
gnotepad+-1.3.1		preserves link	#When "write backup file" turned off
gnotepad+-1.3.1		unlinks		#When "write backup file" turned on
gnumeric-0.58		preserves link
gxedit-1.23		preserves link
jove-4.16.0.24		preserves link
kedit-1.1.2		preserves link	#When "Backup Copies" turned off
kedit-1.1.2		unlinks		#When "Backup Copies" turned on
lyx-0.12.0		preserves link
mcedit-4.5.51		preserves link	#~/.mc/ini: editor_option_save_mode=0 (Save mode=quick save)
mcedit-4.5.51		unlinks		#~/.mc/ini: editor_option_save_mode=1 (Save mode=safe save)
netscape-4.76		unlinks		#Editor in netscape-communicator
nedit-5.1.1		preserves link
patch-2.5.4		unlinks
rpm-4.0			unlinks		#on "-U" upgrade, at least.
rsync-2.3.2		unlinks		#on server, hardlink is unlinked when a new version sent
vim-5.1			preserves link
wordperfect-7.0		preserves link	#"Original document backup" has no effect; always preserves link.
xedit-3.3.2		preserves link


Contacts and credits.
	Please send comments, suggestions, bug reports, patches, and/or
additions to the filesystem or applications list to William Stearns
<wstearns@pobox.com> .
	Many thanks to Kevin Burton for his constructive suggestions, 
most of which made it into v0.3.0.  Sorry, Kevin, it's still written in
bash.  :-)


