Suppose we have a directory full of files which is maintained with a set of automated tools; perhaps one set of tools updates them and another set of tools uses the result. In this situation, it might be useful for the second set of tools to know if the files have recently been changed. It might be useful, for example, to have a ‘timestamp’ file which gives the timestamp on the newest file in the collection.
We can use find to achieve this, but there are several different ways to do it.
The obvious but wrong answer is just to use ‘-newer’:
find subdir -newer timestamp -exec touch -r {} timestamp \;
This does the right sort of thing but has a bug. Suppose that two files in the subdirectory have been updated, and that these are called file1 and file2. The command above will update timestamp with the modification time of file1 or that of file2, but we don’t know which one. Since the timestamps on file1 and file2 will in general be different, this could well be the wrong value.
One solution to this problem is to modify find to recheck the modification time of timestamp every time a file is to be compared against it, but that will reduce the performance of find.
The test command can be used to compare timestamps:
find subdir -exec test {} -nt timestamp \; -exec touch -r {} timestamp \;
This will ensure that any changes made to the modification time of timestamp that take place during the execution of find are taken into account. This resolves our earlier problem, but unfortunately it runs much more slowly, because test is started once for every file that find examines.
We can of course still use ‘-newer’ to cut down on the number of calls to test:
find subdir -newer timestamp -and \
     -exec test {} -nt timestamp \; -and \
     -exec touch -r {} timestamp \;
Here, the ‘-newer’ test excludes all the files which are definitely older than the timestamp, but all the files which are newer than the old value of the timestamp are compared against the current updated timestamp.
This is indeed faster in general, but the speed difference will depend on how many updated files there are.
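The combined command can be tried out on a scratch directory. The following is a minimal sketch, assuming GNU find, touch and stat; the file names and dates are hypothetical, and the ‘-type f’ test is an addition which keeps the directory subdir itself (whose own modification time changes whenever a file is created in it) out of the comparison.

```shell
#!/bin/sh
# Sketch: build a scratch tree with known modification times, then run the
# combined -newer / test -nt command against it.  All names are hypothetical.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
mkdir subdir
touch -d '2001-01-01 00:00:00 UTC' timestamp
touch -d '2000-06-01 00:00:00 UTC' subdir/old      # older than timestamp
touch -d '2002-02-02 00:00:00 UTC' subdir/file1    # newer than timestamp
touch -d '2003-03-03 00:00:00 UTC' subdir/file2    # newest of all

# -type f is an assumption of this sketch, not part of the original command.
find subdir -type f -newer timestamp -and \
     -exec test {} -nt timestamp \; -and \
     -exec touch -r {} timestamp \;

# Record the results before cleaning up.
stamp_mtime=$(stat -c %Y timestamp)
newest_mtime=$(stat -c %Y subdir/file2)
echo "timestamp=$stamp_mtime newest=$newest_mtime"
cd / && rm -rf "$workdir"
```

Whichever of file1 and file2 is visited first, the two numbers printed should be equal, since a file is only allowed to update timestamp if it is newer than the value timestamp holds at that moment.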
Using -printf and sort to compare timestamps

It is possible to use the ‘-printf’ action to abandon the use of test entirely:
newest=$(find subdir -newer timestamp -printf "%A@:%p\n" |
           sort -n |
           tail -n1 |
           cut -d: -f2- )
touch -r "${newest:-timestamp}" timestamp
The command above works by generating a list of the timestamps and names of all the files which are newer than the timestamp. The sort, tail and cut commands simply pull out the name of the file with the largest timestamp value (that is, the latest file). The touch command is then used to update the timestamp.
The "${newest:-timestamp}" expression simply expands to the value of $newest if that variable is set and not empty, but to timestamp otherwise. This ensures that an argument is always given to the ‘-r’ option of the touch command, even when find prints nothing.
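The behaviour of the expansion can be checked in isolation. This is a small sketch with a hypothetical file name; it relies only on standard shell parameter expansion. Note that when the pipeline prints nothing, newest is set but empty, which is why the form with the colon is the one that is needed.

```shell
#!/bin/sh
# Sketch of the ${parameter:-word} expansion; the file name is hypothetical.
newest=''
fallback="${newest:-timestamp}"    # empty (or unset): expands to "timestamp"
newest='subdir/file2'
chosen="${newest:-timestamp}"      # non-empty: expands to the value itself
echo "$fallback $chosen"
```

This prints "timestamp subdir/file2".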
This approach seems quite efficient, but unfortunately it has a problem. Many operating systems now keep file modification time information at a granularity which is finer than one second. Findutils version 4.3.3 and later will print a fractional part with %A@, but older versions will not.
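The pipeline can be exercised end to end on a scratch directory. The sketch below assumes GNU find, touch and stat, and uses hypothetical file names and dates. It also makes two changes which are assumptions of the sketch rather than part of the command shown earlier: it prints %T@ (the modification time) instead of %A@ (the access time), on the assumption that the modification time is what should be compared, and it adds ‘-type f’ so that the directory itself is not listed.

```shell
#!/bin/sh
# Sketch only: scratch tree with hypothetical file names and dates.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
mkdir subdir
touch -d '2001-01-01 00:00:00 UTC' timestamp
touch -d '2002-02-02 00:00:00 UTC' subdir/file1
touch -d '2003-03-03 00:00:00 UTC' subdir/file2

# %T@ and -type f are assumptions of this sketch (see the text above).
newest=$(find subdir -type f -newer timestamp -printf "%T@:%p\n" |
           sort -n |
           tail -n1 |
           cut -d: -f2- )
touch -r "${newest:-timestamp}" timestamp

echo "newest file: $newest"
# Record the results before cleaning up.
stamp_mtime=$(stat -c %Y timestamp)
newest_mtime=$(stat -c %Y subdir/file2)
cd / && rm -rf "$workdir"
```

Because sort -n compares the leading number including any fractional part, the same pipeline should also order sub-second timestamps correctly when find prints them.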
Solving the problem with make

Another tool which often works with timestamps is make. We can use find to generate a Makefile on the fly and then use make to update the timestamps:
makefile=$(mktemp)
find subdir \
     \( \! -xtype l \) \
     -newer timestamp \
     -printf "timestamp:: %p\n\ttouch -r %p timestamp\n\n" > "$makefile"
make -f "$makefile"
rm   -f "$makefile"
Unfortunately, although the solution above is quite elegant, it fails to cope with white space within file names, and adjusting it to do so would require a rather complex shell script.
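The white space problem is easy to see by printing the generated rules instead of passing them to make. This is a sketch under the same assumptions as before (GNU find and touch); the file name is hypothetical, and ‘-type f’ stands in for the symbolic link test used above.

```shell
#!/bin/sh
# Sketch: show the rule that would be generated for a name containing a space.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
mkdir subdir
touch -d '2001-01-01 00:00:00 UTC' timestamp
touch -d '2002-02-02 00:00:00 UTC' 'subdir/my file'   # hypothetical name

# Same -printf format as above, captured rather than written to a Makefile.
rules=$(find subdir -type f -newer timestamp \
     -printf "timestamp:: %p\n\ttouch -r %p timestamp\n\n")
printf '%s\n' "$rules"
first_line=$(printf '%s\n' "$rules" | head -n1)
cd / && rm -rf "$workdir"
```

The first line printed is "timestamp:: subdir/my file", which make would read as two separate prerequisites, and the recipe line passes the name to touch unquoted, so the shell would split it as well.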
We can fix both of these problems (looping and problems with white space), and do things more efficiently too. The following command works even with file names that contain newlines, and doesn’t need to sort the list of file names.
find subdir -newer timestamp -printf "%A@:%p\0" |
   perl -0 newest.pl |
   xargs --no-run-if-empty --null --replace \
      find {} -maxdepth 0 -newer timestamp -exec touch -r {} timestamp \;
The first find command generates a list of files which are newer than the original timestamp file, and prints a list of them with their timestamps. The newest.pl script simply filters out all the filenames which have timestamps which are older than whatever the newest file is:
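#! /usr/bin/perl -0
# Read NUL-terminated "timestamp:name" records and print the names
# of the files which carry the newest timestamp.
use strict;
use warnings;
my @newest = ();
my $latest_stamp = undef;
while (<>) {
    chomp;    # remove the trailing NUL (the record separator under -0)
    # Split on the first colon only, so colons in file names survive.
    my ($stamp, $name) = split(/:/, $_, 2);
    if (!defined($latest_stamp) || ($stamp > $latest_stamp)) {
        $latest_stamp = $stamp;
        @newest = ();    # a newer file was found; discard the older names
    }
    if ($stamp >= $latest_stamp) {
        push @newest, $name;
    }
}
print join("\0", @newest);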
This prints a list of zero or more files, all of which are newer than the original timestamp file, and which have the same timestamp as each other, to the nearest second. The second find command takes each resulting file one at a time, and if that is newer than the timestamp file, the timestamp is updated.
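For comparison, a similar result can be sketched without Perl by keeping the sort, tail and cut pipeline and making every stage NUL-delimited. This is an alternative technique, not the method described above: it does sort the list, and it assumes GNU find, xargs and a coreutils recent enough for sort, tail and cut to accept ‘-z’. As in the earlier sketches, %T@ and ‘-type f’ are assumptions, and the file names are hypothetical.

```shell
#!/bin/sh
# Sketch: NUL-delimited variant of the sort/tail/cut pipeline.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
mkdir subdir
touch -d '2001-01-01 00:00:00 UTC' timestamp
touch -d '2002-02-02 00:00:00 UTC' subdir/file1
# Hypothetical awkward name: contains a space, a colon and a newline.
odd=$(printf 'subdir/odd: name\nwith newline')
touch -d '2003-03-03 00:00:00 UTC' "$odd"

# Every stage reads and writes NUL-terminated records, so the newline
# inside the file name is treated as ordinary data.
find subdir -type f -newer timestamp -printf "%T@:%p\0" |
   sort -z -n |
   tail -z -n1 |
   cut -z -d: -f2- |
   xargs --no-run-if-empty --null --replace \
      touch -r {} timestamp

# Record the results before cleaning up.
stamp_mtime=$(stat -c %Y timestamp)
newest_mtime=$(stat -c %Y "$odd")
echo "timestamp=$stamp_mtime newest=$newest_mtime"
cd / && rm -rf "$workdir"
```

The cut stage keeps everything after the first colon, so a colon inside a file name does not truncate it.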