Developer Notes: Converting Filenames to Metadata

This past week, we had a need to convert a directory of filesnames into some bare-bones metadata. The filenames themselves contained the metadata we wanted, and generally followed a pattern that looked like this: BoxNameYear-001 — or, the name of a box, a four-digit year, and a unique identifier appended by a dash.

It turned out to be simple enough to solve this with bash. The script I came up with was:

#!/usr/bin/bash 

re='[[:alpha:]]+|[0-9]+'

echo "box,year,no,filetype"
for f in *.tiff; do 
    [[ ! -e $f ]] && continue
    grep -Eo $re <<< "$f" | sed 's/ \n/,/g;/^$/d' | paste -d, - - - -
done

The script takes a directory of files that follows this naming convention and generates a CSV file. We set up the structure with the echo, then loop through the files and look for any file with the appropriate extension (in this case, a .tiff). Then, simply enough, we use grep to match our regular expression against the file name, sed to do some in-line editing and add a comma as a separator to each line, and finally use paste -d to convert the newline output into a single line.

Let’s imagine a set of filenames that look like this:

JoeSmith1950-001.tiff
JoeSmith1950-028.tiff
JoeSmith1950-033.tiff
JoeSmith1950-046.tiff

We can run our script from the command line (sh filemetadata.sh > metadata.csv) and we’ll be left with a CSV file that looks like this:

box,year,no,filetype
JoeSmith,1950,001,tiff
JoeSmith,1950,028,tiff
JoeSmith,1950,033,tiff
JoeSmith,1950,046,tiff

It had been a while since I’d done bash scripting, and I have to say I’m pretty pleased with how simple the solution turned out to be. If you have a need for something similar, some minor tweaks to the script above should get you started as well.