UNIX paste, sed and nl commands

published on 2008-02-27 in computing

paste, sed and nl are 3 highly useful commands you will find as part of your standard UNIX toolbox. Here's an example situation for relevance:

I quite often find myself needing to merge 2 files together for some reason or another. My latest awesomeness consists of ripping/encoding favorite seasons from DVDs I own so my MediaCenter can have an easily accessed library (that I can also stream to my iPod Touch). When the encoding is done, I get files named after the DVD media and the track number. Like this:

Blah Season 1 Disc 1-1.mp4  
Blah Season 1 Disc 1-2.mp4  
Blah Season 1 Disc 1-3.mp4  
Blah Season 1 Disc 1-4.mp4  
Blah Season 1 Disc 2-1.mp4  
Blah Season 1 Disc 2-2.mp4  
Blah Season 1 Disc 2-3.mp4  
Blah Season 1 Disc 2-4.mp4

I can go to a place like Wikipedia or Amazon and find a list of the track names, which looks like this:

Pilot  
The Fat Man  
Little John  
Howard  
The Reconing  
Half Way  
Blah  
Blah Pt 2

What I want to end up with is files named something like this:

Blah - S01E01 - Pilot.mp4  
Blah - S01E02 - The Fat Man.mp4

It'd be so much easier to rename these on the command line if I could at least partly automate it. Re-typing is a PITA. So, here is how I do it to save a lot of time...

If I assume that they were ripped in order, I can list the episode files by time stamp, oldest first (ls -tr), and add a " to the front and end of each line (sed commands):

# ls -tr *.mp4 \  
| sed 's/^/"/g' \  
| sed 's/$/"/g' > tracklist.txt
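If you'd rather do the quoting in one pass, sed's & (the whole matched text) should do it. This is just a sketch of an alternative, not what I actually ran, and the sample file names stand in for the real ls output:

```shell
# Sample names standing in for the output of `ls -tr *.mp4` (assumed data).
listing=$(printf '%s\n' 'Blah Season 1 Disc 1-1.mp4' 'Blah Season 1 Disc 1-2.mp4')

# "&" in the replacement stands for the whole matched line, so a single
# substitution wraps each line in double quotes.
quoted=$(printf '%s\n' "$listing" | sed 's/.*/"&"/')

printf '%s\n' "$quoted"
# "Blah Season 1 Disc 1-1.mp4"
# "Blah Season 1 Disc 1-2.mp4"
```

Either way, this only holds up if the file names themselves don't contain a " character.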

Next I go to Amazon, Wikipedia, whatever, and find a list of episodes that is laid out as a table (which cut-n-pastes as one line per row, with tabs as the delimiter). I paste it into a vi edit session:

# vi episodes.txt

In general, edit this file down until the only thing left is the track names, one per line. Let's pretend the first column contains the name of the track, the 2nd column the writer, etc. We only care about the first column, so you can execute this command in vi to delete everything from the first tab onward:

:%s/\t.*//g

You can do other various cleanup like removing the " character:

:%s/"//g

Clean up other stuff like invalid shell characters, extra spaces, etc. This is the least automated part, but a hell of a lot faster/easier than re-typing. Especially if you are a vi whiz. If you use some other text editor, I'm sure this can be accomplished in a similar fashion.
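If you'd rather not open an editor at all, the same two cleanups can probably be done with cut and tr. A rough sketch, assuming the pasted table really is tab-delimited with the title in the first column (the sample rows are made up):

```shell
# Two tab-separated rows standing in for a pasted episode table (made-up data).
raw=$(printf 'Pilot\tSome Writer\n"The Fat Man"\tAnother Writer\n')

# cut -f1 keeps only the first tab-delimited column;
# tr -d '"' then deletes every double-quote character.
cleaned=$(printf '%s\n' "$raw" | cut -f1 | tr -d '"')

printf '%s\n' "$cleaned"
# Pilot
# The Fat Man
```

With a real file that would be something like cut -f1 pasted.txt | tr -d '"' > episodes.txt, where pasted.txt is a hypothetical name for the raw paste.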

Now we have 1 file that is the list of mp4 files, in order, and another that is the episode names, in order. What we now need is to make a file with what we actually want the files to be named. I accomplish this with the following command-line awesomeness:

nl -n rz -w 2 -s " - " episodes.txt \  
| sed "s/^/\"Blah - S01E/g" \  
| sed "s/$/.mp4\"/g" \  
> newnames.txt

To break that down, this is what is happening:

The nl command numbers the lines...the -n rz means "right justified, padded with zeros", the -w 2 means "make the number 2 characters wide" and the -s means "separate the number from the text with what's in the quotes".
The 2 sed commands add "Blah - S01E to the beginning of each line and .mp4" to the end.
The > sends the output to a file.
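To see what nl contributes on its own, here is a small sketch that runs just that stage on two of the made-up titles:

```shell
# Feed two sample titles through the same nl options used above.
numbered=$(printf '%s\n' 'Pilot' 'The Fat Man' | nl -n rz -w 2 -s ' - ')

printf '%s\n' "$numbered"
# 01 - Pilot
# 02 - The Fat Man
```

One thing to watch: by default nl only numbers non-empty lines, so a stray blank line in episodes.txt will not get an episode number and will throw the later merge out of step.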

Output looks like this:

"Blah - S01E01 - Pilot.mp4"  
"Blah - S01E02 - The Fat Man.mp4"  
"Blah - S01E03 - Little John.mp4"  
"Blah - S01E04 - Howard.mp4"  
"Blah - S01E05 - The Reconing.mp4"  
"Blah - S01E06 - Half Way.mp4"  
"Blah - S01E07 - Blah.mp4"  
"Blah - S01E08 - Blah Pt 2.mp4"

Now we merge the 2 files and prepend the 'mv' command to get a script we can run:

paste tracklist.txt newnames.txt \  
| sed 's/^/mv /g' \  
> script.sh

Which looks like this:

mv "Blah Season 1 Disc 1-1.mp4" "Blah - S01E01 - Pilot.mp4"  
mv "Blah Season 1 Disc 1-2.mp4" "Blah - S01E02 - The Fat Man.mp4"  
mv "Blah Season 1 Disc 1-3.mp4" "Blah - S01E03 - Little John.mp4"  
mv "Blah Season 1 Disc 1-4.mp4" "Blah - S01E04 - Howard.mp4"  
mv "Blah Season 1 Disc 2-1.mp4" "Blah - S01E05 - The Reconing.mp4"  
mv "Blah Season 1 Disc 2-2.mp4" "Blah - S01E06 - Half Way.mp4"  
mv "Blah Season 1 Disc 2-3.mp4" "Blah - S01E07 - Blah.mp4"  
mv "Blah Season 1 Disc 2-4.mp4" "Blah - S01E08 - Blah Pt 2.mp4"
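Here is the paste step in isolation, as a sketch using throwaway files in a temp directory (the file contents are invented for the demo). paste glues line N of the first file to line N of the second, with a tab in between by default:

```shell
# Work in a scratch directory so nothing real gets touched.
tmp=$(mktemp -d)
printf '%s\n' '"old 1.mp4"' '"old 2.mp4"' > "$tmp/tracklist.txt"
printf '%s\n' '"new 1.mp4"' '"new 2.mp4"' > "$tmp/newnames.txt"

# Join the files line by line, then prepend "mv " to each joined line.
script=$(paste "$tmp/tracklist.txt" "$tmp/newnames.txt" | sed 's/^/mv /')

printf '%s\n' "$script"
rm -rf "$tmp"
```

The shell treats the tab like any other whitespace between arguments, so the mv lines work as is. If you want a plain space instead, paste -d ' ' should give you that.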

Check the script for sanity, then run it!

bash -x script.sh

w00t!
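As for the sanity check: the most likely way for this to go wrong is the two lists having different lengths, in which case paste still pairs up what it can and you get mv lines with a missing name. A minimal sketch of a check for that, where same_line_count is a hypothetical helper of mine and the temp files are demo data:

```shell
# Hypothetical helper: succeeds only if both files have the same line count.
same_line_count() {
    a=$(wc -l < "$1" | tr -d ' ')   # tr strips padding some wc versions add
    b=$(wc -l < "$2" | tr -d ' ')
    [ "$a" -eq "$b" ]
}

# Demo with deliberately mismatched files in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' '"a.mp4"' '"b.mp4"' > "$tmp/tracklist.txt"
printf '%s\n' '"x.mp4"' > "$tmp/newnames.txt"

if same_line_count "$tmp/tracklist.txt" "$tmp/newnames.txt"; then
    status=match
else
    status=mismatch
fi
echo "$status"
rm -rf "$tmp"
```

It doesn't prove the episodes are in the right order, only that every file has a name waiting for it, so I'd still eyeball script.sh.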

If you want it, here is the shell script I use to automate this somewhat:

#! /bin/sh

ls -tr *.mp4 \  
| sed 's/^/"/g' \  
| sed 's/$/"/g' \  
> tracklist.txt  
nl -n rz -w 2 -s " - " episodes.txt \  
| sed "s/^/\"$1 - $2E/g" \  
| sed "s/$/.mp4\"/g" \  
> newnames.txt  
paste tracklist.txt newnames.txt \  
| sed 's/^/mv /g' \  
> script.sh

The arguments are the name of the series and the season, like this:

bash ./rename.sh Blah S01
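To show how the two arguments end up in the new names, here is the naming stage pulled out into a function. build_name is a name I made up for this sketch, and I wrote ${2}E with braces just to make it obvious that the E is literal text and not part of the parameter:

```shell
# Hypothetical helper: $1 = series name, $2 = season tag (e.g. S01),
# stdin = titles already numbered by nl.
build_name() {
    sed "s/^/\"$1 - ${2}E/" | sed "s/\$/.mp4\"/"
}

result=$(printf '%s\n' '01 - Pilot' | build_name Blah S01)
printf '%s\n' "$result"
# "Blah - S01E01 - Pilot.mp4"
```

Since the arguments are dropped straight into a sed expression, a series name containing / or & would confuse sed, so this assumes a plain name.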

P.S. If you're trying to guess the show by the track names, I made them up. :)

Tags: howto sed unix