A few months ago, I was contacted by a professor, Dr. Jeff Larsen from the psychology department at The University of Tennessee, about a simple automation project he had in mind. In this series of posts (and as the project progresses), I’ll share some of my experiences with this side venture.
First off, given the small scope of the project, I’m choosing not to accept any compensation. My thinking was that there were multiple “wins” here:
- Dr. Larsen was totally cool with this being an open-source project, meaning other researchers using a similar process can benefit as well.
- I can have a side project for my portfolio.
- This could be the start of establishing healthy relationships between industry (read non-academic) software folks and the university. (Sadly, Dr. Larsen posted on a campus forum to elicit some help from undergrads in the computer science department and got no response.)
Also, accepting money for freelance software work can be tricky, so I decided to dodge the issue altogether.
What’s the project about?
The project involves an experimenter showing a subject various stimuli, usually in the form of video clips, with the aim of studying the subject’s facial responses. The department already has a software package, AcqKnowledge, that records video of the subject and knows precisely when each stimulus was shown.
Once a session is complete, the software makes two things available:
- Subject video (could be as short as 10 or as long as 40 minutes)
- Event list, giving the start and end times of a given trigger
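The post doesn’t specify the event list’s exact file format, so here’s a minimal parsing sketch in Python, assuming a simple CSV layout of an event name plus HH:MM:SS start and end timestamps (the column order and sample data are hypothetical):

```python
import csv
import io

def parse_timestamp(ts):
    """Convert an HH:MM:SS string into total seconds."""
    hours, minutes, seconds = (int(part) for part in ts.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def parse_event_list(text):
    """Yield (name, start_seconds, end_seconds) tuples from CSV text."""
    for name, start, end in csv.reader(io.StringIO(text)):
        yield name.strip(), parse_timestamp(start), parse_timestamp(end)

sample = "baseline,00:00:00,00:00:30\nclip1,00:01:36,00:07:14\n"
events = list(parse_event_list(sample))
# events == [("baseline", 0, 30), ("clip1", 96, 434)]
```

Whatever the real export format turns out to be, the goal is the same: reduce each event to a name, a start time, and an end time.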
The current process involves the manual labor of editing each video:
- Removing the “dead time” where the experimenter is explaining something to the subject
- Isolating each of the segments where a trigger was given so that they can be studied in isolation
The video capture already takes place on a Windows PC, so I chose to stay in my comfort zone of .NET (C# and WPF). The workflow is pretty simple:
- Choose a subject video
- Choose an event list file
- Choose a folder to save the segments
- Run the splitting operation
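The core of that last step boils down to issuing one ffmpeg command per event. This Python sketch is illustrative only (the actual app will be C#); the function names, event tuples, and file names are hypothetical, but the `-ss`/`-t` stream-copy invocation matches how ffmpeg is typically driven for fast lossless splitting:

```python
import os

def format_timestamp(total_seconds):
    """Format total seconds as HH:MM:SS for ffmpeg's -ss and -t flags."""
    return "%02d:%02d:%02d" % (
        total_seconds // 3600, total_seconds % 3600 // 60, total_seconds % 60)

def build_commands(video_path, events, out_dir):
    """Build one ffmpeg stream-copy command per (name, start_sec, end_sec) event."""
    commands = []
    for name, start, end in events:
        commands.append([
            "ffmpeg",
            "-ss", format_timestamp(start),        # seek to the segment start
            "-i", video_path,
            "-vcodec", "copy", "-acodec", "copy",  # no re-encoding, so it's fast
            "-t", format_timestamp(end - start),   # duration, not an end time
            os.path.join(out_dir, name + ".m4v"),
        ])
    return commands

cmds = build_commands("subject.m4v", [("clip1", 96, 434)], "segments")
# cmds[0] seeks to 00:01:36 and copies 00:05:38 of video
```

Note that `-t` takes a duration, so the end timestamp has to be converted to `end - start` before formatting.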
For video processing, I chose FFmpeg, which provides a command-line tool that does all the heavy lifting.
Minimum viable product
At least in the software world (and especially in the startup world), having the minimum viable product (MVP) is the first goal. It aims to answer the question, “Am I building the right thing?” It also helps keep scope in check. For example, there’s no need to build a cloud-based Web app with responsive design running on Node.js with a MongoDB backend when all the customer needed was an Excel worksheet to create a simple chore tracker for her kids.
What was my MVP? A batch file and a hand-drawn sketch of the user interface.
Because the videos could be fairly lengthy, I took an M4V file from an RPM masterclass and made the individual ffmpeg.exe calls to split it into separate videos per teaching block. This forced me to play around with the various ways of splitting the larger video:
- Should I use “from” and “to” timestamps?
- Should I use a starting point, then advance a certain duration?
- What about different audio/video codecs?
- Are the segments starting/stopping at the correct time?
- Is the audio in sync with the video?
- Does the video play correctly?
After some experimentation and Google searches, I arrived at a solution that split a 47-minute video into 11 segments in about 15 seconds on my Windows 7 Intel i7 laptop (running on battery power).
```bat
echo %TIME%
ffmpeg -ss 00:00:00 -i rpm64.m4v -vcodec copy -acodec copy -t 00:00:30 segments/sizzler.m4v
ffmpeg -ss 00:00:30 -i rpm64.m4v -vcodec copy -acodec copy -t 00:01:06 segments/intro.m4v
ffmpeg -ss 00:01:36 -i rpm64.m4v -vcodec copy -acodec copy -t 00:05:38 segments/track1.m4v
ffmpeg -ss 00:07:14 -i rpm64.m4v -vcodec copy -acodec copy -t 00:05:02 segments/track2.m4v
ffmpeg -ss 00:12:16 -i rpm64.m4v -vcodec copy -acodec copy -t 00:05:31 segments/track3.m4v
ffmpeg -ss 00:17:47 -i rpm64.m4v -vcodec copy -acodec copy -t 00:05:04 segments/track4.m4v
ffmpeg -ss 00:22:51 -i rpm64.m4v -vcodec copy -acodec copy -t 00:06:19 segments/track5.m4v
ffmpeg -ss 00:29:12 -i rpm64.m4v -vcodec copy -acodec copy -t 00:06:07 segments/track6.m4v
ffmpeg -ss 00:35:19 -i rpm64.m4v -vcodec copy -acodec copy -t 00:06:40 segments/track7.m4v
ffmpeg -ss 00:41:58 -i rpm64.m4v -vcodec copy -acodec copy -t 00:03:30 segments/track8.m4v
ffmpeg -ss 00:45:28 -i rpm64.m4v -vcodec copy -acodec copy -t 00:00:47 segments/outro.m4v
echo %TIME%
```
Dr. Larsen was very pleased with the performance during the demo.
We software builders love to over-engineer, so I wanted to make this as simple as possible: Find the inputs, then click a big “DO STUFF” button.
Because the sketch is analog and a little rough, it facilitates discussion about whether the workflow jibes with the interface. Now is not the time to be picky about fonts, colors, etc., so keep the conversation flowing by using low-fidelity visuals.
I’m a fan of domain-driven design, and requirements gathering is a great place to tease out the ubiquitous language. When both the developer and the domain expert use the same terms, communication gets less ambiguous. Coming up with labels for the user interface forced me to ask some questions:
- What do you call the input video file?
- Are these entries called events or triggers?
- What do you call the segmented videos?
Where to go from here
For the next segment of work, I’ll implement the UI and get some MVVM structure in place. This is also a good chance for me to practice test-driven development (TDD) on a greenfield project. Given that I have dependencies on the file system and an external tool, it’s a great opportunity to code to an interface and use a mocking framework to help with testing.
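The “code to an interface” idea is language-agnostic, so here’s a quick sketch of it in Python (the real app will be C#, where a mocking framework such as Moq would play the fake’s role; all names here are hypothetical). The splitting logic depends only on a runner’s `run()` method, so tests can inject a fake that records commands instead of launching ffmpeg:

```python
import subprocess

class FfmpegRunner:
    """Production implementation: shells out to the real ffmpeg binary."""
    def run(self, command):
        subprocess.check_call(command)

class FakeRunner:
    """Test double: records commands instead of launching anything."""
    def __init__(self):
        self.commands = []
    def run(self, command):
        self.commands.append(command)

def split_video(runner, commands):
    """Depends only on the runner's run() interface, not on ffmpeg itself."""
    for command in commands:
        runner.run(command)

fake = FakeRunner()
split_video(fake, [["ffmpeg", "-i", "subject.m4v", "out.m4v"]])
# fake.commands now holds the single recorded command
```

With this seam in place, the interesting logic (parsing events, building commands) can be tested quickly and deterministically, without touching the file system or a real video.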