Interestingly, I found out later that synchronised navigation of time-based media and text had been written about in the 1990s, but nothing had been built. We had two modes: ASR, where the time codes were generated along with the recognition output, and alignment, where the text was supplied and aligned to the media. We built some amazing prototypes: karaoke-style audiobooks where you could pause and click on any word to hear it, and so on. But search was the biggest thing. Tape, and then digital, increased the rushes-to-output ratio massively. On 16mm or 35mm, a one-hour documentary would be edited from maybe 10 or 20 hours of footage. Once the medium was effectively limitless, you might be trying to make a 30-minute programme from 200 hours of actuality and interviews, shot by an army of juniors.
Our classic use case for demos was, "Great, but find me that great bit where the crazy guy is talking about cheese".
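The core data structure behind both modes and the cheese demo is the same: a transcript where every word carries a time code into the media, whether that code came from the ASR output or from aligning a supplied script. A minimal sketch (hypothetical names and data, not the original system) of click-to-play and search over such a transcript:

```python
# Sketch of word-level time-coded transcript navigation. The TimedWord
# structure and function names are illustrative assumptions, not the
# original system's API.

from dataclasses import dataclass

@dataclass
class TimedWord:
    text: str     # the word, from ASR output or the aligned script
    start: float  # seconds into the media where the word begins

def seek_time(transcript: list[TimedWord], index: int) -> float:
    """Click-to-play: return the media position for the word at `index`."""
    return transcript[index].start

def find_mentions(transcript: list[TimedWord], query: str) -> list[float]:
    """Search: return the start time of every occurrence of `query`,
    so an editor can jump straight to each mention in hours of rushes."""
    q = query.lower()
    return [w.start for w in transcript if w.text.lower() == q]

# Toy transcript fragment for the demo use case.
transcript = [
    TimedWord("talking", 12.0), TimedWord("about", 12.3),
    TimedWord("cheese", 12.6), TimedWord("again", 12.9),
]
print(find_mentions(transcript, "cheese"))
```

With real rushes the transcript would run to hundreds of thousands of words, but the principle is the same: full-text search over time-coded words replaces hand-logged metadata.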
At a European broadcasting conference, when I was announcing the death of metadata (in favour of the content itself), a guy in the audience with a Family Guy Scandinavian accent shouted, totally unprompted, "Yeah man, who the fuck needs metadata, just give me my stuff".
My speech followed some poor person announcing the BBC's "SMEF" metadata format, whose entity diagram covered an entire wall the last time I saw it.