# Plugin to transcribe wav file to captions



## Linwood Ferguson (Jan 31, 2016)

Has anyone ever seen or heard of plugins that might transcribe voice wav files into metadata captions?

It's very nice that I can record a comment in camera (e.g. "That was the 3rd out of the 5th inning") but then I have to notice them, play them back, and type them in.

None of them are perfect, but it would be nice to just get a close first cut; it would be easy enough to review the captions.

Some searching did not turn up anything.


----------



## PhilBurton (Jan 31, 2016)

Ferguson said:


> Has anyone ever seen or heard of plugins that might transcribe voice wav files into metadata captions?
> 
> It's very nice that I can record a comment in camera (e.g. "That was the 3rd out of the 5th inning") but then I have to notice them, play them back, and type them in.
> 
> ...



I too would like that capability in a low-cost plugin.

There is software that does speech to text, but it is frightfully expensive, e.g. Dragon Naturally Speaking (pro editions).

Phil


----------



## Linwood Ferguson (Jan 31, 2016)

PhilBurton said:


> There is software that does speech to text, but it is frightfully expensive, e.g. Dragon Naturally Speaking (pro editions).



Yes, but times have changed a lot since they were the only thing going.  There are a lot of services: IBM's new "Watson" does a thousand minutes a month free, and I think Google has a free one (or did; it's not well documented). A quick search turned up lots of services, though several were vague about charges.

I suspect (emphasis on suspect) that the issue might be that each user of such a plugin would need to set up the transcription account themselves at some of these (e.g. Watson, since it is limited it has to know who is using it).   But that might not be too big a deal.

.... he said, while obviously not writing his own, and hoping someone else would do the hard work.


----------



## PhilBurton (Jan 31, 2016)

Ferguson said:


> Yes, but times have changed a lot since they were the only thing going.  There are a lot of services: IBM's new "Watson" does a thousand minutes a month free, and I think Google has a free one (or did; it's not well documented). A quick search turned up lots of services, though several were vague about charges.
> 
> I suspect (emphasis on suspect) that the issue might be that each user of such a plugin would need to set up the transcription account themselves at some of these (e.g. Watson, since it is limited it has to know who is using it).   But that might not be too big a deal.
> 
> .... he said, while obviously not writing his own, and hoping someone else would do the hard work.



Ferguson,

Thanks for this info about Watson. I sometimes do interviews as part of my job and I have to do transcriptions myself of the recordings. 

I certainly don't want to do this plugin myself either.  Speech to text is still not as simple as doing a text editor.

Phil


----------



## rob211 (Feb 6, 2016)

Sorry, but speech recognition is pretty tough and far beyond the capabilities of a simple plugin.

Indeed, Adobe has tried transcription in Soundbooth and Premiere, and I think both have bitten the dust. In my work I needed to get tons of stuff transcribed, and the accuracy of even pretty sophisticated and powerful software just isn't there. Software that interacts, like dictation, is better, but mostly because it's trainable and interactive. And commands are easier too. But poor quality audio into a camera? Odds are you'd get so many errors it wouldn't be worth it. Not to mention names and other proper nouns. "Samardzija pitching to..." or "Krzyzewski coaching..." would be kinda tough.

If you could consolidate the audio, there are transcription services online. We used 'em all the time, but I can't recall the cost. You could get transcript time stamped, which would aid in matching up with photos.
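To illustrate how a time-stamped transcript could be matched up with photos, here is a minimal Python sketch. It assumes the start time of the consolidated recording is known, that the transcript comes back as (offset in seconds, text) pairs, and that each photo's capture time has already been read (for example from EXIF). The function name, the data shapes, and the 30-second window are all hypothetical choices, not any service's actual format.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def match_segments_to_photos(recording_start, segments, photos,
                             max_gap=timedelta(seconds=30)):
    """Attach each transcript segment to the latest photo taken before it.

    recording_start: datetime when the audio recording began (assumed known).
    segments: list of (offset_seconds, text) from a time-stamped transcript.
    photos: list of (filename, capture_datetime).
    Returns {filename: [text, ...]}. A segment is skipped when no photo was
    taken within max_gap before it was spoken.
    """
    photos = sorted(photos, key=lambda p: p[1])
    capture_times = [taken for _, taken in photos]
    captions = {}
    for offset, text in segments:
        spoken_at = recording_start + timedelta(seconds=offset)
        # Index of the last photo captured at or before the spoken note.
        i = bisect_right(capture_times, spoken_at) - 1
        if i < 0:
            continue
        name, taken = photos[i]
        if spoken_at - taken <= max_gap:
            captions.setdefault(name, []).append(text)
    return captions
```

This only works if the camera and recorder clocks agree; any offset between them would have to be subtracted first.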

I dunno if it will help you, but I use notetaking apps on my phone to aid in captioning. You can add audio to a note with some text, GPS, a photo, etc in Evernote, or use handwriting to scrawl a note that Evernote can OCR to get text you can copy and paste.


----------



## Linwood Ferguson (Feb 7, 2016)

Rob211, I guess now I need to give it a try. 

Bear in mind I am not suggesting that the plugin would do the transcription itself; instead it would hand the audio to one of the web services out there, which are far beyond "simple".  Kind of like mapping -- Lightroom doesn't do maps, it's all web based, same with address/city lookups, etc.
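A rough Python sketch of that division of labor, under stated assumptions: the voice memo is assumed to share its base file name with the image it belongs to, and `transcribe` is a hypothetical callable standing in for whichever web service is chosen. No real service API is shown here, and the suffix list is just an example.

```python
from pathlib import Path
from typing import Callable, Dict

# Assumption: which file types count as images; adjust for your camera.
IMAGE_SUFFIXES = {".nef", ".jpg", ".jpeg"}


def draft_captions(folder: Path,
                   transcribe: Callable[[bytes], str]) -> Dict[str, str]:
    """Return {image file name: draft caption} for images with a voice memo.

    `transcribe` is a placeholder for a call to an external speech-to-text
    service; this function only pairs files up and passes the audio along.
    """
    files = [p for p in folder.iterdir() if p.is_file()]
    memos = {p.stem.lower(): p for p in files if p.suffix.lower() == ".wav"}
    images: Dict[str, list] = {}
    for p in files:
        if p.suffix.lower() in IMAGE_SUFFIXES:
            images.setdefault(p.stem.lower(), []).append(p)

    captions: Dict[str, str] = {}
    for stem, memo in sorted(memos.items()):
        if stem not in images:
            continue  # a memo with no matching image is ignored
        text = transcribe(memo.read_bytes())  # one service call per memo
        for image in images[stem]:
            captions[image.name] = text
    return captions
```

Because the service is passed in as a function, each user could plug in their own account or provider without the pairing logic changing.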

I think it depends on what one expects.  Think of Google Voice Mail... it's never perfect, but it is mostly understandable.  And like that, if Lightroom gave me text that didn't make sense or jar my memory, I'd hit the replay button, listen, and do my own transcription.

But I agree, names will likely never work.  Even if it could do them well, I can't pronounce many of the names (even if I could remember them).  For me the most I record is jersey numbers if not obvious, but I can see others expecting it to do more.  If I want names I'll use a code replacement table later and put in FG2 when I want "Samardzija Krzyzewski on base".
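For what it's worth, a code replacement pass like that is simple to sketch in Python. This is only an illustration: the function name and the idea that codes are letters followed by digits (like FG2) are assumptions, and the table contents would come from whatever roster you keep.

```python
import re


def expand_codes(caption, table):
    """Replace whole-word codes (letters then digits, e.g. FG2) using `table`.

    Codes that are not in the table, and all other text, are left untouched.
    """
    def lookup(match):
        word = match.group(0)
        return table.get(word, word)

    # Assumption: a code is one or more letters followed by one or more digits.
    return re.sub(r"\b[A-Za-z]+\d+\b", lookup, caption)
```

Matching on whole words keeps ordinary text such as "5th" from being mistaken for a code.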

But I'm curious; I find the audio quite good on my D4, since my mouth is usually 3" from the mic.  I'll take some WAV files and find a free service and see what it does with real examples from the past (so I won't be tempted to speak more clearly in a test!)


----------



## rob211 (Feb 8, 2016)

Google Voice transcription is amazingly good. You could perhaps try SpeechLogger; it uses Google's engine I think. I tried it once and it was quite good. It runs through Chrome, but I think you can play an audio file via line in or Soundflower or something.

And you still have to proof everything if accuracy is at all important. One digit off in a number and you are hosed, memory or not. So you have to play it back anyway. I find it easier to use a transcription application (OneNote used to be great for this, but I don't know if it still is) so I can review it all for accuracy and get timecodes. I dunno if your camera adds time to the audio, but that helps. Good luck.


----------



## Linwood Ferguson (Feb 8, 2016)

rob211 said:


> Google Voice transcription is amazingly good. You could perhaps try SpeechLogger; it uses Google's engine I think. I tried it once and it was quite good. It runs through Chrome, but I think you can play an audio file via line in or Soundflower or something.


Thanks.


rob211 said:


> And you still have to proof everything if accuracy is at all important. One digit off in a number and you are hosed, memory or not. So you have to play it back anyway. I find it easier to use a transcription application (OneNote used to be great for this, but I don't know if it still is) so I can review it all for accuracy and get timecodes. I dunno if your camera adds time to the audio, but that helps. Good luck.


The good news is about 80% of the time numbers are visible, or I can figure it out from context.  Most of what I cannot remember by the time I'm post processing are things like "was that a hit" or what inning it was in, or as an example the other night, a key basketball player hit 1000 points and they announced it.  Now in that case they were REAL slow and I couldn't tell which shot it was anyway, but frequently things like that I know when they are happening, and want to "mark" the shot if I get it.

I don't try to annotate every shot.  Most of what I'm doing is just game highlights that the schools use themselves and figure out what they want to say about them (or put another way, they are going to ignore my caption regardless).   And I'll never make a good old-school photo journalist, who documents each shot carefully.

What got me thinking of this was getting a new sports body.  Right now when shooting, only one body has a recorder, so it is only sort of a half effort if I record notes, and do not do many.  But when both have them... maybe I can do better at captions.

But yes... long way of agreeing I'd have to proof everything.  I just think it would be easier to proof even a just-fair transcription than to listen to a lot of audio and do my own.  But maybe not -- maybe I just ought to make a transcription pass myself.


----------



## rob211 (Feb 9, 2016)

Not many cameras have audio notes these days. But there are lots of good audio recorders out there, assuming you don't wanna use a phone. The ones made for video are especially good, and even if you don't use video the time synching methods would be the same. These are used all the time with multiple video sources, so it shouldn't be difficult to use one with multiple still cameras. The H1 for example.


----------



## rob211 (Feb 11, 2016)

Here's something that may work: *Google Keep*. It instantly (like scary fast) transcribes voice notes. It was amazingly accurate when I tried it. Works on iOS as an app, and then you can access the notes and text of the notes on the web via Chrome (and probably Safari). Can also take pictures, and send the stuff to Google Docs if you need formatting, etc. Free.


----------

