Deleted
Deleted Member
Posts: 0
|
Post by Deleted on Jan 8, 2017 17:24:17 GMT
Let's face it. Most Part 15 broadcasters operate on a shoestring budget, and generally have a staff of exactly one. Our radio stations are also not usually our day jobs.
That severely limits the ability to do live radio, which of course, is preferred most of the time. We then, like even many professional, licensed stations, have to automate our programming.
If you want to somehow announce the songs that you are about to play and/or have just played, you have to come up with some sort of automated voice over system. Announcing song names, artists, etc. is much more meaningful to the listener than just playing straight music with no ID's. If you're streaming, you can send metadata down the Internet pipe that will show up on the listener's player, but that, of course, doesn't work for over-the-air broadcasting.
There are basically two ways to solve the voice over problem:
1. Record ID's manually and then somehow associate them with the song (e.g., manually append or prepend them to the song file or, if you're using Zara, create a .seq file which contains the song and the ID separately; there are many ways, depending on your preferences and the automation software that you're using).
2. Use text to speech software.
#1 can be useful if your playlists are small, and relatively static. If you have a radio voice, they will also sound the most natural and pleasing. But most Part 15 broadcasters pride themselves on providing alternative programming, which generally means larger playlists, changing often. #2 in those cases, properly executed, is the most flexible and least time consuming method.
For years I have automated Artisan Radio's voice overs using text to speech (TTS) software. What I intend to do over a series of posts is describe how we (I) have accomplished this, and even share the scripting code, mainly batch files, that has been developed. But more importantly, I want to focus on the methodology (i.e., the whys), so that others who may not have the same computer platform, or run the same software, can duplicate these efforts.
The following is the general methodology used - each post will go into more depth on one particular step:
1. Create the text for each song that will eventually be transformed into speech and store it somewhere, associated with the song
2. Choose text to speech software, and even more importantly, a natural sounding voice that the software supports
3. Create a scheduling environment such that the text to speech software is run while each song is being played through the automation software. This is a lot trickier than it sounds, particularly for automation software such as Zara, as the TTS has to run outside the Zara environment, creating numerous race conditions
4. Play the TTS generated ID.
5. Repeat #3 & #4.
Here's a peek at the next post. I use the file name of each song as the database for the ID's - in other words, the file name of the song being played is run through the TTS and used as the voiceover (song ID). More coming...
|
|
|
Post by Deleted on Jan 8, 2017 17:37:42 GMT
Signing Up for This Class
Glad you are posting this presentation, DavidC.
I am one of those 1-man stations you are talking about, and have long been curious about text-to-speech software but have never researched it.
Stations that have some kind of voice presence are more likely to attract regular listeners, I strongly believe.
My notebook is open and pencils sharpened.
|
|
|
Post by Deleted on Jan 8, 2017 19:55:30 GMT
I've been attempting to complete a white paper on the subject for some time, but never seem to get sufficient dedicated time to do it. That's why I've decided to use this approach. Some parts will be cut and paste from what I have managed to write so far.
So, without further ado, here is the next post - Creating the Voice ID Text.
The text to speech software (TTS) has to run outside the environment of whatever automation software that you're using. In my case, that is Zara Radio. But the same principles apply for any automation software.
For each song, you'll need to decide what you want the voice over to say, and then store that text in some fashion that can be associated with the song it is ID'ing.
My voice overs depend to a large extent on the information about each song that I have available to me. In addition to that, I try to make them consistent across songs in each playlist.
So, for example, when I play obscure 50s/60s pop, the ID's generally look like <Artist Name> <pause> <Song Name> or "Lou Monte; Dominick the Italian Christmas Donkey".
When I play vintage jazz, they look like <Artist Name> <pause> <Song Name> <pause> <Date of Session> <pause> <Location of Session>, or "Louis Armstrong & His Hot Five; I'm Not Rough. December, 1927; Chicago".
Hopefully the general idea is clear. In the examples that I've given, the punctuation is how you control pause lengths for the specific TTS software I use (more on that in future posts).
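As an illustration only (this is not code from the Artisan Radio setup, which uses batch files), here is a minimal Python sketch of how the two ID patterns above could be assembled from separate fields. The function name `build_voice_over` and its parameters are hypothetical, and the semicolon and period separators simply copy the examples; how long a pause each one produces depends on the TTS engine.

```python
def build_voice_over(artist, title, session_date=None, location=None):
    """Join song details into one line of voice-over text.

    The separators mirror the examples in the post: a semicolon between
    artist and title, a period before the session date, and a semicolon
    before the session location.
    """
    text = f"{artist}; {title}"
    if session_date:
        text += f". {session_date}"
    if location:
        text += f"; {location}"
    return text


print(build_voice_over("Lou Monte", "Dominick the Italian Christmas Donkey"))
print(build_voice_over("Louis Armstrong & His Hot Five", "I'm Not Rough",
                       "December, 1927", "Chicago"))
```

The two calls reproduce the pop and vintage jazz examples quoted above, so a whole playlist built this way would stay consistent.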
Now, there are many ways to store that voice over text. I believe that simple is ALWAYS elegant, and I use the song file name (which, conveniently, Zara provides external program access to in the file currentsong.txt). Depending on how you decide to implement processes such as scheduling, you could put the voice over text in song metadata (i.e., in one of the mp3 tags), or even in an external database.
Each method will have its challenges. File names generally have restrictions as to the characters that can be used, and even length (although I've never run into length problems using Windows, I try to keep the voice overs as short as possible, while containing as much relevant information as possible). There are still restrictions in the content of an mp3 tag (length & characters used), but they're less than those of a file name. Database contents would be the most flexible.
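To make the file name restriction concrete, here is a small Python sketch (an assumption about how one might handle it, not something from the original setup) that strips the characters Windows refuses in file names. The helper name `to_safe_file_name` is hypothetical, and the sample title is made up.

```python
# Characters that Windows does not allow in a file name.
WINDOWS_FORBIDDEN = '<>:"/\\|?*'


def to_safe_file_name(voice_over_text, replacement=""):
    """Replace characters Windows rejects so the text can be used as a file name."""
    cleaned = "".join(
        replacement if ch in WINDOWS_FORBIDDEN else ch for ch in voice_over_text
    )
    # Windows also drops trailing dots and spaces from names, so trim them here.
    return cleaned.rstrip(". ")


# Made-up title, used only to show the question mark being removed.
print(to_safe_file_name("Some Artist; Is This a Song?"))
```

Semicolons, commas, periods in the middle of the name, and exclamation marks all pass through untouched, so the pause punctuation survives.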
However, another challenge is accessibility - how easy is it to get access to the voice over text from an external program (and preferably batch scripts as opposed to writing something in C++ or C). File names are extremely easy in Zara, and usually very easy even with other automation software. Metadata is never quite as easy to extract in an automated fashion as a plain file name, and accessing databases can be even more complex.
Another advantage of file names is that they're easy to change. If, for example, you would like to change the sound of your station voice overs, and add more or less information, just change the file names of the songs in your playlist. IF you keep the naming consistent, that can quickly be done with batch file renaming software utilities (I learned the hard way that consistency is good - it's not much fun changing hundreds or thousands of somewhat randomly named files manually). You can of course change mp3 ID's but most programs require you to do that on an individual file basis [please correct me if I'm wrong there, I'm not a big fan of metadata]. And depending on the database and how you've set it up, changing things can either be very easy, or very difficult.
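For readers who would rather script a bulk rename than use a renaming utility, the following Python sketch shows the idea. It assumes a hypothetical existing convention of "Artist - Title.mp3" that should become "Artist; Title.mp3"; both separators and the function names are illustrative, and it only works because the names were consistent to begin with.

```python
from pathlib import Path


def plan_renames(folder, old_sep=" - ", new_sep="; "):
    """Return (old_path, new_path) pairs for mp3 files whose names contain old_sep."""
    plan = []
    for path in sorted(Path(folder).glob("*.mp3")):
        if old_sep in path.stem:
            new_name = path.stem.replace(old_sep, new_sep) + path.suffix
            plan.append((path, path.with_name(new_name)))
    return plan


def apply_renames(plan):
    """Carry out the renames, skipping any that would overwrite an existing file."""
    for old_path, new_path in plan:
        if not new_path.exists():
            old_path.rename(new_path)
```

Building the plan separately from applying it gives you a chance to print and check the list before any file is touched.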
Once you've gotten your voice over text planned, then you're ready for considering the next, perhaps most important step. That is - choosing your TTS, and more importantly, a natural voice that the TTS uses to generate the voice over.
I use Alive Media Text to Speech (purchased), which I'm not sure exists any more. And I use Neospeech Kate (purchased) as the voice. Both were relatively inexpensive when I purchased them (for personal use - commercial use would have been much more expensive). I'll go into this topic in much more detail in my next post (probably in a day or two).
|
|
|
Post by Deleted on Jan 12, 2017 4:47:01 GMT
Part 3 - TTS Software
There are all kinds of TTS software on the market, from free to pretty expensive. The basic functionality of ALL this software is similar - to take text and convert it into speech. It's the surrounding support functionality that you will have to evaluate to make your decision on which to choose.
First and foremost, because the TTS software will run outside your automation software, and in the case of Zara, has to be run asynchronously (i.e., not controlled by the Zara playlist), you will need a package that supports some sort of batch automation/command line interface. Not all have this. In particular, you need to be able to launch the TTS software with your voice over text as input, and get an audio file in a predetermined location as output. It is this output audio file that is then played within your automation software.
Before I go any further, all TTS packages need a 'voice' - this is a file, in a standard format, that determines the quality of the audio output that the package generates. The higher the quality, the larger the file (and the more expensive the cost to purchase one). Microsoft 'SAM' is an example of a voice, and it is distributed free with the SAPI component of Windows (along with a few others). While it's convenient, and inexpensive, it's also very low quality (as are the others). Audio output using this voice sounds very robotic.
It's actually pretty difficult to find a good voice file. There are plenty of free ones for you to consider - here is a link to a source: www.zero2000.com/free-text-to-speech-natural-voices.html
There are also lots that cost money. Virtually all suppliers let you try out their 'voices' from their web site - I recommend that you do just that until you find one at a cost that you are willing to pay, and that has acceptable quality. I use Neospeech Kate that I purchased a license for a long time ago.
As a side note, I've found that female voices tend to sound less robotic than the male ones (both with equivalent file sizes), so I would check those out first.
As you play around with voices, you'll find that you occasionally (and sometimes not so occasionally, depending on the voice) get incorrect pronunciation. There are several ways around this problem - the simplest approach is to rely on phonetics, as opposed to spelling, in your text. As an example, the group 'Dion & the Belmonts' could come out sounding like 'Die-on & the Belmonts'. If that happens, one solution would be to change the text to 'Deon & the Belmonts' to get the correct pronunciation. Unfortunately, if you are also sending out the text as metadata over an internet stream (as I do), the incorrect spelling shows up on the listener's media player as well.
Some of the better (and expensive) TTS packages can 'learn' the pronunciation of certain words & abbreviations that might otherwise be mispronounced (through the use of linguistic dictionaries). That's probably beyond the scope of most Part 15 broadcasters. I just accept some mispronunciations, although I do avoid the use of abbreviations.
One of the most valuable tools when using TTS is the pause. Putting pauses in key places in your voice over audio really helps the listener, particularly when the quality of said audio is questionable. I put as many pauses as I can into the text - the actual mechanism is a function of your TTS software, but it's usually punctuation. Again, you'll have to play around with commas, semi-colons, periods, etc. to get the lengths of pauses that you want. Some TTS engines can even add emphasis for exclamation or question marks. One of the restrictions of using file names (as I do) for the voice over text is that question marks are not valid in a file name (although exclamation marks are).
I use a paid-for TTS package called Alive Text to Speech by Alive Media. It appears that they are still around - they allow you to download a demo for trial. This software gives me everything I need, but since it's been around a while, there may be better, more modern packages out there, and it never hurts to look.
Once you've chosen your TTS and your voice, you're ready to start. In the next post, I will be discussing the launching and scheduling of the TTS software. We'll look at a simple way to do that within Windows, while running Zara, using a batch scheduling program and semaphores (a way to control whether or not to generate the voice over output audio). We'll also look at the various race conditions that can occur during this asynchronous processing. I'll provide code samples of the batch files that I use, but we'll also look at alternate methods to achieve similar functionality (such as developing a C++ or C application, which would be far more flexible).
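One possible way around the metadata spelling problem, sketched here in Python purely as an assumption, is to keep the correctly spelled text for display and apply phonetic respellings only to the copy handed to the TTS engine. This needs a setup where the spoken text and the displayed text can be separated, which the plain file name approach does not provide on its own. The names `PRONUNCIATION_OVERRIDES` and `spoken_text` are hypothetical.

```python
# Hypothetical respellings: display spelling on the left, TTS-friendly spelling on the right.
PRONUNCIATION_OVERRIDES = {
    "Dion": "Deon",
}


def spoken_text(display_text, overrides=PRONUNCIATION_OVERRIDES):
    """Return the text to hand to the TTS engine, leaving display_text untouched."""
    words = display_text.split(" ")
    return " ".join(overrides.get(word, word) for word in words)


display = "Dion & the Belmonts"
print(display)               # what the stream metadata would show
print(spoken_text(display))  # what the TTS engine would be asked to say
```

The lookup matches whole space-separated words only, so a word with punctuation attached (for example a trailing semicolon) would need its own entry or a smarter tokenizer.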
|
|
|
Post by End80 on Jan 12, 2017 22:18:14 GMT
Very interesting topic David, looking forward to your next installment.
On a side note: several years ago I received a few requested announcements and voiceovers at RadioDaddy from their free request section, and not wanting to be someone who just takes without giving, I worked on some of the other requests. But since I don't have a 'radio-quality voice', I responded to some of the scripted requests and created several 30 and 60 second clips, using only text-to-speech, music, sound effects, and Audacity. The results sounded good, and the people who had requested them were very enthused and satisfied with what I had made for them.
This is just to emphasize that the quality of the fake voices makes all the difference. The female voices do sound more realistic.
|
|
|
Post by mark on Jan 12, 2017 23:28:58 GMT
I've often thought of doing this rather than using my own voice for the station IDs etc., but haven't yet bothered. I kinda like hearing myself, but I came across a text to voice site where you can pick a certain female voice, say American English, and with settings alter the harmonics to make her sound breathy or sexy, talk faster or slower, change expressions, etc. It may sound good. This site has a 1 month free trial, and I could do everything I want in a few days and pay nothing. To get it onto my playlist, I could just record it in high quality on the voice recorder as MP3 files. But as I said, I like hearing myself and I don't sound that bad, but a nice sexy female voice would be good.
Mark
|
|
|
Post by End80 on Jan 13, 2017 1:34:51 GMT
[Video: https://www.youtube.com/YK7I-QH1GMc]
Hmmm.. couldn't get the video embed to work
|
|
|
Post by Deleted on Jan 13, 2017 1:50:32 GMT
At the End of This Seminar I Hope To Select a Voice for KDX
I can sit here and record voice tracks all day long but it's always me and I'd like to have other voices on KDX. This "How To Tutorial" is a very welcome opportunity to learn about what's available.
End80 I enjoyed your YouTube link. She sounded good enough to do a few spots every once in awhile.
I learned how to use the Zara "Sweepers" tool to overlay a voice over another audiofile and if no one else covers that particular option I'll submit a description later on during this thread.
|
|
|
Post by bluebucketradio on Jan 17, 2017 1:43:47 GMT
Great tutorial. On a side note, but still related to the topic of TTS: the National Weather Service recently launched a new voice for NOAA Weather Radio that sounds very natural until it has to use certain words. One particular word is Winds, the plural of Wind. The text to speech program they use wants to pronounce the word Winds in this fashion:
The boy winds up the toy robot
But has no problem with the word Wind ;
The Wind will be out of the South at 9 Mph
Though my wife and I have found this to be humorous, some folks might tune out depending on their ability to decipher this slight discrepancy. Having dealt with and heard the many mispronounced words for several years over NOAA Weather Radio, this isn't a problem for me.
The female voice NWS uses sometimes says, "Stay Away From Win-dows"
Barry
|
|
|
Post by Deleted on Jan 17, 2017 1:56:00 GMT
Teaching an Artificial Voice
Earlier DavidC mentioned that sometimes phonetic spellings can trick an artificial voice.
Barry's NOAA voice has a problem: "One particular word is Winds, the plural of Wind."
Since the singular "wind" gets said properly, maybe "wind-zz" or "windz" might work.
But of course if that also gets printed as text on the screen we'd be stuck.
|
|
|
Post by Admin on Jan 18, 2017 22:37:44 GMT
I've also tried Zara Sweepers. It works, but it's a bit time-consuming to set up, as you need a voiceover for each song played unless you do blocks of songs.
Johny C would be the guy to check with.
|
|
|
Post by Deleted on Jan 19, 2017 2:26:36 GMT
The next installment is coming but I'm laid low with a nasty flu bug. Hopefully I can get to it in the next few days
|
|
|
Post by Deleted on Jan 19, 2017 2:47:23 GMT
While the Flu Bug Bites
The artificial voices are in no hurry to say anything.
|
|
|
Post by Deleted on Feb 8, 2017 23:22:10 GMT
I've finally gotten over the flu, which invaded various parts of my body (probably already too much information), so, without further ado, I'm now going to continue with Automating Voice Overs.
Keep It Simple
I'm a great believer in keeping things as simple as possible, while at the same time ensuring that you're doing what you want to do. It's difficult enough to get the simple stuff right with computers, never mind the complex stuff.
Choosing to use the sound file's name as the voice over text really simplifies things. All you have to do to generate a voice over mp3 file is call the TTS software that you have selected from the command line. That can be done within a batch file, which is probably the simplest Windows program you can create.
Any other choice would likely mean that you would have to manipulate strings, requiring the writing of a program in a lower level language.
Back when I was operating on Bowen Island, my station used an automated process to report on the status of the Bowen Queen, the BC Ferries link between the island and the mainland. I used a screen scraper to get the data into a C program, which then manipulated said data and called the TTS software to generate the appropriate voice over mp3 file. But programs take quite a while to write, are relatively difficult to exhaustively test (which is why so many applications are buggy), and require periodic maintenance. If you don't have to do it, why bother?
Here is a sample call to the TTS software from the batch file that I use, speech.bat:
"c:\program files\alivemedia\text to speech\texttospeech.exe" "c:\Program files\Zarasoft\Zararadio\currentsong.txt" /mp3 "/c:\ArtisanRadio\CurrentSong"
The quotes are necessary if there are spaces in the command (e.g., the folder name Program Files).
This line tells Alive TextToSpeech to generate an mp3 file from the contents of currentsong.txt, and place it in the ArtisanRadio\CurrentSong directory.
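For anyone who would rather drive the TTS program from a script than from a batch file, the same call can be sketched in Python. This is an assumption about one way to do it, not part of the original setup: the paths and the `/mp3` argument are copied verbatim from the sample call above, while the function names `tts_command` and `generate_voice_over` are hypothetical.

```python
import subprocess

# Paths copied from the sample call above; adjust them for your own installation.
TTS_EXE = r"c:\program files\alivemedia\text to speech\texttospeech.exe"
SONG_TXT = r"c:\Program files\Zarasoft\Zararadio\currentsong.txt"
OUTPUT_ARG = r"/c:\ArtisanRadio\CurrentSong"


def tts_command(exe=TTS_EXE, text_file=SONG_TXT, output_arg=OUTPUT_ARG):
    """Build the same argument list that speech.bat passes on the command line."""
    return [exe, text_file, "/mp3", output_arg]


def generate_voice_over():
    """Run the TTS program and wait for it to finish (Windows only)."""
    subprocess.run(tts_command(), check=True)
```

Passing the arguments as a list lets Python add the quotes around paths that contain spaces, which is the job the hand-typed quotes do in the batch file.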
Asynchronous Processing
OK, so we want to launch this batch file to generate the voice over mp3 file. But how should we do that?
Zara can launch programs, so it would be possible to put such a launch in a playlist. But if you launch this batch file before you play a song (particularly a randomized folder), the file name of the song that will be played may not be in currentsong.txt. Zara does not wait for the launched program to complete, and goes on to do the next thing in the playlist. You're dependent on the multi-tasking vagaries of Windows as to whether Zara starts playing the song before the batch file starts processing. You may end up with the previous song played, or the current song - it would be random.
So that's not going to work.
You could also put the batch file in the playlist after a song has been played. After that, you would play the voice over. But the problem again is, Zara doesn't wait after the launch, so the voice over may or may not be the correct one.
The only solution is to take the voice over generation outside of Zara. Here's how I do it.
I use something called SSFree (System Scheduler Free), which, as the name implies, is freeware. In keeping with my keep it simple and good enough philosophy, it allows you to schedule tasks down to the granular level of one minute. So as long as your programming is one minute in length or more, it will work just fine. There are probably others that will schedule to seconds, but for my use, it was just not necessary.
Using SSFree, I create one application that launches every minute. It calls the TTS software to generate the voice over.
But wait, lots of programming is more than one minute in length, and so generating the voice over multiple times is not necessary, and in fact, takes a lot of CPU cycles. It would be better to check to see if it was necessary to create the voice over. We do this with what is known as a semaphore. "In computer science, a semaphore is a variable or abstract data type that is used to control access to a common resource by multiple processes in a concurrent system such as a multiprogramming operating system."
We're going to keep the semaphore as simple as possible - in this case, it's the actual voice over mp3 file. If it doesn't exist, we have to create it. If it already exists, we don't.
To accomplish this, we have to do several things. First, here is a simple, sample Zara playlist file:
deletesong
c:\PublicDomainJazz
c:\ArtisanRadio\CurrentSong\currentsong.mp3
deletesong is a batch file that deletes a voice over file, if it exists:
del "c:\ArtisanRadio\CurrentSong\currentsong.mp3"
If it doesn't exist, nothing of course happens.
A random song from a folder is played. At some point, while this song is being played, SSFree will fire off the batch file to convert the contents of currentsong.txt and create currentsong.mp3. Immediately after that has completed, the voice over is played, and we then go back to the top of the playlist, where the voice over file is deleted, and the process repeated.
To stop the TTS batch file unnecessarily generating voice over files, we modify speech.bat to:
if not exist "c:\ArtisanRadio\CurrentSong\currentsong.mp3" "c:\program files\alivemedia\text to speech\texttospeech.exe" "c:\Program files\Zarasoft\Zararadio\currentsong.txt" /mp3 "/c:\ArtisanRadio\CurrentSong"
Race Conditions
As you can see, the basic concepts of this approach are very simple. But there's a penalty to be paid for simplicity, particularly in a multi-tasking environment such as Windows.
If you implement your automated voice overs exactly as described, occasionally you will notice that the voice over consists of "deletesong". That is because SSFree fires off every minute, and it literally could be at any point in your playlist execution. So, for example, if speech.bat runs while deletesong is in the process of deleting the last voice over, currentsong.txt will actually contain deletesong. So even if it fires again (and it will) once you're playing the 'real' song, because the voice over already exists, it won't get overwritten.
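The sequence of events behind this race can be walked through with a tiny Python simulation. It is a simplified model rather than the real system: a dictionary stands in for the two files, and `unguarded_tick` is a hypothetical name for one run of speech.bat that uses only the "does the mp3 exist?" check.

```python
def unguarded_tick(state):
    """One scheduler tick using only the 'does the voice over exist?' check."""
    if state["voice_over"] is None:
        # Stand-in for running the TTS on whatever currentsong.txt holds right now.
        state["voice_over"] = state["current_song_txt"]


# The previous voice over has just been deleted and deletesong is the current item.
state = {"current_song_txt": "deletesong", "voice_over": None}
unguarded_tick(state)  # the tick lands at exactly the wrong moment

# Zara moves on to the real song.
state["current_song_txt"] = "Lou Monte; Dominick the Italian Christmas Donkey"
unguarded_tick(state)  # later tick: the voice over exists, so nothing changes

print(state["voice_over"])  # prints: deletesong
```

The second tick is harmless on its own; the damage is done by the first one, which is why the fix has to look at the contents of currentsong.txt as well as at the semaphore file.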
In multi-tasking environments, when you're doing asynchronous processing, you have to do your due diligence and ask yourself, what if speech.bat runs at this point? Or here? And ensure that you're going to get the correct behavior.
Fortunately, this is really the only race condition that exists with this system (at least that I've discovered), and it is easily handled by the following addition to speech.bat:
if exist "c:\ArtisanRadio\CurrentSong\currentsong.mp3" exit
set/p song=<"c:\program files (x86)\Zarasoft\Zararadio\currentsong.txt"
if "%song%"=="deletesong" exit
"c:\program files (x86)\alivemedia\text to speech\texttospeech.exe" "c:\program files (x86)\Zarasoft\Zararadio\currentsong.txt" /mp3 "/c:\ArtisanRadio\CurrentSong"
If the voice over file exists, exit (i.e., don't generate it). Otherwise, get the contents of the currentsong.txt file and if it is "deletesong" exit. The next time SSFree fires off speech, the correct voice over will be generated.
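The same decision logic can be expressed in Python, which may help anyone porting the approach to another platform or to automation software other than Zara. This is a sketch under assumptions, not the author's code: `speech_tick` is a hypothetical name, and the actual TTS run is passed in as the `generate` callable so the logic can be tried without any TTS software installed.

```python
from pathlib import Path


def speech_tick(song_txt, voice_over_mp3, generate):
    """Decide whether to generate a voice over, mirroring the guarded speech.bat.

    song_txt is the file the automation software writes the current item to,
    voice_over_mp3 is the semaphore file, and generate is a callable that
    performs the actual TTS run.
    """
    if Path(voice_over_mp3).exists():
        return "skipped: voice over already exists"
    # Like set /p in the batch file, only the first line of the file is used.
    lines = Path(song_txt).read_text().splitlines()
    song = lines[0] if lines else ""
    if song == "deletesong":
        return "skipped: deletesong is the current item"
    generate()
    return "generated"
```

Returning a short status string instead of silently exiting makes it easy to log what each one-minute tick decided, which is handy when hunting for further race conditions.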
Other Considerations
For more complex Zara playlists, you may have to make some changes to the batch files, or your events/playlist. As an example, I had (and may reinstitute) station ID's, commercials, etc. every 15 minutes, played by the event system of Zara. That can cause strange results to occur - you'll play a song, have a station ID, and then play the voice over (out of immediate context of the song, even with short ID's). To get around this problem, I created another event, to play one second after the ID/commercial (thus ensuring that it will be queued up and launched by Zara before going back to the main playlist) to delete the 'hanging' voice over file. That had the net effect that the song would not have a voice over, but I felt that that was the best solution.
One of the reasons I've spent the time explaining why I did things the way I did was to allow interested broadcasters to modify this approach as they see fit.
There is also no doubt that writing a program in C or C++ to handle more complex situations would be the most flexible way to go. I may still end up doing that, depending on what the future holds for my little radio station.
If I do that, I will certainly share that program here.
|
|
|
Post by End80 on Feb 9, 2017 1:51:13 GMT
Holy crap, how long did it take you to figure that out? While I don't have a full grasp on how you're doing it, and am not currently in a position to experiment with what you describe, I do see where you're going.
|
|