Business Musings: The Research Part One (AI Audio)
I have finally found the time to research AI audio. Or rather, I’m finally making the time.
I have ventured to this precipice before, and backed off. Now, though, I feel the need to move forward. We (me, Dean, WMG Publishing) are leaving a lot of money on the table. I haven’t been able to get moving on my own voiceover, which would require me to either record out of the condo or to rearrange our place to make it soundproof. (Not something I’m really willing to do, even though we do have one spot I can use.)
I have investigated hiring voice actors. The ones who are already on my various series are too expensive, even at the marvelous discounts they offered me as the author.
I have done a quick examination of the math. The numbers I used were for the existing series as they sold on Audible. Regular Audible, over time, not ACX Audible.
Using those numbers, it would take two to five years to break even on the professional voice actor performances. I’d love to use those folks, since voice actors bring their own audiences to audio. If I go with a different voice on a series, for example, I will lose the voice actor’s fanbase.
It really is in my best interest to keep the original actor…if I could afford them. Which I can’t. The numbers don’t justify it.
However, I’m receiving at least a letter per week from someone asking for audio versions of my books. Some of these folks are longtime fans who now have reading issues. Others are folks who listened to earlier books in a series while making a commute.
Working in the office is back, baby, and that means that commuting is back, which means that listening to audiobooks in the car is back. People are extremely disappointed to learn that they can’t get the next book in a series or even two or three books (considering they haven’t been buying for a couple of years).
I understand. Which is why I’ve been noodling at audio for years now. Ten years or so ago, we opened our own audio studio at WMG, but hired someone who didn’t do as asked. I wanted us to use our resources to produce books. The employee instead defaulted to ACX, which was new at the time.
I was getting sicker and sicker, and that was a fight I couldn’t handle. Finally, we shut down our studio and moved on. I regret this more than I can say. Ten years on, the audio distribution systems are good enough to handle what we would have produced. We are transferring the few audio products that came out of our studio onto our bookstore. The other products are, frankly, a mess, and we need to resolve that.
Then I entered the local voiceover community here in Las Vegas. I did so for two reasons. My main reason was to listen to the other voicers and see if any of them could handle one of my series. As I looked at the pricing, though, I realized I couldn’t afford them…and they didn’t need to practice on my book.
The professional voicer community is like WMG’s professional writer community. Those voicers are being trained to charge for their work, and to do so properly. I wasn’t going to mess with that.
The other reason I went was to improve my own rusty skills. I’m going to keep doing that, with a return planned for this summer. I will eventually narrate some of my own projects. Those will be fiction projects and I’m looking forward to it.
I had thought about hiring nascent talent from the University of Nevada, Las Vegas. I now know how to do that, and I can probably do it for free. I might do that with some radio plays. I’d work with the drama department for the proper permissions.
All of those ideas are firmly based in what I’ve known about audio from my career in radio and from my years working with it on the side. That knowledge is anywhere from 20 to 40 years old. Those systems still work in the modern era, but they aren’t the totality of the picture.
The actual voices in my head come from readers/listeners who want the next book. Listening to audiobooks has changed in the intervening decade.
Once upon a time, the people who listened to audiobooks fell into a niche category. Generally, the people who spent money on audiobooks were people who had money. Audiobooks weren’t cheap. Most commercial audiobook companies abridged the books, which sometimes made them hard to understand. For example, I listened to the abridged audio for the Star Wars book that I wrote, The New Rebellion…and I couldn’t understand what was happening.
Expensive and confusing—not worth the time. There were a few companies, like Blackstone Audio and Books on Tape that provided the entire book, unabridged. Books on Tape had a rental program that sent you maybe a dozen cassettes, as well as postage to return them.
It was a commitment to listen to an audiobook, and there was no fast-forwarding through the dull stuff or just to get a sense of the content.
Once audiobooks became digitized, the game changed. You can skip ahead. You can listen at 2x speed and not miss a word. If you’re the kind of listener who doesn’t care about the performance, then you can speed through audiobooks quickly.
And, I’ll be honest, there are some audiobook narrators who truly grate on me. Listening to the book with that voice is annoying. I’ll listen to those narrators—who are performing to the best of their abilities—at 2x speed. I listen to most podcasts that way as well.
My old voicer self is appalled at that. Because so many books are a performance. I would never listen to an audiobook done with multiple voices that way—or at least, I haven’t yet.
I am not alone in this. People savor some audiobooks and they speed through others. Some people are so busy that they can only listen on a commute. Others have vision impairments that make reading hard, so they save the reading experience for special books.
There is no “average” consumer of audiobooks. Each consumer has different levels for different books, and there’s no predicting how each individual will respond. The process is too individual.
Years ago, as AI audio started, I made a mental note of how I wanted to use it: I decided that I’d have at least three tiers of audiobooks on some projects.
Tier 1 would be a true audiobook with a professional voice actor, someone who knew what they were doing. Or maybe even Tier 1 would have a name actor, someone who brings a big audience with them. For this, I would charge premium prices—whatever that meant at the time.
So, for the sake of argument, let’s call a Tier 1 price $25 for 15 hours of listening. It takes prep to make this book, and it would take a long time for it to earn out. But it would be the high-end version, for the people who care about those things.
Tier 2 would also be a true audiobook with a not-as-proper narrator, either someone new or maybe me. I know, I know. Some people think the author’s voice is worth more, and that might be true of an actor or Prince Harry or maybe even Stephen King, but for most people, the author is almost a negative. Not many authors have training to read their own books, so there would be a lot of cringe-worthy moments. I probably wouldn’t be my usual audio perfectionist self here, so I would charge less, maybe $15 for 15 hours.
Tier 3 would be the AI book, using one of the services and a standard voice. The book wouldn’t take long to put together and the reading would be pedestrian. So the price would be anywhere from $5-10 for the fifteen hours. Maybe less, depending on what it took to put the book together.
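For a rough sense of how those three tiers compare per listening hour, here is a minimal Python sketch. The prices and the 15-hour length are the for-the-sake-of-argument figures above, not real price points. The names in the code are purely illustrative, and Tier 3 is shown as a range because the price would run anywhere from $5 to $10.

```python
# Hypothetical per-hour comparison of the three tiers described above.
# All figures are the "for the sake of argument" numbers, not real prices.
HOURS = 15  # assumed length of the audiobook, in listening hours

# (low price, high price) in dollars; Tier 1 and Tier 2 are single figures.
TIER_PRICES = {
    "Tier 1 (professional narrator)": (25.0, 25.0),
    "Tier 2 (new narrator or author)": (15.0, 15.0),
    "Tier 3 (AI voice)": (5.0, 10.0),
}


def price_per_hour(price, hours=HOURS):
    """Return the price of one listening hour, rounded to cents."""
    return round(price / hours, 2)


def tier_table():
    """Map each tier name to its (low, high) price per listening hour."""
    return {
        name: (price_per_hour(low), price_per_hour(high))
        for name, (low, high) in TIER_PRICES.items()
    }


if __name__ == "__main__":
    for name, (low, high) in tier_table().items():
        if low == high:
            print(f"{name}: ${low:.2f} per hour")
        else:
            print(f"{name}: ${low:.2f} to ${high:.2f} per hour")
```

Changing HOURS or the entries in TIER_PRICES is enough to rerun the comparison for a shorter book or a different price point.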
I suspect most audio consumers would get the Tier 3 book, not just for the price, but also the convenience. The consumer wouldn’t care about how “good” the audiobook was. That consumer would just want to consume the content, maybe at 2x speed.
As I write this, I realize there are a few other tiers I would add over time. There would be a tier between 2 and 3 that would probably be my voice, cloned. So AI audio, yes, me, yes, but not really me. You’d learn nothing about the “author’s” narration, like where I would actually put the pauses.
The other tier that I can foresee at the moment would be an audio play version of a book—either with multiple voices or revised so that the book was truly a performed version of the book, with music and sound effects.
The voicer in me really wants to do that. The writer in me is a bit appalled at the time involved.
The time involved factors into my decision making in many ways. For example, reading the nonfiction books bores the crap out of me. I’ll never do it. So I have many nonfiction books that wouldn’t get produced—or would need someone else’s voice. Those books are never going to earn as much as the fiction books, so I need a fairly cheap way to produce them.
Back when we had an audio employee, I had hoped that they would read the product, but of course, they didn’t. Sigh. And in my farming-out days, when I thought maybe I could bring in a practice voice from, say, UNLV, I considered using nonfiction.
But those were just noodles.
The moment I heard about AI audio, the nonfiction books were the first products I thought of.
That has been my vision for years now with audiobooks. But as I started to research this, I realized I had one more vision: The blog.
I’d love to have this weekly blog in audio form as well as online. I’ve wanted to do that for years now, since we had the audio room at WMG. I was just too sick to do it weekly, and I didn’t see any financial reason at the time to pay our staff person to record it.
At that point, there was no Patreon or anything like that. We would have had to sell the audio version of the blog on WMG’s website, or on mine, and I didn’t see any financial gain there.
Now, though? I definitely see it. The audioblog would be a Patreon level, along with maybe a subscription.
So…as I started doing my research into the AI audio, I did what you all should do. I read the terms of service first.
I was looking for a handful of things.
First, I had to make sure that I controlled the content. I didn’t want some AI site to gain ownership of any kind in my copyright because I had decided to make an audiobook of something.
Second, I needed a license for commercial use, since I would be selling the audio version of whatever I did.
Third, I didn’t want exclusivity. I’ve been exclusive with Audible for years now and I loathe it. As I always warn you all when you make exclusive deals, imagine what would happen when the lovely service you’re using turns on you.
That’s what Audible did to me. I got thousands and thousands of dollars up front ten to twelve years ago, along with bonus payments, because Audible was trying to establish itself.
The editor I worked with was tremendous. He was a reader. He liked the people he was working with, and he loved the books. He wanted more of everything. He responded quickly. The contracts were good.
Then things started to go south. Audible became a wholly owned subsidiary of Amazon. Which was not a problem at first, and became one about five or six years in. Because Amazon started flexing its muscles.
Still, Audible was pretty much the only place to go for audio at that point. Other places were too small and/or they had to distribute through Audible anyway. So it was a shrug.
But all the bonuses? They went away. And Amazon started playing screwy games with payment, so the payment amounts went way down. I started chafing at the exclusivity.
Then Findaway Voices came in. They were too expensive for people like me to have something produced. (Again, I did the math; again, it didn’t work out.)
By the time I wanted to leave, we no longer had an audio room and I was too sick to do much. So I figured audio could wait.
Unfortunately, the wonderful editor left, replaced by a guy who was such a screw-up that he spent most of his time planning his social life (on the phone…asking me and the other authors for advice (!)) and not editing, not getting things done, not sending out contracts.
By the time he was fired, my Audible products had stalled and there was nothing new to goose sales. The person who replaced him wouldn’t even return my calls—and I have dozens of items paid for by Audible on Audible.
That’s how exclusivity bites you in the ass.
I’m not taking the books down yet, but it’s on the bucket list for my future.
So…any service that requires exclusivity is no longer any place that I want to go.
Still, I looked at various programs and, as I said, started reading terms of service. Places like Apple want exclusivity. You’re creating the book through Apple, so they want to make sure the book is only available there.
That’s a hard no for me.
Google, on the other hand, allows you to export the content and sell it somewhere else, providing a not-half-bad commercial license…except…it’s for a book and you have to guarantee that the audiobook will be available on GooglePlus.
If I was doing books only, maybe. Maybe not, because as with most terms of service, the terms can change on the company’s whim.
So another hard no.
What I need, and what I’m digging into, are places that produce books and then release them. I control where the content is sold. (Commercial license.) In some, I have to make sure I credit publicly and clearly where the AI audio was produced. In others, I don’t have to.
One site has a “no profanity” clause. Which would require me to have a clean version of whatever product I’m doing. That’s not a problem for, say, the blog, because I can clean up my language here.
But it is a problem if I’m working on a Smokey Dalton novel, which has some extremely period-appropriate horrifying words in the books, or if I’m writing a rather spicy romance.
Makes me wonder, in fact, if the site could even be used for the spicier kinds of romance.
Last week, I posted what I was going to do, and a lot of people weighed in, some on Patreon, some on the website, and some via email. No one understood what I was trying to do, maybe because I hadn’t explained it. (Well, duh.)
A few people complained about the robotic voices. WMG had been invited to beta test a new product a month or so ago, and our non-audio person in charge of this thought both voices were fine. The female voice was human enough, but the male voice did have a robotic edge.
I don’t mind—if I’m going to look at the product on a Tier 3 level. The key is to get the sound file out into the world.
With all of that in mind, I now have reached the test stage. I have some other needs that you folks don’t have.
If I end up with an AI service of the kind I’m describing here, I need a higher level plan, one that would allow several people to access the account. That automatically puts us into an expensive payment structure—either the “Pro” level or the “Executive” level…which on most sites means I have to contact them directly to find out the price. And, as they say, if you have to ask, you can’t afford it.
What I’ll be doing is testing the sites as an individual first, and then moving to a higher payment level if I like what I’m hearing.
I’m also doing a lot of reading on the various rating sites. A few of these companies have a major complaint—which is that it’s almost impossible to cancel the service.
I can’t tell if that’s from someone who is spending at the lowest level or one of the Executive plans.
That requires more research. But, given what I want to do, it is a potential red flag.
That’s as far as I’ve gotten in this very busy week, filled with the Rock ’N’ Roll races here in Vegas, lots of visitors, too much homework for my one class, and a fiction project due, as well as the last days of a Kickstarter, which has me doing more promotion than usual.
Oh! And my latest Fey novel, The Kirilli Matter, appeared last Tuesday, so I did some promotion on that as well.
I hope to have more audio news for you soon. I’m finding this fun, which is a good sign. It means I’ll be working on it much harder than I probably should, given everything else I’m doing.
I promised you all that I would add some sales material down here, with things that are relevant to the current post. So…
If you want to understand how to read terms of service with the proper jaundiced eye, you can take our Bite-Size Copyright course or you can pick up my book Closing The Deal On Your Terms.
And here’s one other reminder: This weekly blog is reader supported.
If you feel like supporting the blog on an on-going basis, then please head to my Patreon page.
If you liked this post, and want to show your one-time appreciation, the place to do that is PayPal. If you go that route, please include your email address in the notes section, so I can say thank you.
Which I am going to say right now. Thank you!
Click paypal.me/kristinekathrynruschr4e to go to PayPal.
“Business Musings: The Research Part 1 (AI Audio),” copyright © 2023 by Kristine Kathryn Rusch. Image at the top of the blog copyright © Can Stock Photo / aroas.
I use text to speech frequently but, from experience, only very few of the voices I’ve tried are in any way natural sounding.
I don’t understand how to use Google services other than through their translation site, but those voices are okay. A bit wooden but not very robotic. Amazon Polly had a number of voices to choose from and I tried both the low quality ones and the high quality ones. I only liked one male voice (Matthew) and had it ‘read’ 9 books. I’ve only listened to two of those 9 books yet. It’s not a very nice experience.
I tried Speechelo, too. At a limit of 500,000 characters a month, it’s a no go. And the voice that sounded good in the demo (you can only listen to one sentence at a time unless you buy an account) didn’t sound good in long form.
I use a program that can access several sites’ tts and up until a year ago, Youdao had a female, English narrator that was very lifelike. Then that narrator turned into two different voices that swapped places at unexpected times. Then, a third voice that read like molasses was introduced and the other two disappeared. So, there went that service.
Okay, what to do? Well, I had to turn back to IBM Watson and the voice Lisa (High Quality). I had used that voice before and found it had a natural sound and was pleasant. It’s the one kind of okay solution I have now.
Maybe doing this on my own isn’t strictly in line with copyright law.
I’m not going to be supporting anyone through Patreon, though. Unless it’s payment for what I get, there’s no way I’m going to spend money. I thought Patreon was like ‘pay this amount to that person and you have access to this and that’ but it’s not. You have to keep the payments coming. I say, I’ll pay for the products I want, not as support during production and then additional money when a product is finished. That’s why I support using Kickstarter. There, I know what I’ll pay and what I get upfront.
Maybe this new AI company has some amazing qualities that these other examples I’ve mentioned don’t have. I know one can change pronunciation and emphasis manually, but doing that to a novel-length text is an awful lot of work. Training an AI to approximate is probably the best idea.
So what is GooglePlus? I was under the impression that an audiobook created by Google AI had to be available in the Google Play Store but could also be distributed to other retailers. Am I missing something?
Yes, it has to be available on the Google store, which I don’t like. I know it can go on other retailers. But it has to be a book. My blog is not a book. Also, as I said, terms of service can change at any moment. If you go through Google, Google controls the product.
D2D contacted me a while back about Apple getting into AI audio and thought some of my stuff would be a good fit. I got the impression that, after a period of time, it would go to other places.
I submitted mine but Apple has to approve them. I’m hoping that if they promote, it will help Ebook sales.
I would love to use AI to narrate my books, and I’d listen to it, too. My only problem is that my fantasy books have a lot of made-up words. I’ve listened to text to speech butcher my made-up words for years, and I worry that AI would do the same. That’s my biggest roadblock, honestly.
You can correct it now, and it will learn, on some programs.
I love audiobooks. I’ve listened to some authors who can read their own stuff and some who can’t. I was happily surprised by Robert Crais reading his own novel The First Rule, which I just finished. Good book. Good narration.
I used to listen to audiobooks during my commutes to and from my day job. Since I don’t have the commutes anymore, I listen to audiobooks while I’m doing chores around the house. The trick is to stop listening to the book once the chores are done. *g*