Business Musings: The Research (Part Two) AI Audio


Full disclaimer here: I wrote longer pieces on my Patreon page and combined them together for this post. There’s a lot of learning and dithering there, as well as a fairly mediocre audio sample that you can hear if you want to join up.

I wrote those pieces in the moment, so I’m reprinting them here without changing that “in the moment” feel. I have edited for content, though, because some is no longer relevant, and some only belongs on Patreon.

So…here goes:

Sunday, March 5, 2023

 

I just spent a half fun few hours and a half pain in the patootie few hours. As I mentioned in the previous post, I’ve been working on AI audio. I decided I’d make a decision on the preliminary service this week.

I figured I’d do a lot of audio versions of the test blog, each from a different site. But the terms of service on some sites scared me off. On others, it was the pricing. Not the introductory pricing, but the pricing that WMG needed.

The Enterprise Tier of many of those services, which is the tier WMG would need, is often eye-crossingly expensive. Many of them include services that we don’t need…at least at the moment.

A number of the services sounded great, until I looked at how many hours of audio I would get for the price. A few of the services, in beta, were really expensive. I’d rather pay a voice actor than pay for these services.

So I ended up trying only one service, Murf. It has a good TOS (at the moment, anyway). It gave me ten free completed minutes of audio. I only used 1:17 minutes.

The free service did not let me clone my voice (not that I would have at this juncture), although I could have tried a simulation. Instead, I had the choice of two middle-aged female voices or half a dozen female young adult voices. I could also have chosen from at least two middle-aged male voices and a bunch of male young adult voices.

I chose the least objectionable middle-aged female voice, and played.

I had to work with pronunciation on some expected things, like my last name, and some unexpected things, like PayPal. The voice, at a neutral speed, sounded robotic, so I sped her up.

As I noted in the text, I had to change a number of things for clarity. I will have to do some of the audio blogs differently than I do the text blogs, which really isn’t a problem.

All in all, it took me 30 minutes to learn the system and create the 1:17 minutes of audio. I could have done the same on one of my audio programs, using my own voice, in half that time.

But I don’t expect the audio version of the blog to take longer than 30 minutes to set up. Most of that 30 minutes was me learning the program. Not a big deal, actually, and it wasn’t that hard.

I was surprised, actually. I thought it would be more difficult. Instead, I had fun.

Tuesday, March 7, 2023

In my AI Audio research, I found a lot of really good programs. Almost all of them wanted me to email them or contact them by phone to do voice cloning. Which means that voice cloning is expensive.

At the moment, I’m not into expensive. I’m going to pay a little for some of these services because I want to do the blog and a few other things, but I am not going to pay a lot.

I’m going to wait on voice cloning.

I liked what I saw from Murf.ai, and I had fun playing with their system. It didn’t take long, as I mentioned above, and the sound was good enough. (I didn’t spend extra time tweaking it, since I wasn’t sure if I was going to use the program.)

I liked the Pro Plan, but I liked the Enterprise Plan better. According to the pricing page, there was only $33 between them, so why not go with the Enterprise Plan?

Well…I didn’t click through enough.

Let’s backtrack a bit. When I initially researched the AI Audio services, I read a lot of reviews. Some services clearly curated their reviews, which means (for the handful of you who aren’t aware of this) that they interact with people who had a negative experience, and then ask that the negative review get removed.

The really expensive services are more sophisticated about that. They leave the complaint, and their reply, and then get the reviewer to put up another review, a “we’re all good now” review.

A few companies actually scrub the negative reviews. I’m not sure how they do that (or how much it costs) but it has always irritated me. I don’t like the way they pretend that nothing is ever wrong. Things do go wrong, but usually those things can be fixed.

Which brings us to Murf.ai. Murf does not curate their reviews on any of the sites. There is a lot of satisfaction with their audio services, enough to hearten me.

All of the complaints I found had to do with billing. Much of what I found was the kind of review I’d seen on sites for gyms that have memberships.

It’s impossible to cancel the service and get a refund

Or…

I ended up paying much more than I wanted to.

I made a mental note of that. Be careful; watch pricing. And then I forgot about it.

After doing some of the research, I decided to sign up for the Enterprise Plan. It says “$59 per user per month.” I spoke to Allyson Longueira at WMG, figuring we’d sign me up through the company since, after my exploration was done, the company would handle everything. We’d add users as needed.

Well…no.

The pricing is $59 per user per month, but with the Enterprise Plan you need to have a minimum of five users. You can’t just have one user at $59.

Then there’s this per month crap. You can’t pay monthly. You have to pay for the entire year.

Up front.

Which I was not willing to do. No way were we going to spend $3540 to experiment with a service of any kind. Especially since I wasn’t sure we’d like it.
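For anyone who wants to check that math, here is a minimal sketch of how a “per user per month” price turns into an up-front bill. The function name and parameters are made up for illustration; the only figures taken from this post are the $59 Enterprise price, the five-user minimum, the annual billing, and the $26 Pro Plan price mentioned in the reviews.

```python
def upfront_annual_cost(price_per_user_per_month, users, minimum_users=1, months=12):
    """Total billed up front when a 'monthly' price is charged for the whole year.

    Hypothetical helper: the plan bills for at least `minimum_users`,
    even if fewer people will actually use the service.
    """
    billed_users = max(users, minimum_users)
    return price_per_user_per_month * billed_users * months


# Enterprise Plan: $59 per user per month, five-user minimum, one actual user.
enterprise = upfront_annual_cost(59, users=1, minimum_users=5)

# Pro Plan: $26 per month for a single user, still billed annually.
pro = upfront_annual_cost(26, users=1)

print(enterprise)  # 3540
print(pro)         # 312
```

The advertised number is the first argument. The real bill depends on the other two, which is why it pays to click all the way through the pricing page.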

These pricing shenanigans are all over the web. They are annoying as hell. And, according to those bad reviews, some people actually paid without looking. They expected to be billed $26, say, for the Pro Plan, and were startled when they were billed $312 instead. They immediately wanted their money back.

They tried to get it back through Murf. I’d’ve just gone to my credit card company, but hey, not everyone knows to do that.

Anyway…

Those pricing fun and games irritated me. They always do when I run across that kind of thing.

So I set Murf aside and investigated other companies again. Most AI Audio services want you to call them for quotes on their Enterprise Plans…and oh, yeah, you need ten or more people on the plan.

Nope. At most, we’d have five. At most. Probably two.

Some services required the customer to be a corporation. That’s not a problem for us, but that meant corporate pricing. Which is always Ye-ou-ch.

On my Patreon post, some folks mentioned they’re already using their built-in text-to-speech functions (in the accessibility settings) for AI-assisted reading. Some people already use a text-to-speech function so they can hear the daily paper or read a website. So, these folks warned me, there might be no reason to do the blog.

Ah, maybe it’s just vanity. Maybe I’m playing. I don’t know.

But their points made me rethink a few things…and search for a text-to-speech function that I could use in the programs I already had.

I have a subscription to Adobe Audition. It has a text-to-speech function, but it’s not really designed for multiple voices. However, it does have a really nice AI Audio feature that will clean up a bad sound environment, which intrigued the heck out of me. Maybe I can record at home after all.

That would make my life easier.

But…not the blog. Because life is too short, and I get really anal about recording. And editing. And working on sound.

So if I recorded the blog myself through Audition, I’d lose two to four hours per week. Doing it with an AI voice? Maybe thirty minutes. Once the controls are set up, it might even be less.

Audition was a sidebar.

There are other sidebars, most of which I’m not going to discuss here, since they have to do with some internal company decisions.  Let’s just say that the investigations continued.

I ended up cycling back to Google Play Books.

There’s a lot I don’t like in the Google AI Audio program. First, everything that is converted into audio has to be a book. Second, the audiobook is required to be on Google Play. Third, Google can change its terms of service tomorrow if it wants to, and I have no recourse. (Neither will you.)

There are, however, a few things to like.

Google Play’s interface is currently free. Or as it says on their information page:

For a limited time, there’s no charge to create, publish, and download these audiobooks.

Uh-oh. That means that at some point in the future, there will be a charge.

But right now, the service is free. Free is a good price.

The second thing to like? Right now, anyone who creates an audiobook using Google’s tech can download that audiobook (for free) and sell it on other retail platforms…as long as it’s on Google’s platform as well.

Okay, then. It costs nothing to produce an audiobook. Nothing except time, and even that will be limited.

As I dug into this, and got to thinking about it, I put my negotiator’s brain on it.

Yes, the program is free now, but it won’t always be. However, Google tests its programs for years sometimes, so the free might last a while.

But what happens if Google changes its TOS so that the books can no longer be downloaded and sold elsewhere? Suddenly those books would have to be exclusive to Google. Maybe the TOS would try to make that exclusivity retroactive. (I haven’t read closely enough to know whether the TOS states that, if the service changes, those changes would have an impact on previous recordings.)

I don’t like any of that. I mentioned it to one person who said, “Well they can’t do that. What would they do? Take away all the downloads?”

Oh, probably not. But maybe. Or maybe they’d ask for a retroactive fee. Even if the (imagined) change in the TOS was legally indefensible, a certain percentage of people might pay anyway. If they did, Google would make a boatload of money.

There are a lot of what-ifs like that, and most of them are not good.

Except that there’s a recourse to most of the changes I imagined.

The recourse would be this: We could leave our audiobooks produced by Google up on Google Play and then redo the books for a different platform.

Before I made any decisions about anything with the audio, I needed to talk with Allyson at WMG. Because my imagined problems and my solutions would require a duplication of work. In other words, the books produced in 2023 might have to be redone in 2025.

Sure, the tech might be better in 2025. We might want to redo the books anyway. But we have 1,000 books. That’s a lot of time, even if each book only takes fifteen minutes.
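To put a rough number on “a lot of time,” here is a back-of-the-envelope sketch. The 1,000 books and fifteen minutes per book come from the paragraph above; the 40-hour work week is an assumption used only to turn hours into weeks, and the function name is made up for illustration.

```python
def redo_effort(books, minutes_per_book, hours_per_week=40):
    """Return (total_hours, work_weeks) needed to redo every book once.

    hours_per_week is an assumed full-time schedule, not a measured figure.
    """
    total_hours = books * minutes_per_book / 60
    work_weeks = total_hours / hours_per_week
    return total_hours, work_weeks


hours, weeks = redo_effort(books=1000, minutes_per_book=15)
print(hours)  # 250.0
print(weeks)  # 6.25
```

That is 250 hours, or a bit more than six full-time weeks for one person, if the fifteen-minute estimate holds.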

I’m not worried about that, and neither was Allyson. Who knows what will change in the next few years. It’s a risk worth taking.

Thursday, March 9, 2023

Just found this on Google’s TOS:

Under 2.1 Program:…Google will notify Publisher of any changes to the fees and such changes will not be applied retroactively.

My greater concern is 3.1(b), which allows them to use and exploit the audio content however they want. There are other protections in the TOS, but I’m not sure they’ll be enough to prevent…what? I’m not sure. It’s tough to find an easy link to the TOS, so you need to do it on your own. Be sure to read the other links in the TOS to understand the full document.

 

So here’s what we decided.

WMG would do the low-level audiobooks through Google’s system. (Tier 3, for those of you who read the first research blog.)

I would play with all kinds of experimental audio on both Murf (for the blog) and on Adobe Audition. WMG would use Google Play to create audiobooks at Tier 3 (the good-enough tier) while we figure out how we’re going to add Tiers 2 and 1.

Here’s the problem: there’s only so much time. We have employees, but they work 40 hours per week. I work more than that on so many different projects. Dean works harder than all of us combined.

We don’t have a lot of time.

Right now, AI audio costs little in time. It will enable consumers to get an audiobook of something they don’t have time to read (or have trouble reading, due to health issues). It’s a win on that level, and is worth the effort, minimal as it is.

I’ll be putting my blogs up as audio files. They’ll go up early on Patreon, and later on this website.  I need to figure out exactly how I want to do all of this, so I won’t start this week.

I’ll figure it out though, and probably blog about it. Or not. I have a few other things to work on. The world is changing so fast it’s hard to keep up. Which techno me actually finds fun.

Okay, then. I guess I’ve figured out the next few weeks.

I’ll keep you informed about what I’m doing, and how it’s all going. I’m sure there’s another post or two in the future.

We’ll just have to see.

****

This weekly blog is reader supported.

If you feel like supporting the blog on an on-going basis, then please head to my Patreon page.

If you liked this post, and want to show your one-time appreciation, the place to do that is PayPal. If you go that route, please include your email address in the notes section, so I can say thank you.

Which I am going to say right now. Thank you!

Click paypal.me/kristinekathrynruschr4e to go to PayPal.

“Business Musings: The Research (Part Two) AI Audio,” copyright © 2023 by Kristine Kathryn Rusch. Image at the top of the blog copyright © Can Stock Photo / solarseven.

8 thoughts on “Business Musings: The Research (Part Two) AI Audio”

  1. Hi Kris, I’ve been looking at the D2D Apple Audio TOS, and two things stand out to me:

    “For each Chosen Work, in addition to the rights otherwise granted under the D2D Terms, you grant Draft2Digital the worldwide, nonexclusive, and sublicenseable right and license to:
    Use, copy, modify, reproduce, distribute, translate, transmit, display and make derivative works of those Works and Related Materials as Audiobooks, as well as the right and license to retain a copy of each such Audiobook and corresponding Related Materials and distribute, sell and offer for sale such Audiobooks, in whole or in part, by all means now known or later developed, and in all languages and formats; and
    Market the Works as Audiobooks,…”

    So they want derivative rights. All derivative rights? I.e. a rights grab? This would allow D2D to license out, for instance, video game rights based on the audiobook content. There are so many derivative rights, it seems like I’d be giving away the farm for the pleasure of having my book AI-narrated for free.
    They are also grabbing translation rights, so this would be a worldwide license.

    Also:
    “You shall have no right, title, or interest in the Audiobooks nor any of the technology or intellectual property rights used to create Audiobooks from Your Works.”

    This is not the same as producing an audio on ACX and then having rights to that audio 7 years later. Even though the license is non-exclusive, it smells a whole lot like a sale. Unless, of course, I take the audio down after the period of “more than 6 months on the market” expires. If I am unfortunate, D2D might have sublicensed some of my rights out already and I don’t see a reversal mechanism in that very short contract.

    Am I reading this right or am I just being paranoid? I did write to D2D, pointed out my concerns, and requested changes to the TOS. Meanwhile I uploaded a gay romance trilogy that would be hard to exploit for gaming content, just because I was terribly curious how it comes out. (A gambit, I know. Perhaps an unwise one.)

    1. It is impossible to know the answer to your question, Kate, without reading the entire TOS. As I say all the time, a contract is one long agreement. You can’t cherry-pick parts and get an answer. I do think the key phrase here, though, is “as Audiobooks.”

  3. If you haven’t already, you might want to check out Blakify. I got it for a one time payment on Appsumo and the voices are decent. Full commercial rights are included with the purchase. Some of the voices are… not great, but there are a handful that I like. I’m experimenting with AI audio on Youtube so this is a nice resource for me.

  4. I’m playing with Google AI now. It’s got the option to slow narration and edit pronunciation but it isn’t working for everything. On things like tear/tear it will give you the two common options and you can apply a pronunciation to a single word or the entire manuscript. But for my made up words in my fantasy story it’s a bit harder. The options are written out phonetically, and sometimes they don’t work. It says you can speak/record a word, but I haven’t gotten that to work yet either. I need to figure out where to get those phonetic characters so I can build the right sound.

    And it really doesn’t understand hmm. And ellipses. Even when I slow it down as much as it allows, it still sounds too abrupt. I foresee some rewriting for some of my more cryptic characters.

    But it’s free. So it seems like a reasonable experiment.

  5. I’d be less concerned about Google changing to charge for audio, than Google shutting down their AI program or replacing it with something differently inconvenient.

    Sure, you can roll with that change and it won’t destroy existing recordings. But they can’t even keep their *chat* service consistent year-to-year.

    But that’s the story of our whole business, isn’t it? Adapt, flow, and persist.
