For audio generation I recommend Bark. It takes me about 2 minutes to generate 14 seconds of audio, at roughly a third of ElevenLabs quality.
That's on a Windows 10 Dell with 32 GB of RAM, an i5, and an Nvidia GeForce GTX 1050 with 4 GB of VRAM.
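If you want to try it, here is a rough sketch of how I'd call Bark on a small card like this. The two environment variables are the low-memory switches from Bark's README, as far as I know. The file name, the prompt text, and the `real_time_factor` and `generate_clip` helpers are just things I made up for illustration, so treat it as a starting point and not the one true setup.

```python
import os

# Assumption: these are the low-memory switches described in Bark's README.
# They have to be set before bark is imported.
os.environ.setdefault("SUNO_USE_SMALL_MODELS", "True")
os.environ.setdefault("SUNO_OFFLOAD_CPU", "True")


def real_time_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of compute spent per second of generated audio (hypothetical helper)."""
    return wall_seconds / audio_seconds


def generate_clip(text: str, out_path: str = "bark_clip.wav") -> str:
    """Generate one short clip with Bark and save it as a WAV file."""
    # Imported lazily so real_time_factor works even without Bark installed.
    from bark import SAMPLE_RATE, generate_audio, preload_models
    from scipy.io.wavfile import write as write_wav

    preload_models()              # downloads the checkpoints on first run
    audio = generate_audio(text)  # NumPy array of audio samples
    write_wav(out_path, SAMPLE_RATE, audio)
    return out_path


if __name__ == "__main__":
    print(generate_clip("Hello, this is a test of local speech generation."))
    # My numbers: 14 s of audio in about 120 s of wall time.
    print(f"{real_time_factor(14, 120):.1f}x slower than real time")
```

With my numbers that works out to roughly 8.6 seconds of compute per second of audio.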
I'm also able to run local LLMs decently thanks to llama.cpp and other libraries that can split a model between RAM and VRAM. Other tools can help with this as well, including Ollama.
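To give an idea of what the RAM/VRAM split looks like in practice, here is a minimal sketch using the llama-cpp-python bindings, where `n_gpu_layers` controls how many layers go to the GPU and the rest stay in system RAM. The `layers_that_fit` helper, the model path, and the size numbers are my own assumptions for illustration. The estimate is crude (it pretends every layer is the same size and ignores the context cache), so expect to tune it by hand.

```python
def layers_that_fit(n_layers: int, model_gb: float, vram_budget_gb: float) -> int:
    """Rough guess at how many layers fit in VRAM (hypothetical helper).

    Assumes all layers are the same size and ignores KV cache overhead.
    """
    per_layer_gb = model_gb / n_layers
    return max(0, min(n_layers, int(vram_budget_gb / per_layer_gb)))


def load_model(model_path: str, n_gpu_layers: int):
    """Load a GGUF model, offloading n_gpu_layers layers to the GPU."""
    # Imported lazily so the estimate above works without llama-cpp-python.
    from llama_cpp import Llama

    return Llama(model_path=model_path, n_gpu_layers=n_gpu_layers, n_ctx=2048)


if __name__ == "__main__":
    # Hypothetical example: a 32-layer model of about 4 GB on a 4 GB card,
    # leaving roughly 1 GB of VRAM free for everything else.
    n = layers_that_fit(n_layers=32, model_gb=4.0, vram_budget_gb=3.0)
    llm = load_model("models/your-model.gguf", n_gpu_layers=n)  # placeholder path
    out = llm("Q: What is llama.cpp? A:", max_tokens=48)
    print(out["choices"][0]["text"])
```

If you run out of VRAM, lower `n_gpu_layers`; if you have headroom, raise it. Ollama handles this split for you automatically.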
I suggest subscribing to r/LocalLLaMA. I also suggest using Bing Copilot in Edge with page access allowed, so it can read the page you're viewing. I often use it to find new GitHub libraries and to get the first steps for a new framework.
There is so much out there for LLMs that sorting through it all is a pain.