I used open-interpreter to read an epub file and create a DIY audio book.
Open-interpreter suggested that I use the bs4 and ebooklib libraries.
It recommended an API to create audio files from text, but I was easily able to swap this out for say, the free and local alternative on macOS.
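For anyone unfamiliar with it, say can render text straight to an audio file; a minimal sketch of the idea (assuming macOS, with a throwaway output path) is:

import os

# Write a short phrase to an AIFF file with the built-in macOS voice
os.system('say -o hello.aiff "Hello from my DIY audio book"')

The full script below does the same thing per chunk, just with a specific voice and speaking rate.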
As I worked (letting the model write the code), I found it easier to copy the code to a separate file and make my modifications there.
However, the initial prototype built by open-interpreter accomplished the majority of the work.
I was able to go from an epub file to 48 audio tracks on my phone in 15 minutes or so.
Open-interpreter was a joy to collaborate with.
My main wish for it at this point is for it to write the code it generates to a notebook that I can collaborate in.
This would allow me to help open-interpreter resolve issues it gets stuck on, and to maintain a copy of the source that I can revisit in future sessions or eventually turn into a more fully formed program.
Here is the code, largely copied out from open-interpreter with a few changes by me.
I wrote the parallelization of the audio file generation with Cursor’s OpenAI-based code generation and manually wrote text_to_speech using say.
import concurrent.futures
import ebooklib
import os
from bs4 import BeautifulSoup
from ebooklib import epub
def read_epub(file):
    book = epub.read_epub(file)
    content = []
    for item in book.get_items():
        if item.get_type() == ebooklib.ITEM_DOCUMENT:
            content.append(item.get_content())
    return content
epub_content = read_epub('my_book.epub')
print('Number of items in the EPUB file:', len(epub_content))
def extract_text(html_content):
    soup = BeautifulSoup(html_content, 'html.parser')
    return soup.get_text()
sample_text = extract_text(epub_content[0])
print('Sample text:', sample_text[:500])
def split_text(text, length=15000):
    return [text[i:i+length] for i in range(0, len(text), length)]
# Extract all text and split it into chunks
all_text = ''.join([extract_text(content) for content in epub_content])
text_chunks = split_text(all_text)
print('Number of text chunks:', len(text_chunks))
def text_to_speech(text, file):
    # say exits with status 0 on success; note that a chunk containing
    # double quotes would break this shell command
    return os.system(f'say -v Tessa -r 240 -o {file} "{text}"') == 0

# Make sure the output directory exists before the workers write to it
os.makedirs('audio_chunks', exist_ok=True)

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    futures = {}
    for i, chunk in enumerate(text_chunks):
        output_file = f'audio_chunks/chunk_{i}.aiff'
        # Skip chunks that were already converted in a previous run
        if not os.path.exists(output_file):
            futures[executor.submit(text_to_speech, chunk, output_file)] = i

    for future in concurrent.futures.as_completed(futures):
        if not future.result():
            print(f'Failed to convert chunk {futures[future]} to speech.')

print('All text chunks have been converted to speech.')
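The AIFF files that say produces are uncompressed and fairly large; an optional follow-up (not part of the open-interpreter session above, and assuming the afconvert tool that ships with macOS) is to transcode the chunks to AAC so they take up less space on the phone:

import glob
import os

# Transcode each AIFF chunk into a smaller .m4a (AAC) file
for aiff in sorted(glob.glob('audio_chunks/chunk_*.aiff')):
    m4a = aiff.replace('.aiff', '.m4a')
    if not os.path.exists(m4a):
        os.system(f'afconvert -f m4af -d aac "{aiff}" "{m4a}"')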