For long-form interview creators
Turn one long interview into transcript files you can use.
Upload an MP3 interview, preview the first 15 minutes free, then continue to the full transcript when the output is worth keeping. Finished jobs include transcript text plus TXT, SRT, and VTT exports.
15-minute free preview
See real transcript text first
Speaker-labeled
Readable host and guest turns
TXT, SRT, VTT
Files for reading and publishing
CastCraftly Intake
Upload File
Upload an MP3 interview from your device to preview
No sign-in required for one free preview today.
Drag an MP3 here to upload
Format: mp3
Up to 300 MB, max 150 min
By using CastCraftly, you agree to our Terms of Service and Privacy Policy.
Long interviews create a different cleanup problem.
After recording, the slow part is rarely getting audio into a file. It is finding the quote again, checking who said what, and turning one long conversation into text your publishing workflow can actually use.
How it works
Upload one MP3 interview
Start with the same long-form source file you actually need to publish.
Preview the first 15 minutes
Read real transcript output before deciding whether the full file is worth processing.
Get the full transcript
Continue into the complete job and export TXT, SRT, and VTT when it finishes.
Built for the way long interviews get used.
The transcript is not the final product. It is the working layer that helps you publish, search, caption, and reuse the conversation.
For show notes
Search the transcript instead of scrubbing a 90-minute recording again.
For captions
Export SRT and VTT files for publishing workflows that need timed text.
For clip pulls
Use transcript text and timestamps to find the lines worth cutting first.
For quote search
Find the exact answer, story, or phrase without replaying the whole episode.
For archives
Turn old long interviews into text you can actually search and reuse.
For guest workflows
Share readable transcript output after the recording instead of a raw audio file.
What you can count on
Preview before purchase
Read the first 15 minutes before paying for the full transcript.
Failed jobs do not consume minutes
If processing fails, the job does not burn the minutes you paid for.
Clear plan choices
Buy one 120-minute credit pack or choose a monthly creator plan.
Built around long interviews
The product is shaped around creator interviews, not meeting-note workflows.
Export-ready formats
Completed jobs return TXT, SRT, and VTT from the same transcript source.
No hidden public-beta promises
The free preview is MP3-only today, and the page says so before upload.
What you get back
Every completed job returns three synced export formats from the same transcript source. Here is what the files look like.
TXT
For reading and show notes
[00:00] HOST What's the biggest mistake operators make in their first 12 months?
[00:14] GUEST Hiring before they have a clear sales motion. Every single time. I've seen that kill more startups than bad markets ever did.
SRT
For CapCut, Premiere, and clips
1
00:00:00,160 --> 00:00:04,160
What's the biggest mistake operators make in their first 12 months?

2
00:00:14,200 --> 00:00:19,400
Hiring before they have a clear sales motion. Every single time.
VTT
For YouTube and web players
WEBVTT

00:00:00.160 --> 00:00:04.160
What's the biggest mistake operators make in their first 12 months?

00:00:14.200 --> 00:00:19.400
Hiring before they have a clear sales motion. Every single time.
Snippets are illustrative. Real exports preserve every word, timestamp, and speaker boundary from your audio.
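If you want to search an export for a quote yourself, a minimal Python sketch could look like the one below. It assumes the SRT file follows the standard layout (cue number, timing line, text, blank line). The `find_quotes` helper and the `SAMPLE` text are illustrative names, not part of CastCraftly.

```python
import re

# Matches an SRT timing line such as "00:00:14,200 --> 00:00:19,400"
# and captures the start and end times without milliseconds.
TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2}),\d{3}\s*-->\s*(\d{2}:\d{2}:\d{2}),\d{3}"
)


def find_quotes(srt_text, phrase):
    """Return (start_time, cue_text) for every cue that contains phrase."""
    hits = []
    # Standard SRT separates cues with a blank line.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMING.match(line.strip())
            if match:
                # Everything after the timing line is the cue text.
                text = " ".join(part.strip() for part in lines[i + 1:])
                if phrase.lower() in text.lower():
                    hits.append((match.group(1), text))
                break
    return hits


# Hypothetical sample that mirrors the illustrative SRT snippet.
SAMPLE = """1
00:00:00,160 --> 00:00:04,160
What's the biggest mistake operators make in their first 12 months?

2
00:00:14,200 --> 00:00:19,400
Hiring before they have a clear sales motion. Every single time.
"""

for start, text in find_quotes(SAMPLE, "sales motion"):
    print(start, text)
```

The search is case-insensitive and returns the cue start time, which is usually enough to jump to the right spot in an editor.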
Pricing
Start with one interview or use a monthly plan when long-form episodes are part of your regular workflow. Failed jobs do not consume minutes.
FAQ
What audio and video formats does CastCraftly support?
MP3, M4A, WAV, and WebM audio, plus MP4, MOV, and WebM video. Each upload can be up to 300 MB and 150 minutes long. The public free preview is MP3-only today; the full job flow accepts every supported format.
How long can a single interview be?
Up to 150 minutes per file, which covers the vast majority of long-form podcast interviews. If your recording is longer, split it before uploading.
How accurate is the speaker labeling?
Speaker diarization works reliably for clear two- or three-person interviews recorded on separate microphones. Fast cross-talk and heavily overlapping speech are the hardest cases, and those turns occasionally get merged into one speaker. You can rename Speaker 0 and Speaker 1 to roles or real names (for example, Host and Guest) on the finished job page, and the new labels apply to your on-page transcript and TXT download.
What's the difference between TXT, SRT, and VTT?
TXT is plain transcript text grouped by speaker with timestamps, best for reading, quoting, show notes, or feeding into another tool. SRT is timed subtitle output that video editors like Premiere, CapCut, and DaVinci Resolve import natively. VTT is the WebVTT format that YouTube, HTML5 video, and most web players expect. Every completed job returns all three from the same transcript source.
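SRT and VTT are close relatives: the main differences are the `WEBVTT` header, the optional cue numbers, and a comma versus a period before the milliseconds. The sketch below shows that relationship in Python. It is not how CastCraftly generates its exports. `srt_to_vtt` is a hypothetical helper, and it assumes plain SRT input with no styling tags.

```python
import re

# SRT timing lines use a comma before milliseconds; WebVTT uses a period.
SRT_TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2}),(\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2}),(\d{3})"
)


def srt_to_vtt(srt_text):
    """Convert plain SRT text to WebVTT text."""
    out = ["WEBVTT", ""]
    # Standard SRT separates cues with a blank line.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        # Drop the numeric cue index; WebVTT does not require it.
        if lines and lines[0].strip().isdigit():
            lines = lines[1:]
        if not lines:
            continue
        # Rewrite the timing line with period separators.
        out.append(SRT_TIMING.sub(r"\1.\2 --> \3.\4", lines[0].strip()))
        out.extend(lines[1:])
        out.append("")
    return "\n".join(out)


# Hypothetical sample that mirrors the illustrative SRT snippet.
SAMPLE_SRT = """1
00:00:00,160 --> 00:00:04,160
What's the biggest mistake operators make in their first 12 months?

2
00:00:14,200 --> 00:00:19,400
Hiring before they have a clear sales motion. Every single time.
"""

print(srt_to_vtt(SAMPLE_SRT))
```

Because both files come from the same transcript source, the cue text and timings should match. Only the container syntax changes.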
What happens if a transcription fails?
Failed jobs do not consume minutes. Reserved minutes are returned to your account automatically, and the job page shows the error so you know whether to retry or contact support. You never pay for work the system did not finish.
Do I have to subscribe?
No. A one-time purchase adds 120 minutes to your account, and those minutes do not expire. The monthly Creator plan is there for when you publish a long interview every week and want predictable cost. It is never required.
Is my audio kept private? Is it used to train AI models?
Your uploads stay private to your account. We do not share files with third parties beyond the speech recognition API needed to produce the transcript, and your content is not used to train any model. You can request deletion of a job and its source audio from the dashboard.