Need to convert audio files to text? Amazon Transcribe handles the heavy lifting. This guide shows you how to integrate it with Ruby.
Prerequisites
You need an AWS account and IAM credentials with Transcribe permissions. Your audio files must be in S3.
Install the required gem:
gem install aws-sdk-transcribeservice
Or add it to your Gemfile:
gem 'aws-sdk-transcribeservice'
Configure AWS Credentials
Set your credentials via environment variables:
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_REGION=us-east-1
Or create ~/.aws/credentials:
[default]
aws_access_key_id = your_access_key
aws_secret_access_key = your_secret_key
region = us-east-1
Basic Transcription
Here is a minimal working example:
require 'aws-sdk-transcribeservice'
require 'net/http'
require 'json'
client = Aws::TranscribeService::Client.new(region: 'us-east-1')
job_name = "transcription-#{Time.now.to_i}"
audio_uri = 's3://your-bucket/audio.mp3'
client.start_transcription_job(
  transcription_job_name: job_name,
  language_code: 'en-US',
  media_format: 'mp3',
  media: { media_file_uri: audio_uri }
)
puts "Job started: #{job_name}"
The job runs asynchronously. You need to poll for completion.
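Job names must be unique within your account, and Transcribe only accepts letters, digits, periods, underscores, and hyphens, up to 200 characters. The timestamp-based name above can collide if two jobs start in the same second. The following is a minimal sketch of one way to build a safer name from the file name. The helper `transcription_job_name` and its `suffix` parameter are hypothetical and not part of the AWS SDK.

```ruby
require 'securerandom'

# Hypothetical helper (not SDK API): derives a job name from an S3 URI.
# Keeps only characters Transcribe allows in job names and caps the length at 200.
def transcription_job_name(s3_uri, suffix: SecureRandom.hex(4))
  base = File.basename(s3_uri, '.*')        # file name without its extension
  safe = base.gsub(/[^0-9A-Za-z._-]/, '-')  # replace disallowed characters
  safe = 'audio' if safe.empty?
  tail = "-#{suffix}"                       # random suffix keeps names unique
  "#{safe[0, 200 - tail.length]}#{tail}"
end

puts transcription_job_name('s3://your-bucket/meeting notes.mp3')
```

Keeping the file name in the job name also makes jobs easier to recognize in the AWS console.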
Wait for Results
Poll the job status until it completes:
# Blocks until the job completes, fails, or the timeout (in seconds) elapses.
def wait_for_transcription(client, job_name, timeout: 600)
  start_time = Time.now
  loop do
    response = client.get_transcription_job(transcription_job_name: job_name)
    status = response.transcription_job.transcription_job_status
    case status
    when 'COMPLETED'
      return response.transcription_job
    when 'FAILED'
      raise "Transcription failed: #{response.transcription_job.failure_reason}"
    end
    # Still QUEUED or IN_PROGRESS: give up once the timeout has passed.
    if Time.now - start_time > timeout
      raise "Timeout waiting for transcription"
    end
    sleep 5
  end
end
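A fixed 5-second sleep is easy to follow, but long recordings mean many identical status calls. One alternative is exponential backoff. The sketch below is an illustration, not the approach used in the rest of this guide: `poll_until_done`, `fetch_status`, and `sleeper` are hypothetical names, and the status source is a plain callable so the loop can run here without AWS access.

```ruby
# Sketch: poll with exponential backoff instead of a fixed sleep.
# fetch_status is any callable returning the current job status string.
# sleeper is injectable so the loop can be exercised without really waiting.
def poll_until_done(fetch_status, timeout: 600, base_delay: 2, max_delay: 30,
                    sleeper: ->(seconds) { sleep(seconds) })
  now = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
  deadline = now.call + timeout
  delay = base_delay
  loop do
    status = fetch_status.call
    # Return terminal states to the caller, which decides how to handle FAILED.
    return status if %w[COMPLETED FAILED].include?(status)
    raise 'Timeout waiting for transcription' if now.call > deadline

    sleeper.call(delay)
    delay = [delay * 2, max_delay].min # double the wait, capped at max_delay
  end
end

# Simulated job that completes on the fourth status check.
statuses = %w[QUEUED IN_PROGRESS IN_PROGRESS COMPLETED]
waits = []
result = poll_until_done(-> { statuses.shift }, sleeper: ->(s) { waits << s })
puts "#{result} after waits of #{waits.inspect} seconds"
# prints: COMPLETED after waits of [2, 4, 8] seconds
```

With the real client, the callable would wrap `client.get_transcription_job(transcription_job_name: job_name).transcription_job.transcription_job_status`.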
job = wait_for_transcription(client, job_name)
Fetch the Transcript
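Unless you set an output bucket, the transcript is stored in a service-managed bucket and exposed through a temporary URL that expires after a short time (15 minutes, according to the AWS documentation). If it has expired, call get_transcription_job again to get a fresh one. Download and parse it: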
def fetch_transcript(job)
  uri = URI(job.transcript.transcript_file_uri)
  response = Net::HTTP.get(uri)
  data = JSON.parse(response)
  data['results']['transcripts'].first['transcript']
end
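The transcript file holds more than the full text. Its `results.items` array lists each word with timing and a confidence score, which is handy for flagging passages that need a human check. The sketch below uses a hand-written sample shaped like a Transcribe result file (real files contain more fields). The helper `low_confidence_words` and its `threshold` default are assumptions for illustration.

```ruby
require 'json'

# Hand-written sample in the shape of a Transcribe result file (abbreviated).
sample_transcript = <<~JSON
  {
    "results": {
      "transcripts": [{ "transcript": "Hello wrld." }],
      "items": [
        { "type": "pronunciation", "start_time": "0.0", "end_time": "0.4",
          "alternatives": [{ "confidence": "0.99", "content": "Hello" }] },
        { "type": "pronunciation", "start_time": "0.5", "end_time": "0.9",
          "alternatives": [{ "confidence": "0.42", "content": "wrld" }] },
        { "type": "punctuation",
          "alternatives": [{ "confidence": "0.0", "content": "." }] }
      ]
    }
  }
JSON

# Hypothetical helper: returns spoken words whose top alternative scores below threshold.
def low_confidence_words(data, threshold: 0.8)
  data.fetch('results').fetch('items', []).filter_map do |item|
    next unless item['type'] == 'pronunciation' # skip punctuation items

    best = item['alternatives'].first
    confidence = best['confidence'].to_f        # scores arrive as strings
    next unless confidence < threshold

    { word: best['content'], confidence: confidence, start_time: item['start_time'].to_f }
  end
end

p low_confidence_words(JSON.parse(sample_transcript))
```

In this sample only "wrld" falls below the default threshold, so it is the single entry returned.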
transcript = fetch_transcript(job)
puts transcript
Complete Example
Here is a reusable transcription class:
require 'aws-sdk-transcribeservice'
require 'net/http'
require 'json'
require 'securerandom'
class AudioTranscriber
  def initialize(region: 'us-east-1')
    @client = Aws::TranscribeService::Client.new(region: region)
  end
  # Starts a job for the given S3 URI, waits for it to finish, and returns the transcript text.
  def transcribe(s3_uri, language: 'en-US')
    job_name = "job-#{SecureRandom.hex(8)}"
    media_format = detect_format(s3_uri)
    @client.start_transcription_job(
      transcription_job_name: job_name,
      language_code: language,
      media_format: media_format,
      media: { media_file_uri: s3_uri }
    )
    job = wait_for_completion(job_name)
    fetch_transcript(job)
  end
  private
  # Infers the media format from the file extension, falling back to 'mp3'.
  def detect_format(uri)
    ext = File.extname(uri).delete('.').downcase
    %w[mp3 mp4 wav flac ogg amr webm m4a].include?(ext) ? ext : 'mp3'
  end
  # Polls every 5 seconds until the job completes, fails, or the timeout passes.
  def wait_for_completion(job_name, timeout: 600)
    deadline = Time.now + timeout
    loop do
      resp = @client.get_transcription_job(transcription_job_name: job_name)
      job = resp.transcription_job
      return job if job.transcription_job_status == 'COMPLETED'
      raise "Failed: #{job.failure_reason}" if job.transcription_job_status == 'FAILED'
      raise "Timeout" if Time.now > deadline
      sleep 5
    end
  end
  # Downloads the transcript JSON and returns the full transcript text.
  def fetch_transcript(job)
    uri = URI(job.transcript.transcript_file_uri)
    data = JSON.parse(Net::HTTP.get(uri))
    data['results']['transcripts'].first['transcript']
  end
end
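The `detect_format` helper in the class falls back to 'mp3' when it does not recognize the extension, so a mislabeled file only fails later, inside the job. If you would rather fail before starting a job, a stricter variant could look like the sketch below. The name `media_format_for` is hypothetical, and the format list reflects the values Transcribe accepts as far as this guide is aware (including m4a).

```ruby
# Hypothetical stricter variant of detect_format: returns the media format for
# an S3 URI, or raises instead of guessing when the extension is not supported.
def media_format_for(s3_uri)
  supported = %w[mp3 mp4 wav flac ogg amr webm m4a]
  ext = File.extname(s3_uri).delete_prefix('.').downcase
  return ext if supported.include?(ext)

  raise ArgumentError, "Unsupported or missing audio extension in #{s3_uri.inspect}"
end

puts media_format_for('s3://my-bucket/calls/Meeting.WAV') # prints "wav"
```

Raising early keeps the error next to the bad input instead of surfacing minutes later as a FAILED job.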
# Usage
transcriber = AudioTranscriber.new
text = transcriber.transcribe('s3://my-bucket/meeting.mp3')
puts text
Error Handling
Wrap your transcription calls to handle common failures:
def safe_transcribe(s3_uri)
  transcriber = AudioTranscriber.new
  transcriber.transcribe(s3_uri)
rescue Aws::TranscribeService::Errors::BadRequestException => e
  puts "Invalid request: #{e.message}"
  nil
rescue Aws::TranscribeService::Errors::LimitExceededException
  puts "Rate limited. Retry later."
  nil
rescue StandardError => e
  puts "Transcription error: #{e.message}"
  nil
end
Supported Languages
Amazon Transcribe supports many languages. Pass the correct code:
# Common language codes
'en-US' # English (US)
'en-GB' # English (UK)
'es-ES' # Spanish
'fr-FR' # French
'de-DE' # German
'pt-BR' # Portuguese (Brazil)
'ja-JP' # Japanese
'zh-CN' # Chinese (Mandarin)
Cost Optimization Tips
Transcription costs add up. Here are ways to reduce them:
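- Batch jobs during off-peak hours. This does not lower the price, but it makes you less likely to hit service limits.
- Use the correct audio format. Mono audio at 16 kHz works well for speech. Converting stereo to mono saves storage and processing time.
- Set an output bucket. Store results in your own S3 bucket to avoid repeated fetches. The transcript then lives in that bucket, so read it with your S3 credentials rather than through the temporary URL: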
client.start_transcription_job(
  transcription_job_name: job_name,
  language_code: 'en-US',
  media_format: 'mp3',
  media: { media_file_uri: audio_uri },
  output_bucket_name: 'my-transcripts-bucket'
)
- Delete old jobs. Clean up completed jobs to stay organized:
client.delete_transcription_job(transcription_job_name: job_name)
Next Steps
You now have working Ruby code for audio transcription. Consider adding:
- Speaker diarization to identify who said what
- Custom vocabularies for domain-specific terms
- Redaction for sensitive content like PII
- Real-time streaming for live audio
Check the Amazon Transcribe documentation for advanced features.
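As a starting point for the first three items, here is a sketch of a helper that builds the request hash for start_transcription_job with those options switched on. The method name `transcription_request` and its keyword arguments are hypothetical. The hash keys follow the StartTranscriptionJob request shape, and the custom vocabulary is assumed to already exist in your account.

```ruby
# Sketch (not SDK API): builds a start_transcription_job request hash with
# optional speaker diarization, a custom vocabulary, and PII redaction.
def transcription_request(job_name:, s3_uri:, media_format:, language: 'en-US',
                          max_speakers: nil, vocabulary: nil, redact_pii: false)
  request = {
    transcription_job_name: job_name,
    language_code: language,
    media_format: media_format,
    media: { media_file_uri: s3_uri }
  }

  settings = {}
  if max_speakers
    # Speaker labels need an upper bound on the number of speakers.
    settings[:show_speaker_labels] = true
    settings[:max_speaker_labels] = max_speakers
  end
  settings[:vocabulary_name] = vocabulary if vocabulary
  request[:settings] = settings unless settings.empty?

  if redact_pii
    # Ask for a transcript with PII replaced by placeholders.
    request[:content_redaction] = { redaction_type: 'PII', redaction_output: 'redacted' }
  end
  request
end

request = transcription_request(
  job_name: 'demo-job',
  s3_uri: 's3://my-bucket/meeting.mp3',
  media_format: 'mp3',
  max_speakers: 2,
  redact_pii: true
)
pp request
```

The resulting hash can be passed straight to the client with `client.start_transcription_job(**request)`. Building it in a separate method keeps the optional features easy to test without calling AWS.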