Ruby Audio Transcription with Amazon Transcribe

Need to convert audio files to text? Amazon Transcribe handles the heavy lifting. This guide shows you how to integrate it with Ruby.

Prerequisites

You need an AWS account and IAM credentials with Transcribe permissions. Your audio files must be in an S3 bucket that those credentials can read, in the same region as the Transcribe endpoint you call.

Install the required gem:

gem install aws-sdk-transcribeservice

Or add it to your Gemfile:

gem 'aws-sdk-transcribeservice'

Configure AWS Credentials

Set your credentials via environment variables:

export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_REGION=us-east-1

Or create ~/.aws/credentials:

[default]
aws_access_key_id = your_access_key
aws_secret_access_key = your_secret_key

The region setting conventionally goes in ~/.aws/config:

[default]
region = us-east-1
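
If you rely on the environment variables, a quick check at startup can catch a missing value before the first API call fails. The sketch below is only an illustration: missing_aws_env is a hypothetical helper, not part of the SDK, and it does not apply when credentials come from the shared files or an IAM role.

```ruby
# Returns the names of the variables from the export example that are
# unset or blank. `env` defaults to the real environment; pass a Hash
# to check other values.
def missing_aws_env(env = ENV)
  required = %w[AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION]
  required.select { |name| env[name].to_s.strip.empty? }
end

missing = missing_aws_env
warn "Missing AWS settings: #{missing.join(', ')}" unless missing.empty?
```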

Basic Transcription

Here is a minimal example that starts a transcription job:

require 'aws-sdk-transcribeservice'
require 'net/http'
require 'json'

client = Aws::TranscribeService::Client.new(region: 'us-east-1')

job_name = "transcription-#{Time.now.to_i}"
audio_uri = 's3://your-bucket/audio.mp3'

client.start_transcription_job(
  transcription_job_name: job_name,
  language_code: 'en-US',
  media_format: 'mp3',
  media: { media_file_uri: audio_uri }
)

puts "Job started: #{job_name}"

The job runs asynchronously. You need to poll for completion.
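
Job names must be unique within your account, so a timestamp alone can collide when two jobs start in the same second. The sketch below derives a readable name from the file name and adds a random suffix. transcription_job_name is a hypothetical helper, and the character and length limits in the comments are assumptions to check against the StartTranscriptionJob reference.

```ruby
require 'securerandom'

# Builds a job name from the media file name plus a random suffix.
# Assumption: names may only contain letters, digits, '.', '_' and '-',
# and may be at most 200 characters long.
def transcription_job_name(source, suffix: SecureRandom.hex(4))
  base = File.basename(source, '.*')          # drop directories and extension
  base = base.gsub(/[^0-9A-Za-z._-]+/, '-')   # replace disallowed runs with '-'
  base = base.gsub(/\A-+|-+\z/, '')           # trim leading and trailing dashes
  base = 'job' if base.empty?
  max_base = 200 - suffix.length - 1          # leave room for "-suffix"
  "#{base[0, max_base]}-#{suffix}"
end

puts transcription_job_name('s3://your-bucket/team meeting (final).mp3', suffix: 'abcd1234')
# prints team-meeting-final-abcd1234
```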

Wait for Results

Poll the job status until it completes:

def wait_for_transcription(client, job_name, timeout: 600)
  start_time = Time.now

  loop do
    response = client.get_transcription_job(transcription_job_name: job_name)
    status = response.transcription_job.transcription_job_status

    case status
    when 'COMPLETED'
      return response.transcription_job
    when 'FAILED'
      raise "Transcription failed: #{response.transcription_job.failure_reason}"
    end

    if Time.now - start_time > timeout
      raise "Timeout waiting for transcription"
    end

    sleep 5
  end
end

job = wait_for_transcription(client, job_name)
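
A fixed five-second sleep works, but long recordings mean many identical status calls. One possible variation, sketched below, doubles the delay after each check up to a cap and measures the timeout with a monotonic clock. wait_with_backoff and its base_delay, max_delay and sleeper parameters are illustrative names, not SDK features; the client only needs to respond to get_transcription_job the way the SDK client does.

```ruby
# Polls until the job finishes, doubling the delay after each check.
# `sleeper` is injectable so the loop can be exercised without waiting.
def wait_with_backoff(client, job_name, timeout: 600, base_delay: 2, max_delay: 30,
                      sleeper: ->(seconds) { sleep(seconds) })
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  delay = base_delay

  loop do
    job = client.get_transcription_job(transcription_job_name: job_name).transcription_job

    case job.transcription_job_status
    when 'COMPLETED' then return job
    when 'FAILED'    then raise "Transcription failed: #{job.failure_reason}"
    end

    # Give up if the next sleep would run past the deadline.
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) + delay > deadline
      raise "Timed out after #{timeout}s waiting for #{job_name}"
    end

    sleeper.call(delay)
    delay = [delay * 2, max_delay].min
  end
end
```

It is a drop-in replacement for the call above: job = wait_with_backoff(client, job_name).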

Fetch the Transcript

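When you do not specify an output bucket, transcript_file_uri is a short-lived pre-signed URL. If it has expired, call get_transcription_job again to get a fresh one. Download and parse it: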

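def fetch_transcript(job)
  uri = URI(job.transcript.transcript_file_uri)
  response = Net::HTTP.get_response(uri)
  # Fail loudly on an expired or forbidden URL instead of parsing an error page.
  raise "Download failed: HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  data = JSON.parse(response.body)
  # The full text is the first entry under results.transcripts.
  data['results']['transcripts'].first['transcript']
end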

transcript = fetch_transcript(job)
puts transcript
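
The downloaded file holds more than the plain text. The sketch below uses a trimmed, made-up payload in the shape Transcribe writes (real files contain more fields) to show where the transcript sits and how per-word confidence could be read from the items list. transcript_text and low_confidence_words are hypothetical helper names.

```ruby
require 'json'

# Illustrative payload only; the values are invented for this example.
sample = <<~JSON
  {
    "jobName": "demo-job",
    "status": "COMPLETED",
    "results": {
      "transcripts": [{ "transcript": "Hello world." }],
      "items": [
        { "type": "pronunciation", "start_time": "0.0", "end_time": "0.4",
          "alternatives": [{ "confidence": "0.99", "content": "Hello" }] },
        { "type": "pronunciation", "start_time": "0.4", "end_time": "0.9",
          "alternatives": [{ "confidence": "0.62", "content": "world" }] },
        { "type": "punctuation",
          "alternatives": [{ "confidence": "0.0", "content": "." }] }
      ]
    }
  }
JSON

# Returns the full transcript text, or raises if the payload lacks one.
def transcript_text(data)
  text = data.dig('results', 'transcripts', 0, 'transcript')
  raise ArgumentError, 'No transcript found in payload' if text.nil?

  text
end

# Returns spoken words whose confidence falls below the threshold.
def low_confidence_words(data, threshold: 0.8)
  items = data.dig('results', 'items') || []
  items.select { |item| item['type'] == 'pronunciation' }
       .map { |item| item['alternatives'].first }
       .select { |alt| alt['confidence'].to_f < threshold }
       .map { |alt| alt['content'] }
end

data = JSON.parse(sample)
puts transcript_text(data)               # prints Hello world.
puts low_confidence_words(data).inspect  # prints ["world"]
```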

Complete Example

Here is a reusable transcription class:

require 'aws-sdk-transcribeservice'
require 'net/http'
require 'json'
require 'securerandom'

class AudioTranscriber
  def initialize(region: 'us-east-1')
    @client = Aws::TranscribeService::Client.new(region: region)
  end

  def transcribe(s3_uri, language: 'en-US')
    job_name = "job-#{SecureRandom.hex(8)}"
    format = detect_format(s3_uri)

    @client.start_transcription_job(
      transcription_job_name: job_name,
      language_code: language,
      media_format: format,
      media: { media_file_uri: s3_uri }
    )

    job = wait_for_completion(job_name)
    fetch_transcript(job)
  end

  private

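  def detect_format(uri)
    ext = File.extname(uri).delete('.').downcase
    return ext if %w[mp3 mp4 m4a wav flac ogg amr webm].include?(ext)

    # Fail early rather than mislabel unknown audio as mp3.
    raise ArgumentError, "Unsupported media format: #{ext.inspect}"
  end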

  def wait_for_completion(job_name, timeout: 600)
    deadline = Time.now + timeout

    loop do
      resp = @client.get_transcription_job(transcription_job_name: job_name)
      job = resp.transcription_job

      return job if job.transcription_job_status == 'COMPLETED'
      raise "Failed: #{job.failure_reason}" if job.transcription_job_status == 'FAILED'
      raise "Timeout" if Time.now > deadline

      sleep 5
    end
  end

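  def fetch_transcript(job)
    uri = URI(job.transcript.transcript_file_uri)
    response = Net::HTTP.get_response(uri)
    # Fail loudly on an expired or forbidden URL instead of parsing an error page.
    raise "Download failed: HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

    data = JSON.parse(response.body)
    data['results']['transcripts'].first['transcript']
  end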
end

# Usage
transcriber = AudioTranscriber.new
text = transcriber.transcribe('s3://my-bucket/meeting.mp3')
puts text
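
The class handles one file at a time and blocks until it is done. Because jobs run asynchronously, several files can be handled by starting every job first and polling afterwards. The sketch below shows that idea under a few assumptions: transcribe_batch, poll_interval and sleeper are illustrative names, media_format is left out on the assumption that Transcribe detects it, and there is no timeout or concurrency limit handling.

```ruby
require 'securerandom'

# Starts one job per URI up front, then polls until every job has finished.
# Returns a Hash of { s3_uri => final job status }.
# `client` is assumed to behave like Aws::TranscribeService::Client.
def transcribe_batch(client, s3_uris, language: 'en-US', poll_interval: 5,
                     sleeper: ->(seconds) { sleep(seconds) })
  pending = s3_uris.to_h do |uri|
    name = "job-#{SecureRandom.hex(8)}"
    client.start_transcription_job(
      transcription_job_name: name,
      language_code: language,
      media: { media_file_uri: uri }
    )
    [name, uri]
  end

  results = {}
  until pending.empty?
    pending.keys.each do |name|
      job = client.get_transcription_job(transcription_job_name: name).transcription_job
      next unless %w[COMPLETED FAILED].include?(job.transcription_job_status)

      # Record the outcome and stop polling this job.
      results[pending.delete(name)] = job.transcription_job_status
    end
    sleeper.call(poll_interval) unless pending.empty?
  end
  results
end
```

Total wall-clock time is then roughly that of the slowest job rather than the sum of all jobs.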

Error Handling

Wrap your transcription calls to handle common failures:

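def safe_transcribe(s3_uri)
  transcriber = AudioTranscriber.new
  transcriber.transcribe(s3_uri)
rescue Aws::TranscribeService::Errors::BadRequestException => e
  # Malformed parameters, unreadable media, and similar request problems.
  warn "Invalid request: #{e.message}"
  nil
rescue Aws::TranscribeService::Errors::LimitExceededException
  # Too many requests or concurrent jobs; back off before retrying.
  warn "Rate limited. Retry later."
  nil
rescue StandardError => e
  # Includes failed jobs and timeouts raised by AudioTranscriber.
  warn "Transcription error: #{e.message}"
  nil
end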
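
The rescue clauses above give up on the first rate-limit error. If you would rather retry automatically, a small wrapper with exponential backoff is one option. The sketch below is generic Ruby, not an SDK feature: with_retries and its on, attempts, base_delay and sleeper parameters are hypothetical names.

```ruby
# Runs the block, retrying with exponential backoff when it raises one of
# the listed error classes. Re-raises once the attempts are used up.
def with_retries(on:, attempts: 4, base_delay: 1, sleeper: ->(seconds) { sleep(seconds) })
  tries = 0
  begin
    tries += 1
    yield
  rescue *on
    raise if tries >= attempts

    sleeper.call(base_delay * (2**(tries - 1)))  # 1s, 2s, 4s, ...
    retry
  end
end

# Possible usage with the class from this guide:
#   limit_error = Aws::TranscribeService::Errors::LimitExceededException
#   text = with_retries(on: [limit_error]) do
#     AudioTranscriber.new.transcribe('s3://my-bucket/meeting.mp3')
#   end
```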

Supported Languages

Amazon Transcribe supports many languages. Pass the correct code:

# Common language codes
'en-US'  # English (US)
'en-GB'  # English (UK)
'es-ES'  # Spanish (Spain)
'fr-FR'  # French (France)
'de-DE'  # German (Germany)
'pt-BR'  # Portuguese (Brazil)
'ja-JP'  # Japanese
'zh-CN'  # Chinese (Simplified Mandarin)
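
When the language is not known in advance, start_transcription_job also accepts identify_language (optionally with a language_options shortlist) in place of language_code. The sketch below builds the matching arguments. language_params is a hypothetical helper, and the shortlist in the usage comment is only an example.

```ruby
# Builds the language-related arguments for start_transcription_job.
# With a code the job uses it directly; without one it asks Transcribe
# to identify the language, optionally from a shortlist of candidates.
def language_params(language: nil, candidates: [])
  return { language_code: language } if language

  params = { identify_language: true }
  params[:language_options] = candidates unless candidates.empty?
  params
end

# Possible usage:
#   client.start_transcription_job(
#     transcription_job_name: job_name,
#     media: { media_file_uri: audio_uri },
#     **language_params(candidates: %w[en-US es-ES])
#   )
```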

Cost Optimization Tips

Transcription costs add up. Here are ways to reduce them:

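  1. Batch jobs during off-peak hours. Pricing stays the same, but spreading work out makes it less likely that you hit concurrent-job limits.

  2. Use the correct audio format. Mono audio at 16 kHz is usually enough for speech. Converting stereo to mono and resampling to 16 kHz shrinks the files, which saves storage and upload time. Transcribe bills by audio duration, so smaller files do not lower the transcription charge itself.

  3. Set an output bucket. Store results in your own S3 bucket so you can reread them later without asking Transcribe for a fresh download URL. The transcript is then a private object in your bucket, so read it with S3 credentials rather than a plain HTTP GET. To set the bucket: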

client.start_transcription_job(
  transcription_job_name: job_name,
  language_code: 'en-US',
  media_format: 'mp3',
  media: { media_file_uri: audio_uri },
  output_bucket_name: 'my-transcripts-bucket'
)

  4. Delete old jobs. Clean up completed jobs to stay organized:

client.delete_transcription_job(transcription_job_name: job_name)
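
Once results land in your own output bucket, the transcript can be read back with the S3 client instead of Net::HTTP. The sketch below assumes an object that behaves like Aws::S3::Client from the aws-sdk-s3 gem, and assumes that without an explicit output_key the file is named after the job with a .json extension. read_transcript_from_bucket is a hypothetical helper.

```ruby
require 'json'

# Reads a transcript from your own output bucket.
# `s3` is assumed to respond to get_object like Aws::S3::Client.
# Assumption: the default object key is "<job name>.json".
def read_transcript_from_bucket(s3, bucket:, job_name:, key: "#{job_name}.json")
  body = s3.get_object(bucket: bucket, key: key).body.read
  JSON.parse(body).dig('results', 'transcripts', 0, 'transcript')
end

# Possible usage (requires the aws-sdk-s3 gem):
#   s3 = Aws::S3::Client.new(region: 'us-east-1')
#   puts read_transcript_from_bucket(s3, bucket: 'my-transcripts-bucket', job_name: job_name)
```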

Next Steps

You now have working Ruby code for audio transcription. Consider adding:

  • Speaker diarization to identify who said what
  • Custom vocabularies for domain-specific terms
  • Redaction for sensitive content like PII
  • Real-time streaming for live audio
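
The first three items are extra arguments to start_transcription_job. As a starting point, the sketch below assembles them into a hash you can merge into the call. feature_params is a hypothetical helper, the vocabulary name in the usage comment is a placeholder for one you have already created, and streaming is not covered because it uses a separate API.

```ruby
# Returns extra start_transcription_job arguments for optional features.
def feature_params(speakers: nil, vocabulary: nil, redact_pii: false)
  settings = {}
  if speakers
    # Speaker labelling needs at least two speakers to tell apart.
    raise ArgumentError, 'speakers must be at least 2' if speakers < 2

    settings[:show_speaker_labels] = true
    settings[:max_speaker_labels] = speakers
  end
  settings[:vocabulary_name] = vocabulary if vocabulary

  params = {}
  params[:settings] = settings unless settings.empty?
  if redact_pii
    params[:content_redaction] = { redaction_type: 'PII', redaction_output: 'redacted' }
  end
  params
end

# Possible usage:
#   client.start_transcription_job(
#     transcription_job_name: job_name,
#     language_code: 'en-US',
#     media: { media_file_uri: audio_uri },
#     **feature_params(speakers: 2, vocabulary: 'my-vocabulary', redact_pii: true)
#   )
```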

Check the Amazon Transcribe documentation for advanced features.