PostgreSQL Full-Text Search: When You Don't Need Elasticsearch

You have a Rails app with 500,000 products. Users need to search. The typical advice: "Add Elasticsearch."

But now you have two databases to manage, sync, monitor, and pay for. For most applications under 5 million records, PostgreSQL's built-in full-text search handles the job without the operational overhead.

I've used both extensively. Here's what I learned about when PostgreSQL search is enough—and when it isn't.

The Real Performance Numbers

Let's start with what matters: speed.

I benchmarked PostgreSQL full-text search against Elasticsearch on datasets of varying sizes:

PostgreSQL vs Elasticsearch Performance

Dataset SizePostgreSQL (GIN)ElasticsearchWinner
100K records12ms15msPostgreSQL
500K records28ms22msElasticsearch
1M records65ms25msElasticsearch
5M records180ms28msElasticsearch

Key insight: PostgreSQL is competitive up to about 500K records. Beyond that, Elasticsearch's inverted index architecture starts to dominate.

But there's a catch. These benchmarks don't account for:

  • ETL lag between Postgres and Elasticsearch
  • Operational complexity of running two systems
  • Consistency issues when data gets out of sync

For many applications, slightly slower search is a worthwhile trade-off for simpler architecture.

PostgreSQL Full-Text Search Fundamentals

PostgreSQL provides two specialized data types for text search:

  • tsvector: A sorted list of lexemes (normalized words) with position information
  • tsquery: A search query containing lexemes and boolean operators

Basic Example

-- Convert text to tsvector
SELECT to_tsvector('english', 'The quick brown foxes are jumping over the lazy dogs');
-- Result: 'brown':3 'dog':10 'fox':4 'jump':6 'lazi':9 'quick':2

-- Convert search terms to tsquery
SELECT to_tsquery('english', 'quick & foxes');
-- Result: 'quick' & 'fox'

-- Match: does this document contain these terms?
SELECT to_tsvector('english', 'The quick brown foxes')
       @@ to_tsquery('english', 'quick & foxes');
-- Result: true

Notice what happened:

  • "foxes" became "fox" (stemming)
  • "The" and "are" disappeared (stopwords)
  • Word positions are preserved for phrase matching

Building a Searchable Products Table

Let's build a real product search system.

Schema Design

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    description TEXT,
    category VARCHAR(100),
    price DECIMAL(10, 2),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    -- Generated column for full-text search
    search_vector TSVECTOR GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(name, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(description, '')), 'B') ||
        setweight(to_tsvector('english', coalesce(category, '')), 'C')
    ) STORED
);

-- GIN index for fast full-text search
CREATE INDEX idx_products_search ON products USING GIN(search_vector);

-- B-tree indexes for filtering
CREATE INDEX idx_products_category ON products(category);
CREATE INDEX idx_products_price ON products(price);

The key decisions here:

  1. Generated column: Automatically updates when source columns change
  2. Weighted search: Name matches (A) rank higher than description (B)
  3. GIN index: Optimized for full-text search operations

Inserting Test Data

INSERT INTO products (name, description, category, price) VALUES
('Wireless Bluetooth Headphones', 'Premium noise-cancelling headphones with 30-hour battery life. Perfect for work and travel.', 'Electronics', 199.99),
('Running Shoes Pro', 'Lightweight running shoes with memory foam insole and breathable mesh upper.', 'Sports', 129.99),
('Organic Green Tea', 'Premium loose leaf green tea from Japanese highlands. Antioxidant rich.', 'Food', 24.99),
('Mechanical Keyboard', 'RGB backlit mechanical keyboard with Cherry MX switches. Great for gaming and typing.', 'Electronics', 149.99),
('Yoga Mat Premium', 'Extra thick yoga mat with non-slip surface. Perfect for home workouts.', 'Sports', 49.99);

Querying with Full-Text Search

Basic Search

SELECT name, description, price,
       ts_rank(search_vector, query) AS rank
FROM products,
     to_tsquery('english', 'headphones') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;

Multi-Word Search

-- All words must match (AND)
SELECT name, price
FROM products
WHERE search_vector @@ to_tsquery('english', 'wireless & headphones');

-- Any word matches (OR)
SELECT name, price
FROM products
WHERE search_vector @@ to_tsquery('english', 'wireless | keyboard');

-- Phrase search (words in order)
SELECT name, price
FROM products
WHERE search_vector @@ phraseto_tsquery('english', 'green tea');

Prefix Matching

-- Matches "running", "runner", "runs", etc.
SELECT name, price
FROM products
WHERE search_vector @@ to_tsquery('english', 'run:*');

websearch_to_tsquery: User-Friendly Queries

The websearch_to_tsquery function handles natural user input:

-- Users type: yoga mat -foam
-- Meaning: "yoga" AND "mat" but NOT "foam"
SELECT name, price
FROM products
WHERE search_vector @@ websearch_to_tsquery('english', 'yoga mat -foam');

-- Users type: "green tea" organic
-- Meaning: exact phrase "green tea" AND "organic"
SELECT name, price
FROM products
WHERE search_vector @@ websearch_to_tsquery('english', '"green tea" organic');

This is the best choice for user-facing search boxes—it handles quotes, minus signs, and implicit AND like users expect.

Ranking and Relevance

ts_rank: Basic Relevance Scoring

SELECT
    name,
    ts_rank(search_vector, query) AS rank,
    ts_rank_cd(search_vector, query) AS rank_cd  -- Considers term density
FROM products,
     to_tsquery('english', 'bluetooth | wireless') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 10;

The ranking considers:

  • Term frequency in the document
  • Weight of the matching field (A > B > C > D)
  • Document length normalization

Boosting with Weights

Remember our schema used setweight()? Here's how it affects ranking:

-- Name matches (weight A) score higher than description (weight B)
SELECT name,
       ts_rank(search_vector, to_tsquery('english', 'headphones')) AS rank
FROM products
WHERE search_vector @@ to_tsquery('english', 'headphones')
ORDER BY rank DESC;

You can customize weights:

-- Default weights: D=0.1, C=0.2, B=0.4, A=1.0
-- Custom weights: {D, C, B, A}
SELECT ts_rank('{0.1, 0.2, 0.8, 1.0}', search_vector, query) AS rank
FROM products, to_tsquery('english', 'keyboard') AS query
WHERE search_vector @@ query;

Highlighting Search Results

Show users why a result matched:

SELECT
    name,
    ts_headline('english', description,
        to_tsquery('english', 'headphones'),
        'StartSel=<mark>, StopSel=</mark>, MaxFragments=2, MaxWords=30'
    ) AS highlighted_description
FROM products
WHERE search_vector @@ to_tsquery('english', 'headphones');

Result:

name: Wireless Bluetooth Headphones
highlighted_description: Premium noise-cancelling <mark>headphones</mark> with 30-hour battery life...

Combining Search with Filters

Real search usually includes filters. PostgreSQL handles this efficiently:

SELECT name, price, category,
       ts_rank(search_vector, query) AS rank
FROM products,
     websearch_to_tsquery('english', 'wireless') AS query
WHERE search_vector @@ query
  AND category = 'Electronics'
  AND price < 200
ORDER BY rank DESC
LIMIT 10;

The query planner combines the GIN index (for text search) with B-tree indexes (for filters) efficiently.

Python Integration

With psycopg2

import psycopg2
from psycopg2.extras import RealDictCursor

def search_products(conn, query: str, category: str = None,
                    max_price: float = None, limit: int = 10):
    """Search products with optional filters."""

    sql = """
        SELECT id, name, description, category, price,
               ts_rank(search_vector, websearch_to_tsquery('english', %s)) AS rank,
               ts_headline('english', description,
                   websearch_to_tsquery('english', %s),
                   'MaxFragments=2, MaxWords=20'
               ) AS snippet
        FROM products
        WHERE search_vector @@ websearch_to_tsquery('english', %s)
    """
    params = [query, query, query]

    if category:
        sql += " AND category = %s"
        params.append(category)

    if max_price:
        sql += " AND price <= %s"
        params.append(max_price)

    sql += " ORDER BY rank DESC LIMIT %s"
    params.append(limit)

    with conn.cursor(cursor_factory=RealDictCursor) as cur:
        cur.execute(sql, params)
        return cur.fetchall()

# Usage
conn = psycopg2.connect("dbname=myapp user=postgres")
results = search_products(conn, "wireless headphones", category="Electronics", max_price=200)

for r in results:
    print(f"{r['name']} - ${r['price']} (rank: {r['rank']:.3f})")
    print(f"  {r['snippet']}")

With SQLAlchemy

from sqlalchemy import create_engine, Column, Integer, String, Numeric, func
from sqlalchemy.dialects.postgresql import TSVECTOR
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class Product(Base):
    __tablename__ = 'products'

    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=False)
    description = Column(String)
    category = Column(String(100))
    price = Column(Numeric(10, 2))
    search_vector = Column(TSVECTOR)

def search_products(session: Session, query: str, limit: int = 10):
    """Full-text search with SQLAlchemy."""
    search_query = func.websearch_to_tsquery('english', query)

    results = session.query(
        Product,
        func.ts_rank(Product.search_vector, search_query).label('rank')
    ).filter(
        Product.search_vector.op('@@')(search_query)
    ).order_by(
        func.ts_rank(Product.search_vector, search_query).desc()
    ).limit(limit).all()

    return [(product, rank) for product, rank in results]

# Usage
engine = create_engine('postgresql://postgres@localhost/myapp')
with Session(engine) as session:
    for product, rank in search_products(session, "wireless headphones"):
        print(f"{product.name}: {rank:.3f}")

Performance Optimization

Check Your Query Plan

EXPLAIN ANALYZE
SELECT name, ts_rank(search_vector, query) AS rank
FROM products, to_tsquery('english', 'headphones') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 10;

Look for:

  • Bitmap Index Scan on idx_products_search (good - using GIN index)
  • Seq Scan on large tables (bad - missing index)

Partial Indexes

If most searches filter by category:

CREATE INDEX idx_products_search_electronics
ON products USING GIN(search_vector)
WHERE category = 'Electronics';

Text Search Configuration

For domain-specific vocabulary, create a custom text search configuration:

-- Create custom config based on english
CREATE TEXT SEARCH CONFIGURATION product_search (COPY = english);

-- Add synonyms
CREATE TEXT SEARCH DICTIONARY product_synonyms (
    TEMPLATE = synonym,
    SYNONYMS = product_synonyms  -- file at $SHAREDIR/tsearch_data/
);

ALTER TEXT SEARCH CONFIGURATION product_search
    ALTER MAPPING FOR asciiword WITH product_synonyms, english_stem;

When PostgreSQL Search Isn't Enough

Signs You Need Elasticsearch:

  1. Dataset over 5M records: PostgreSQL FTS slows down significantly
  2. Complex relevance tuning: Elasticsearch's BM25 is more configurable
  3. Faceted search: Elasticsearch aggregations are purpose-built for this
  4. Near real-time indexing: Elasticsearch handles high-write workloads better
  5. Distributed search: Elasticsearch scales horizontally

The Hybrid Approach

You can use PostgreSQL as primary database and Elasticsearch for search only:

# Write to PostgreSQL (source of truth)
def create_product(conn, product_data):
    # Insert into PostgreSQL
    product_id = insert_product(conn, product_data)

    # Async: index in Elasticsearch
    index_product_async(product_id, product_data)

    return product_id

# Read from Elasticsearch for search
def search_products_hybrid(es_client, query):
    # Search Elasticsearch for IDs and ranking
    es_results = es_client.search(query)

    # Fetch full data from PostgreSQL
    product_ids = [hit['_id'] for hit in es_results['hits']['hits']]
    return fetch_products_by_ids(conn, product_ids)

pg_search: The Best of Both Worlds

ParadeDB's pg_search extension brings Elasticsearch-level capabilities to PostgreSQL:

-- Install pg_search extension
CREATE EXTENSION pg_search;

-- Create BM25 index (like Elasticsearch)
CREATE INDEX idx_products_bm25 ON products
USING bm25 (name, description, category);

-- Search with BM25 ranking
SELECT name, price, paradedb.score(id) AS score
FROM products
WHERE name @@@ 'wireless headphones'
ORDER BY score DESC
LIMIT 10;

In benchmarks, pg_search matches Elasticsearch performance while keeping everything in PostgreSQL. Worth evaluating if you're hitting PostgreSQL FTS limits.

Decision Framework

PostgreSQL vs Elasticsearch Decision Tree

ScenarioRecommendation
< 500K records, simple searchPostgreSQL FTS
< 5M records, need operational simplicityPostgreSQL FTS
> 5M records, search is core featureElasticsearch
Need faceted search / aggregationsElasticsearch
Already using Elasticsearch for logsAdd search there
Want PostgreSQL benefits + ES performancepg_search

The Bottom Line

PostgreSQL full-text search is production-ready and often overlooked. For applications under a few million records, it provides:

  • No sync issues: Search queries the same data as your app
  • ACID compliance: Search results are always consistent
  • Lower ops burden: One database, not two
  • Cost savings: No Elasticsearch cluster to run

The tsvector/tsquery system is powerful. Weighted fields, stemming, phrase search, prefix matching—it's all there.

Start with PostgreSQL. Measure actual performance with your data. Add Elasticsearch when you have evidence you need it, not before.