Proving that PostgreSQL can handle millions of records with proper optimization - achieving a 108.22× speedup on the main listing endpoint and up to 666× on point lookups.
I've always been curious about the PostgreSQL vs NoSQL performance debate. I also had a frustrating experience working with a colleague whose PostgreSQL queries took up to 30 seconds per request. This made me question: Is PostgreSQL really slow, or was it just poorly optimized?
This project aims to prove that PostgreSQL is fast enough to handle millions of records efficiently when properly optimized.
I chose the IMDB dataset for this experiment, using these files:
- name.basics
- title.basics
- title.ratings
- title.principals
| Table Name | Number of Rows |
|---|---|
| title_type | 10 |
| principal_categories | 13 |
| genres | 28 |
| professions | 46 |
| title_ratings | 470,341 |
| name_basics | 1,000,000 |
| title_principals | 956,348 |
| title_basics | 1,000,000 |
| name_known_for_titles | 2,108,364 |
| title_genres | 1,688,104 |
| name_primary_professions | 1,637,940 |
| TOTAL | 8,861,194 |
The full IMDB dataset is much larger, but I limited it to ~8.8 million records for this project.
Full database schema available here.
To demonstrate the impact of optimization, I created two separate database connections:
- Optimized Connection - Uses indexes and all PostgreSQL optimization features
- Unoptimized Connection - Simulates a poorly configured database
The unoptimized connection disables key performance features:
```go
// Disable the planner's index-based scan types on every new connection,
// forcing PostgreSQL to fall back to sequential scans for all queries.
unoptimizedConfig.AfterConnect = func(ctx context.Context, conn *pgx.Conn) error {
	_, err := conn.Exec(ctx, `
		SET enable_indexscan = OFF;
		SET enable_bitmapscan = OFF;
		SET enable_indexonlyscan = OFF;
	`)
	return err
}
```

Each API endpoint accepts an `optimize` parameter (default: `true`):
- `optimize=true` → uses the optimized connection
- `optimize=false` → uses the unoptimized connection
- `/explain` endpoint → returns the query execution plan
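As a rough sketch of how this wiring could look (assuming pgx v5; the pool variables and both function names are mine, not the project's):

```go
package main

import (
	"context"
	"net/http"

	"github.com/jackc/pgx/v5/pgxpool"
)

// Illustrative globals; the real project may wire these differently.
var optimizedPool, unoptimizedPool *pgxpool.Pool

// pickPool selects a pool based on the optimize query parameter,
// defaulting to the optimized connection.
func pickPool(r *http.Request) *pgxpool.Pool {
	if r.URL.Query().Get("optimize") == "false" {
		return unoptimizedPool
	}
	return optimizedPool
}

// explainQuery prefixes a parameter-free query with EXPLAIN and
// collects the plan lines - one way an /explain endpoint could work.
func explainQuery(ctx context.Context, pool *pgxpool.Pool, sql string) ([]string, error) {
	rows, err := pool.Query(ctx, "EXPLAIN "+sql)
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	var plan []string
	for rows.Next() {
		var line string
		if err := rows.Scan(&line); err != nil {
			return nil, err
		}
		plan = append(plan, line)
	}
	return plan, rows.Err()
}
```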
This endpoint retrieves a list of movies with optional title filtering.
Query Structure:
```go
var GET_ALL_QUERY_BASE = `
SELECT tb.tconst, tt.name, tb.primary_title, tb.original_title,
       tb.is_adult, tb.start_year, tb.end_year, tb.runtime_minutes,
       ARRAY(
           SELECT g.name FROM title_genres tg
           JOIN genres g ON g.id = tg.genre_id
           WHERE tg.tconst = tb.tconst
       ) AS genres,
       tr.average_rating AS rating,
       tr.num_votes AS votes
FROM title_basics tb
JOIN title_type tt ON tt.id = tb.title_type_id
LEFT JOIN title_ratings tr ON tr.tconst = tb.tconst`

var GET_ALL_QUERY_FILTER = `
WHERE tb.primary_title ILIKE '%' || $1 || '%'
   OR tb.original_title ILIKE '%' || $1 || '%'`

var GET_ALL_QUERY_LIMIT = `
LIMIT 20`
```

I intentionally used a nested subquery (the `ARRAY(...)` genre lookup) to demonstrate that even "complex" queries can perform well with proper optimization.
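Assembled, the handler logic might look roughly like this. A sketch under assumptions: `listMovies` is my name, not the project's, the imports match the earlier snippet plus `github.com/jackc/pgx/v5`, and the caller is responsible for closing the returned rows:

```go
// listMovies composes the final SQL from the fragments above, adding
// the ILIKE filter only when a title was supplied, then runs it on
// whichever pool the optimize parameter selected.
func listMovies(ctx context.Context, pool *pgxpool.Pool, title string) (pgx.Rows, error) {
	query := GET_ALL_QUERY_BASE
	args := []any{}
	if title != "" {
		query += GET_ALL_QUERY_FILTER
		args = append(args, title)
	}
	query += GET_ALL_QUERY_LIMIT
	return pool.Query(ctx, query, args...)
}
```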
This endpoint retrieves detailed information about a specific movie, including cast and crew.
Query Approach:
```go
pool.QueryRow(ctx, GET_BY_ID_QUERY, tconst)
// ...
castRows, err := pool.Query(ctx, GET_CAST_QUERY, tconst)
// ...
crewRows, err := pool.Query(ctx, GET_CREW_QUERY, tconst)
```

Instead of using nested subqueries, I split the logic into three separate queries within the same connection (see the sketch after this list). This is more efficient because:
- Less work for the database
- Data processing happens in the application layer
- Better control over what data to fetch
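Condensed into one function, the flow could look like this. `Movie`, `appendCast`, and `appendCrew` are hypothetical names used only for illustration, and the real `Scan` list would match `GET_BY_ID_QUERY`'s full column set:

```go
// getMovieDetail fetches the base title row, then its cast and crew,
// and merges the results in the application layer. Illustrative sketch.
func getMovieDetail(ctx context.Context, pool *pgxpool.Pool, tconst string) (*Movie, error) {
	var m Movie
	// 1. Base title row (Scan targets abbreviated here).
	if err := pool.QueryRow(ctx, GET_BY_ID_QUERY, tconst).Scan(&m.Tconst, &m.PrimaryTitle); err != nil {
		return nil, err
	}
	// 2. Cast rows, combined into the movie in application code.
	castRows, err := pool.Query(ctx, GET_CAST_QUERY, tconst)
	if err != nil {
		return nil, err
	}
	defer castRows.Close()
	if err := appendCast(&m, castRows); err != nil {
		return nil, err
	}
	// 3. Crew rows.
	crewRows, err := pool.Query(ctx, GET_CREW_QUERY, tconst)
	if err != nil {
		return nil, err
	}
	defer crewRows.Close()
	return &m, appendCrew(&m, crewRows)
}
```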
| Machine | CPU | RAM | Disk |
|---|---|---|---|
| Oracle Cloud VM (Free Tier, ARM) | 2 Core | 12 GB | 48 GB |
Simple explanation:
- Unoptimized = Reading a book page by page to find one word
- Optimized = Using the book's index to jump directly to the right page
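The post's schema isn't reproduced here, but the "book index" on the optimized side would need entries like the following. This is a hypothetical sketch: the index names are mine, and the trigram indexes assume the `pg_trgm` extension, since plain B-tree indexes cannot serve `ILIKE '%...%'` filters:

```go
// Hypothetical index setup for the hot paths in the queries above;
// run once at migration time, not per connection.
_, err := conn.Exec(ctx, `
	CREATE EXTENSION IF NOT EXISTS pg_trgm;

	-- Trigram GIN indexes serve the ILIKE '%...%' title filters.
	CREATE INDEX IF NOT EXISTS idx_title_basics_primary_title_trgm
		ON title_basics USING gin (primary_title gin_trgm_ops);
	CREATE INDEX IF NOT EXISTS idx_title_basics_original_title_trgm
		ON title_basics USING gin (original_title gin_trgm_ops);

	-- B-tree indexes serve the joins and lookups on tconst.
	CREATE INDEX IF NOT EXISTS idx_title_genres_tconst
		ON title_genres (tconst);
	CREATE INDEX IF NOT EXISTS idx_title_principals_tconst
		ON title_principals (tconst);
`)
```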
| Endpoint | Unoptimized | Optimized | Latency Reduction | Speedup |
|---|---|---|---|---|
| `/movies` | 2876.061 ms | 26.576 ms | 99.08% | 108.22× faster |
| `/movies?title=batman` | 3710.734 ms | 409.300 ms | 88.97% | 9.07× faster |
| `/movies/:id` | 421.742 ms | 0.633 ms | 99.85% | 666.28× faster |
PostgreSQL is extremely fast when properly optimized - even with millions of records
Optimization matters more than the database choice - a well-optimized PostgreSQL query can outperform poorly designed NoSQL queries
Indexes are a double-edged sword:
- They make reads significantly faster
- They make writes slightly slower
- Use them wisely based on your read/write ratio
Query structure impacts performance - splitting complex queries can sometimes be better than nested subqueries
- Backend: Go (Golang)
- Database: PostgreSQL
- Dataset: IMDB (8M+ records)
- Libraries: pgx (PostgreSQL driver)