# Database Design Patterns Every Developer Should Know

> A practical walk through six database design patterns you will actually meet in production: normalization, star schema, EAV, polymorphic associations, multi-tenancy, and the anti-patterns worth avoiding.

Adela | 2026-03-11 | Source: https://www.bytebase.com/blog/database-design-patterns/

---

A database design pattern is a reusable way to organize tables, relationships, and indexes in a relational database. Getting it wrong early is expensive. The pattern you pick sets your query performance ceiling, your data integrity guarantees, and how much pain your next schema migration will cause.

This guide walks through six patterns you'll run into in most production systems, the trade-offs behind each, and the anti-patterns that cause real outages.

## What are database design patterns?

A database design pattern is a repeatable solution to a common data modeling problem. It's a structural decision: how tables relate to each other, how data gets split or combined, and where you draw the boundaries between entities. Some patterns optimize for write safety (normalization), some for read speed (star schema), some for flexibility (EAV, JSONB columns), and some for keeping one customer's data away from another's (multi-tenancy). Most real databases mix several of these together.

## Normalization (1NF through 3NF)

Normalization removes redundant data by splitting it across related tables. The goal is to store each fact exactly once, so an update only ever happens in one place.

**First Normal Form (1NF):** Every column holds a single value. No comma-separated lists, no arrays stuffed into a VARCHAR. Each row is uniquely identifiable by a [primary key](/blog/primary-key-vs-foreign-key/).

**Second Normal Form (2NF):** Every non-key column depends on the entire primary key, not just part of it. This matters when you have composite keys. If an `order_items` table is keyed on `(order_id, product_id)`, then `product_name` depends only on `product_id` and belongs in a separate `products` table.

**Third Normal Form (3NF):** Non-key columns depend only on the primary key, not on other non-key columns. If a `customers` table has `city` and `state`, and `city` determines `state`, then `state` should move to a `cities` table. That removes transitive dependencies.

### When to denormalize

3NF is the default target for transactional systems. But read-heavy workloads (dashboards, reports, search) often need denormalized data to dodge expensive multi-table joins. Denormalization means deliberately adding redundancy: storing `total_orders_count` directly on a customer row instead of running `COUNT(*)` against the orders table on every request.

The rule of thumb: start normalized, and denormalize only when you have measured evidence that a specific query is too slow.

## Star schema and snowflake schema

These two are for analytical workloads, not transactional ones.

**Star schema** puts a central fact table (holding measures like revenue, quantity, or duration) in the middle, surrounded by denormalized dimension tables (date, product, customer, region). Every query is one join from fact to dimension. Simple, and fast.

**Snowflake schema** normalizes those dimension tables into sub-dimensions. A `location` dimension splits into `city → state → country` tables. You save storage and improve data integrity, at the cost of more joins.

| | Star schema | Snowflake schema |
|---|:-:|:-:|
| **Dimension tables** | Denormalized | Normalized into sub-tables |
| **Query joins** | Fewer | More |
| **Storage** | Higher (redundant data) | Lower |
| **Query speed** | Faster | Slower |
| **Best for** | Dashboards, ad-hoc BI | Complex hierarchies, storage-sensitive environments |

Modern columnar warehouses (Snowflake, BigQuery, Redshift) have optimized away much of the old performance penalty for snowflake schemas, so the choice these days is less about speed and more about how your team thinks about the data. Star schema is still the default for most [analytics use cases](/blog/top-database-schema-diagram-tools/), mostly because it's easier to understand and query.

## Entity-Attribute-Value (EAV)

EAV stores data in three columns: `entity_id`, `attribute_name`, and `value`. Instead of a wide table with one column per attribute, each attribute becomes its own row.

```sql
-- EAV table
CREATE TABLE product_attributes (
    entity_id   INT REFERENCES products(id),
    attribute   VARCHAR(100),
    value       VARCHAR(255)
);

-- A T-shirt has size and color; a laptop has cpu and ram_gb.
-- Both go in the same table.
INSERT INTO product_attributes VALUES (1, 'size', 'XL');
INSERT INTO product_attributes VALUES (1, 'color', 'blue');
INSERT INTO product_attributes VALUES (2, 'cpu', 'M3 Pro');
INSERT INTO product_attributes VALUES (2, 'ram_gb', '36');
```

**When it looks attractive:** Product catalogs with hundreds of optional attributes. Clinical records with thousands of possible fields per patient.

**Why it usually disappoints:** Any query that retrieves more than one attribute needs self-joins or pivots. Filtering is slow because `value` is always VARCHAR, so there's no type checking and no numeric indexing. Magento 1.x used EAV for product data and was notorious for poor query performance at scale. PostgreSQL benchmarks show JSONB running over 50,000x faster than EAV for unindexed queries, in a database 3x smaller.

**The modern alternative:** typed columns for the core fields, plus a JSONB column for the variable remainder.

```sql
CREATE TABLE products (
    id          SERIAL PRIMARY KEY,
    name        VARCHAR(200) NOT NULL,
    price       DECIMAL(10,2) NOT NULL,
    category_id INT REFERENCES categories(id),
    attributes  JSONB DEFAULT '{}'
);
```

A good test: if you find yourself writing `WHERE attributes->>'status' = 'active'` in most of your queries, then `status` wants to be a proper column, not something buried in JSON.

## Polymorphic associations

A polymorphic association uses a single [foreign key](/blog/primary-key-vs-foreign-key/) column plus a "type" column to point at rows in several different tables.

```sql
CREATE TABLE comments (
    id              SERIAL PRIMARY KEY,
    body            TEXT,
    commentable_id  INT,          -- could point to posts, photos, or videos
    commentable_type VARCHAR(50)  -- 'Post', 'Photo', or 'Video'
);
```

You see this a lot in Rails and Django ORMs. The catch: the database can't enforce a foreign key constraint, because `commentable_id` points to different tables depending on `commentable_type`. So no CASCADE deletes, no referential integrity checks, and orphaned records that pile up quietly.

GitLab's engineering documentation explicitly recommends against polymorphic associations. The alternative is a separate association table per type.

```sql
CREATE TABLE post_comments (
    id      SERIAL PRIMARY KEY,
    body    TEXT,
    post_id INT REFERENCES posts(id) ON DELETE CASCADE
);

CREATE TABLE photo_comments (
    id       SERIAL PRIMARY KEY,
    body     TEXT,
    photo_id INT REFERENCES photos(id) ON DELETE CASCADE
);
```

More tables, but the database enforces every relationship for you. For systems that need to hold their data integrity over the long run, that's almost always the better trade.

## Multi-tenant patterns

Multi-tenancy is about how you isolate one customer's data from another's when several organizations share the same application. There are three standard approaches, each with a different cost-versus-isolation trade-off.

**Shared schema (pool model):** All tenants share the same tables, and a `tenant_id` column on every table keeps the data apart. Cheapest and simplest, but one missing `WHERE tenant_id = ?` filter can leak data across tenants.

**Schema-per-tenant:** Each tenant gets its own schema inside the same database instance. PostgreSQL and SQL Server support this; MySQL doesn't. Bytebase's analysis [found that it introduces complexity comparable to database-per-tenant](/blog/multi-tenant-database-architecture-patterns-explained/) without the matching isolation benefit.

**Database-per-tenant (silo model):** Each tenant gets a fully separate database. Maximum isolation, highest operational cost. Running schema migrations across hundreds of databases needs [dedicated tooling](/blog/how-to-handle-database-schema-change/).

| | Shared schema | Schema-per-tenant | Database-per-tenant |
|---|:-:|:-:|:-:|
| **Cost** | Lowest | Medium | Highest |
| **Data isolation** | Weakest | Medium | Strongest |
| **Per-tenant customization** | None | Some | Full |
| **Migration complexity** | One migration | Per-schema | Per-database |
| **Best for** | Startups, MVPs | Mid-tier SaaS | Enterprise, regulated industries |

Start with shared schema unless you have a compliance requirement (HIPAA, GDPR data residency) that demands physical isolation. PostgreSQL's [row-level security](/blog/postgres-row-level-security-limitations-and-alternatives/) can add a database-enforced isolation layer on top of your application-level `tenant_id` filtering.

## Anti-patterns to avoid

These are the design decisions that turn into production incidents.

**The God Table.** One table stores everything: users, orders, products, logs. It starts at 50 columns and grows to 200. Splitting it into focused tables typically cuts query times by 80% or more.

**Missing primary keys.** Always use a [surrogate primary key](/blog/choose-primary-key-uuid-or-auto-increment/), either an auto-increment integer or a UUID. Natural keys like name plus date of birth lead to collisions and mixed-up records.

**Over-normalization.** You can overshoot in the other direction too: splitting data across so many tables that a simple query needs 10+ joins. One fintech startup decomposed currency codes, transaction types, and timestamp components into separate lookup tables, and the system buckled under peak load.

**Storing money as FLOAT.** Floating-point arithmetic introduces rounding errors. `0.1 + 0.2 = 0.30000000000000004` in most languages. Use DECIMAL(10,2) for monetary values.

**No indexes on filtered columns.** Index the columns that show up in your WHERE clauses and JOIN conditions, and use EXPLAIN to confirm. Too many indexes slow down writes; too few mean full table scans.

**Wrong data types.** Dates stored as VARCHAR block range queries. Enums stored as freeform strings invite typos. Use DATE/TIMESTAMP for times, DECIMAL for money, and ENUM or lookup tables for fixed value sets.

Getting the pattern right at the [schema design](/blog/top-database-schema-design-best-practices/) stage is a lot cheaper than fixing it in production.

## How Bytebase helps enforce design patterns

[Bytebase](/) is a database DevSecOps platform that catches pattern violations before they reach production. Its [SQL review](/blog/is-sql-review-necessary) policy can enforce rules like:

- [Every table must have a primary key](/blog/sql-review-rule-explained-require-primary-key/)
- Column naming must follow conventions (e.g., `_id` suffix for foreign keys, `_at` suffix for timestamps)
- [CASCADE operations must be explicitly approved](/blog/sql-review-rule-explained-prohibit-cascade/)
- Specific column types are required for specific data (TIMESTAMP for dates, DECIMAL for currency)

These rules run automatically on every [schema change](/blog/what-is-database-schema/) before it's applied, so the violations get caught in review rather than during a 3 AM incident.

## Further readings

- [Top 10 Database Schema Design Best Practices](/blog/top-database-schema-design-best-practices/)
- [What is a Database Schema?](/blog/what-is-database-schema/)
- [Multi-Tenant Database Architecture Patterns Explained](/blog/multi-tenant-database-architecture-patterns-explained/)
- [Primary Key vs Foreign Key](/blog/primary-key-vs-foreign-key/)
- [DDL vs DML in SQL](/blog/what-is-ddl-and-dml-in-sql-database/)
- [How to Handle Database Migration / Schema Change](/blog/how-to-handle-database-schema-change/)