
Static Data Masking Tools (and How to Roll Your Own)

Tianzhou · May 25, 2026

Update History | Comment
2026/05/25 | Initial version.

Every team eventually wants a copy of production in dev, test, and staging. And every team eventually realizes that copy is a liability the moment it carries real PII. Static data masking (SDM) is the fix: it rewrites the sensitive data into a sanitized copy, so the lower environments never hold the real thing. The masking is permanent and irreversible: there is no path back to the original values, which is exactly the point. This post walks through the tools that do it and shows how to build your own when none of them fit.

Perforce Delphix

Engines: Oracle, SQL Server, PostgreSQL, MySQL, and Db2 (z/OS and iSeries), plus mainframe, PaaS, and file sources. The common ones ship as bundled "Standard" connectors, with add-on "Select" connectors for the long tail.

Delphix is the enterprise option, and it makes no secret of it. It runs automated sensitive-data discovery, then applies deterministic masking algorithms that preserve referential integrity within and across sources. The masking is irreversible: production values become realistic but fictitious. The clever bit is that it pairs masking with data virtualization, so masked copies ship to downstream environments without provisioning full physical storage. On-prem and cloud.

Verdict: if you carry a heterogeneous estate, a formal refresh pipeline, and an auditor asking for compliance evidence, this is the tool. Priced and operated as an enterprise platform, with everything that implies.

Tonic.ai

Engines: PostgreSQL, MySQL, SQL Server, Oracle, and MongoDB, plus Snowflake, BigQuery, Spark, Salesforce, and flat files.

Tonic Structural comes at the problem from the developer's side. It de-identifies, subsets, and synthesizes structured and semi-structured data, turning a production database into a referentially intact test set. It runs as SaaS or self-hosted. Synthetic generation is first-class here, which matters when production data cannot leave its boundary even masked, so you regenerate a plausible stand-in instead of mirroring the original.

Verdict: the pick for teams that want realistic test data wired into development, and for the cases where synthesis matters more than mirroring production.

greenmask

Engines: PostgreSQL (production-ready); MySQL in beta.

greenmask is the open-source option (Apache-2.0), and it earns its keep by being unfussy. It works as a logical-dump proxy: it produces backups compatible with pg_restore, masking columns on the way through. Deterministic transformers use hash functions for consistent output, so referential integrity holds, and it supports database subsetting (including cyclic and polymorphic references) and synthetic generation. It is a single stateless binary, storage-agnostic across local and S3-compatible targets. It ships no scheduler of its own. You invoke the commands from cron, CI, or an orchestrator like Airflow.

Verdict: if you are a Postgres shop and you want masking folded into a dump/restore workflow without buying a platform, greenmask is the one to reach for.
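
The deterministic, hash-based masking described above is easy to see in miniature. The sketch below is not greenmask's implementation; it is a minimal illustration using Python's standard `hmac` module, and the key, the `mask_email` helper, and the sample rows are all hypothetical. The point it shows is that a keyed hash maps the same input to the same fictitious output, so a value used as a join key in two tables still matches after masking.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would come from a secrets manager
# and never be stored alongside the masked copy.
MASKING_KEY = b"example-key-not-for-real-use"


def mask_email(value: str, key: bytes = MASKING_KEY) -> str:
    """Map an email to a stable, fictitious address using a keyed hash."""
    normalized = value.strip().lower().encode("utf-8")
    digest = hmac.new(key, normalized, hashlib.sha256).hexdigest()
    return f"user_{digest[:12]}@example.com"


# Two hypothetical tables that share an email as a join key.
users = [{"id": 1, "email": "Alice@corp.com"}]
orders = [{"order_id": 10, "customer_email": "alice@corp.com"}]

masked_users = [{**u, "email": mask_email(u["email"])} for u in users]
masked_orders = [
    {**o, "customer_email": mask_email(o["customer_email"])} for o in orders
]

# The same input always yields the same output, so the join key still matches.
joins_survive = masked_users[0]["email"] == masked_orders[0]["customer_email"]
```

A keyed hash (HMAC) is used here rather than a bare hash so that someone holding the masked copy cannot confirm a guessed email by hashing it themselves. That is a design choice of this sketch, not a statement about any tool above.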

Roll your own

To be honest, no open-source tool covers static masking end to end across engines. greenmask handles the Postgres dump path; beyond that, teams build their own. "Build your own" means reproducing the five components a full SDM solution bundles:

  1. Discovery. Find and classify every sensitive column, and re-scan as the schema changes. Miss one and real PII ships downstream. This is usually the first piece to outgrow a script.
  2. Policy. Decide how each column is masked: an algorithm per type, deterministic where joins and foreign keys must survive. Then keep the rules consistent as tables are added.
  3. Transformation engine. The component that reads production, applies the masks, and writes the target. This is where performance and reliability live: full-table rewrites at volume, runs that are idempotent and restartable after a failure, and referential integrity held across the whole dataset.
  4. Scheduling. Run the refresh on a cadence, trigger it from upstream events, manage dependencies. A homegrown job leans on cron, CI, or an orchestrator.
  5. Audit logging. Record what ran, which columns were masked, and prove to an auditor that no real PII reached non-production.
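
Components 2 and 3 are the ones a script usually covers, so here is a minimal sketch of both, assuming SQLite as a stand-in engine. The table names, column names, key, and the `pseudonym`, `mask_table`, and `run` helpers are all hypothetical. The policy is a mapping from column to masking function, and the engine rebuilds each target table from scratch so that a rerun after a failure lands in the same state.

```python
import hashlib
import hmac
import sqlite3

KEY = b"example-key-not-for-real-use"  # hypothetical; load from a secret store


def pseudonym(prefix: str):
    """Build a deterministic masker: same input, same fictitious output."""
    def mask(value):
        if value is None:
            return None
        digest = hmac.new(KEY, str(value).encode("utf-8"), hashlib.sha256)
        return f"{prefix}_{digest.hexdigest()[:10]}"
    return mask


# Policy: one rule per sensitive column. Columns that join to each other
# share the same rule so foreign-key style relationships survive.
POLICY = {
    "customers": {"email": pseudonym("email"), "full_name": pseudonym("name")},
    "orders": {"customer_email": pseudonym("email")},
}


def mask_table(src, dst, table, rules):
    """Copy one table from src to dst, masking the columns named in rules."""
    cur = src.execute(f'SELECT * FROM "{table}"')
    cols = [c[0] for c in cur.description]
    col_list = ", ".join(f'"{c}"' for c in cols)
    placeholders = ", ".join("?" for _ in cols)
    # Drop and rebuild so that rerunning after a failure is safe.
    dst.execute(f'DROP TABLE IF EXISTS "{table}"')
    dst.execute(f'CREATE TABLE "{table}" ({col_list})')
    rows = (
        [rules[c](v) if c in rules else v for c, v in zip(cols, row)]
        for row in cur
    )
    dst.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    dst.commit()


def run(src, dst, policy=POLICY):
    """Mask every table listed in the policy (other tables are not copied)."""
    for table, rules in policy.items():
        mask_table(src, dst, table, rules)


src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
src.executescript("""
    CREATE TABLE customers (id INTEGER, email TEXT, full_name TEXT);
    CREATE TABLE orders (id INTEGER, customer_email TEXT, total REAL);
    INSERT INTO customers VALUES (1, 'alice@corp.com', 'Alice Ng');
    INSERT INTO orders VALUES (10, 'alice@corp.com', 42.5);
""")
run(src, dst)
run(src, dst)  # a second run rebuilds the same result
```

This leaves out everything that makes the engine hard at volume: batching, type-preserving DDL, constraints, and checkpointing mid-table. It is meant to show the shape of the two components, not a production design.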

A script can cover one or two of these for a small, stable, single-engine schema. The build-vs-buy line sits at the other three. Discovery, scheduling, and audit are what you end up reinventing as the estate grows.
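
To make the discovery gap concrete, here is a rough sketch of the naive version most scripts start with: match column names against a few patterns, then report anything flagged that the masking policy does not cover. It assumes SQLite, and the patterns, table names, and the `discover` and `unmasked` helpers are hypothetical. Name matching alone misses sensitive data in columns with unhelpful names, which is one reason this piece outgrows a script.

```python
import re
import sqlite3

# Hypothetical name patterns; a real classifier would also sample values.
PII_PATTERNS = {
    "email": re.compile(r"e[-_]?mail", re.I),
    "phone": re.compile(r"phone|mobile", re.I),
    "name": re.compile(r"(first|last|full)[-_]?name", re.I),
    "national_id": re.compile(r"ssn|passport|national[-_]?id", re.I),
}


def discover(conn):
    """Return {(table, column): category} for columns whose names look sensitive."""
    findings = {}
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        for row in conn.execute(f'PRAGMA table_info("{table}")'):
            column = row[1]  # table_info rows start with (cid, name, type, ...)
            for category, pattern in PII_PATTERNS.items():
                if pattern.search(column):
                    findings[(table, column)] = category
                    break
    return findings


def unmasked(findings, policy):
    """Columns that discovery flagged but the masking policy does not cover."""
    return sorted(k for k in findings if k[1] not in policy.get(k[0], {}))


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, email TEXT, full_name TEXT, phone_number TEXT);
    CREATE TABLE orders (id INTEGER, total REAL);
""")
findings = discover(conn)
# The policy here covers email and full_name but forgot phone_number.
gaps = unmasked(findings, {"customers": {"email": "hmac", "full_name": "hmac"}})
```

Running a check like `unmasked` on every schema change, and failing the refresh when it returns anything, is one way to catch a new sensitive column before it ships downstream.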

Comparison

Measured against the five components:

| | Delphix | Tonic.ai | greenmask | Roll your own |
| --- | --- | --- | --- | --- |
| Engines | Oracle, SQL Server, Postgres, MySQL, Db2 (+ more) | Postgres, MySQL, SQL Server, Oracle, Mongo, Snowflake, BigQuery | Postgres (MySQL beta) | Anything you script |
| Discovery | Automated | Automated | Define rules yourself | Manual |
| Policy | Centralized UI | Centralized UI | YAML config | Hand-coded |
| Referential integrity | Across sources | Across tables | Deterministic | DIY |
| Subsetting / synthetic | Subsetting | Both | Both | Only if you build it |
| Scheduling | Built-in | Built-in (cron) | External (cron / CI) | External |
| Audit logging | Built-in reports | Audit trail | None | Only if you build it |
| License | Commercial | Commercial (SaaS / self-hosted) | Open source (Apache-2.0) | Free (your time) |

The masking itself is not where these tools split. All four transform data. The surrounding components are the real story. Delphix and Tonic bundle discovery, scheduling, and audit; greenmask gives you the engine but leaves those to your pipeline; a homegrown script leaves you all five. Put simply, the more engines, tables, and compliance scrutiny you carry, the more those bundled components justify paying for a commercial tool.
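
When audit is left to your pipeline, the minimum useful artifact is a record per run of what was masked and under which rules. The sketch below is one possible shape, not a compliance standard: the field names, the `audit_record` and `append_audit` helpers, and the sample policy are all hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(run_id, policy, row_counts, status):
    """Build one audit entry describing a masking run."""
    columns = sorted(f"{t}.{c}" for t, cols in policy.items() for c in cols)
    policy_json = json.dumps(policy, sort_keys=True)
    return {
        "run_id": run_id,
        "finished_at": datetime.now(timezone.utc).isoformat(),
        "status": status,
        "masked_columns": columns,
        "row_counts": row_counts,
        # Fingerprint of the policy so a reviewer can tell which rules were in force.
        "policy_sha256": hashlib.sha256(policy_json.encode("utf-8")).hexdigest(),
    }


def append_audit(path, record):
    """Append the record as one JSON line; earlier entries are never rewritten."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


# Hypothetical policy: table -> {column: algorithm name}.
policy = {
    "customers": {"email": "hmac", "full_name": "hmac"},
    "orders": {"customer_email": "hmac"},
}
record = audit_record("run-001", policy, {"customers": 1, "orders": 1}, "success")
```

A log like this records what the job claims it did. Evidence that no real PII reached non-production would also need a check on the target itself, such as rerunning discovery against the masked copy.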


One last thing worth keeping straight. Static masking protects the copies that leave production. It does nothing for the live database, where the requirement flips: you mask at read time, by role, without ever altering the stored data. That is dynamic data masking, a separate control entirely. Bytebase handles that side: queries route through its SQL Editor and results are masked before they leave it, one policy across every engine. It is not a static masking tool, so pair it with one of the above for non-production.
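
The difference is easier to see in code. This is a toy illustration of the general idea of dynamic masking, not Bytebase's implementation or API: the roles, column names, and the `partial_mask` and `read_rows` helpers are hypothetical. The stored rows are never modified; only the result handed to the caller changes, depending on the caller's role.

```python
# Hypothetical roles and columns, for illustration only.
UNMASKED_ROLES = {"dba", "support_lead"}
SENSITIVE_COLUMNS = {"email", "phone"}


def partial_mask(value: str) -> str:
    """Keep the first and last character, hide the rest."""
    if len(value) <= 2:
        return "*" * len(value)
    return value[0] + "*" * (len(value) - 2) + value[-1]


def read_rows(rows, role):
    """Return query results masked for the caller's role; stored rows are untouched."""
    if role in UNMASKED_ROLES:
        return [dict(r) for r in rows]
    return [
        {
            k: partial_mask(str(v)) if k in SENSITIVE_COLUMNS and v is not None else v
            for k, v in r.items()
        }
        for r in rows
    ]


stored = [{"id": 1, "email": "alice@corp.com", "phone": "555-0100"}]
as_dev = read_rows(stored, "developer")  # sensitive columns masked
as_dba = read_rows(stored, "dba")        # sees the real values
```

Contrast this with the static case: there the masked value is written to the copy and the original is gone, while here the original stays in place and the mask is applied on every read.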
