Mock SQL Data Table Generator
Generates a CREATE TABLE + 5 mock INSERTs based on the table name you provide.
Mock database tables: synthetic data, distributions, and seeders
A mock database table is a fictional but well-formed dataset that mimics the shape, types, and relationships of a real production table. Modern teams treat synthetic data as a first-class engineering artifact: it powers local development, integration tests, load tests, demos, training environments, and tutorials. Done well, a mock dataset captures not just plausible values but the statistical shape of the real data, without ever touching personally identifiable information protected by GDPR, LGPD, or HIPAA.
This generator produces a minimal mock table for a single resource; the reference below covers generation libraries, realistic distributions, foreign-key consistency, deterministic seeding, and the synthetic-data toolchain used in real engineering organisations.
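As a rough illustration of what "a CREATE TABLE plus 5 mock INSERTs" can look like, here is a minimal JavaScript sketch. The function name mockTableSql, the column set (id, name, created_at), and the value pool are assumptions made for this example; they are not the generator's actual implementation or output.

```javascript
// Hypothetical sketch: build a CREATE TABLE statement and N mock INSERTs
// for a given table name. Columns and values are illustrative assumptions.
function mockTableSql(tableName, rowCount = 5) {
  // Allow only simple identifiers so the name can be embedded safely.
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(tableName)) {
    throw new Error(`Invalid table name: ${tableName}`);
  }
  const names = ["Alice", "Bruno", "Chen", "Dara", "Elif"];
  const lines = [
    `CREATE TABLE ${tableName} (`,
    "  id INTEGER PRIMARY KEY,",
    "  name VARCHAR(100) NOT NULL,",
    "  created_at DATE NOT NULL",
    ");",
  ];
  for (let i = 0; i < rowCount; i++) {
    const id = i + 1;
    const name = names[i % names.length];
    // Fixed, evenly spaced dates keep the output deterministic.
    const day = String((i % 28) + 1).padStart(2, "0");
    lines.push(
      `INSERT INTO ${tableName} (id, name, created_at) ` +
        `VALUES (${id}, '${name}', '2024-01-${day}');`
    );
  }
  return lines.join("\n");
}

console.log(mockTableSql("customers"));
```

The identifier check is there because the table name is interpolated directly into SQL text; a real seeder would more likely use parameterised queries or the database driver's own quoting.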
Generation libraries and use cases
Every major language has a mature faker library. Faker.js (originally by Marak Squires, now maintained by a community team as @faker-js/faker) is the de facto standard for Node and the browser. In Python, Faker and Mimesis both provide locale-aware generators; Mimesis covers 30+ locales. Bogus serves .NET, MockNeat covers Java, and web tools like generatedata.com, Mockaroo, and JSON Generator let designers and analysts build datasets without code. Typical use cases include:
- Load tests: populate 1M+ rows to verify index strategy and query plans under realistic volume.
- Component dev: 10k rows for paginated tables, virtualised lists, and infinite scroll.
- UI prototyping: a few hundred rows to flesh out empty states and edge cases.
- Demos, training, and tutorials: data that looks real, ships freely, and never leaks customer information.
Realistic distributions
Random uniform values look fake the moment they hit a chart. Real-world tables follow recognisable distributions: Zipf's law dictates that a tiny fraction of products account for most sales, seasonality spikes traffic on Black Friday and dips in January, and churn means user activity decays over time. Good mocks reproduce these shapes:
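// Zipf-weighted product picker: the product at rank i gets weight 1 / (i + 1).
// weighted() is assumed to be a helper that returns a weight-proportional picker.
const weights = products.map((_, i) => 1 / (i + 1));
const pick = weighted(products, weights);
// Seasonal multiplier, indexed by month (0 = January, 11 = December)
const month = order.created_at.getMonth();
const factor = [0.7, 0.8, 0.9, 1, 1, 1, 1.1, 1, 1, 1.2, 1.5, 2][month];
order.total *= factor;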
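The snippet above relies on a weighted helper that is not defined anywhere in this reference. One possible implementation, sketched below, precomputes cumulative weights and returns a picker function. The signature weighted(items, weights, rand), the sample products, and the mulberry32 generator (a small, widely circulated 32-bit PRNG used here only to make the demo repeatable) are all assumptions for illustration.

```javascript
// Small seeded PRNG (mulberry32) so the demo gives the same counts each run.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Assumed helper: returns a picker that selects items with probability
// proportional to their weights, using cumulative sums.
function weighted(items, weights, rand = Math.random) {
  const cumulative = [];
  let total = 0;
  for (const w of weights) {
    total += w;
    cumulative.push(total);
  }
  return function pick() {
    const r = rand() * total;
    const i = cumulative.findIndex((c) => r < c);
    // Guard against floating-point edge cases at the upper bound.
    return items[i === -1 ? items.length - 1 : i];
  };
}

const products = ["A", "B", "C", "D", "E"]; // ordered by popularity rank
const weights = products.map((_, i) => 1 / (i + 1)); // Zipf with exponent 1
const pick = weighted(products, weights, mulberry32(42));

const counts = {};
for (let n = 0; n < 10000; n++) {
  const p = pick();
  counts[p] = (counts[p] || 0) + 1;
}
console.log(counts); // rank 1 should appear roughly twice as often as rank 2

// Seasonal multiplier: index 0 is January, index 11 is December.
const SEASONAL = [0.7, 0.8, 0.9, 1, 1, 1, 1.1, 1, 1, 1.2, 1.5, 2];
const order = { total: 100, created_at: new Date(2024, 10, 29) }; // late November
order.total *= SEASONAL[order.created_at.getMonth()];
console.log(order.total); // 150
```

For large catalogues, a binary search over the cumulative array would replace findIndex; the linear scan keeps the sketch short.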
The Synthetic Data Vault (SDV), a publicly available library that originated at MIT and is now maintained by DataCebo, fits statistical models to real data and emits synthetic rows that preserve column-level distributions and inter-column correlations. This is useful when feeding ML pipelines that would otherwise need real PII.
Relations and foreign-key consistency
Multi-table datasets must respect referential integrity. Generate parents first, then children that point at existing parent IDs:
- 1:N: every Post picks a random existing User.id.
- N:N: a join table (Enrollment) picks pairs of existing Student.id and Course.id, with a uniqueness check.
- Self-referential: hierarchies (employee → manager) require generating roots first, then descending.
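The three cases in the list can be sketched in a few small functions. The row shapes, the function names (makeRows, makePosts, makeEnrollments, makeEmployees), and the column names are hypothetical, chosen only to mirror the Post/User, Enrollment, and employee/manager examples.

```javascript
const randInt = (n, rand = Math.random) => Math.floor(rand() * n);

// Parent rows: sequential ids starting at 1.
const makeRows = (count) =>
  Array.from({ length: count }, (_, i) => ({ id: i + 1 }));

// 1:N: every post references the id of a user that already exists.
function makePosts(count, users) {
  return Array.from({ length: count }, (_, i) => ({
    id: i + 1,
    user_id: users[randInt(users.length)].id,
  }));
}

// N:N: join rows are unique (student_id, course_id) pairs.
function makeEnrollments(count, students, courses) {
  if (count > students.length * courses.length) {
    throw new Error("More enrollments requested than possible pairs");
  }
  const seen = new Set();
  const rows = [];
  while (rows.length < count) {
    const student_id = students[randInt(students.length)].id;
    const course_id = courses[randInt(courses.length)].id;
    const key = `${student_id}:${course_id}`;
    if (seen.has(key)) continue; // uniqueness check: retry on duplicates
    seen.add(key);
    rows.push({ student_id, course_id });
  }
  return rows;
}

// Self-referential: roots first, then each later row picks an earlier manager.
function makeEmployees(count, rootCount = 1) {
  const roots = Math.max(1, rootCount);
  const rows = [];
  for (let i = 0; i < count; i++) {
    const manager_id = i < roots ? null : rows[randInt(rows.length)].id;
    rows.push({ id: i + 1, manager_id });
  }
  return rows;
}

const users = makeRows(10);
const posts = makePosts(50, users);
const students = makeRows(5);
const courses = makeRows(4);
const enrollments = makeEnrollments(12, students, courses);
const employees = makeEmployees(20, 2);
console.log(posts[0], enrollments[0], employees[5]);
```

Because each employee only ever points at a row generated before it, the hierarchy cannot contain cycles. The retry loop in makeEnrollments is fine for sparse join tables; when the requested count approaches the number of possible pairs, shuffling the full pair list would be a better fit.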
PRNG seeding turns a random generator into a deterministic function: the same seed always produces the same dataset, making bugs reproducible. Use faker.seed(42) in Faker.js or random.seed(42) in Python.
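The effect of seeding can be shown without any library. The sketch below uses mulberry32, a small and widely circulated 32-bit PRNG, as a stand-in; it is not the generator Faker.js uses internally, and the makeRows function and its columns are hypothetical.

```javascript
// Seeded PRNG: the returned function yields the same sequence for the same seed.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Build a tiny dataset from a seed; same seed in, same rows out.
function makeRows(seed, count) {
  const rand = mulberry32(seed);
  const cities = ["Lisbon", "Osaka", "Recife", "Tallinn"];
  return Array.from({ length: count }, (_, i) => ({
    id: i + 1,
    city: cities[Math.floor(rand() * cities.length)],
    score: Math.floor(rand() * 100),
  }));
}

const a = JSON.stringify(makeRows(42, 5));
const b = JSON.stringify(makeRows(42, 5));
const c = JSON.stringify(makeRows(43, 5));
console.log(a === b); // true: identical seed, identical dataset
console.log(a === c); // a different seed is expected to give different rows
```

Note that the order in which values are drawn matters: adding a column or reordering the calls to rand() changes every row after that point, which is one reason seeded fixtures are usually regenerated and reviewed together.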
Export formats and seeders
Mock data ships in many shapes: CSV for spreadsheets and BigQuery loads, JSON for API stubs and front-end fixtures, SQL INSERT for traditional database seeders, XLSX for non-technical stakeholders, and Parquet for analytics warehouses. Framework-level seeders run the generator on demand: Knex.js and Sequelize for Node, Prisma db seed, Rails db:seed, Laravel factories, and Django fixtures. For B2B contexts where the source data exists but cannot leave the building, commercial products like Tonic.ai, Gretel, and Mostly AI generate synthetic copies designed to satisfy privacy audits.
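For the three text formats, a single in-memory array of rows can be serialised with a few lines each. The sketch below is a minimal, dependency-free version; the helper names (csvField, toCsv, sqlLiteral, toSqlInserts) and the sample rows are assumptions for illustration, and a real project would normally lean on a CSV library and parameterised inserts instead.

```javascript
// Quote a CSV field when it contains a comma, a quote, or a line break.
function csvField(value) {
  const s = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Header row comes from the first object's keys.
function toCsv(rows) {
  if (rows.length === 0) return "";
  const columns = Object.keys(rows[0]);
  const lines = [columns.map(csvField).join(",")];
  for (const row of rows) {
    lines.push(columns.map((c) => csvField(row[c])).join(","));
  }
  return lines.join("\n");
}

// Render a value as a SQL literal; strings get single quotes doubled.
function sqlLiteral(value) {
  if (value === null || value === undefined) return "NULL";
  if (typeof value === "number") return String(value);
  if (typeof value === "boolean") return value ? "TRUE" : "FALSE";
  return `'${String(value).replace(/'/g, "''")}'`;
}

function toSqlInserts(table, rows) {
  return rows
    .map((row) => {
      const columns = Object.keys(row);
      const values = columns.map((c) => sqlLiteral(row[c]));
      return `INSERT INTO ${table} (${columns.join(", ")}) VALUES (${values.join(", ")});`;
    })
    .join("\n");
}

const rows = [
  { id: 1, name: "O'Brien, Pat", active: true },
  { id: 2, name: "Li Wei", active: false },
];
console.log(toCsv(rows));
console.log(JSON.stringify(rows, null, 2)); // JSON fixture
console.log(toSqlInserts("users", rows));
```

The sample name deliberately contains both a comma and an apostrophe, since those are the characters that most often break hand-rolled CSV and SQL exporters.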
FAQ
How realistic does the mock need to be? It depends on the goal. UI dev tolerates uniform random; performance testing needs realistic distributions; ML training demands the SDV / Gretel level. Pick the cheapest tier that answers your question.
Is mock data reproducible? Yes: set the seed before generation. With the same seed, library version, and parameters, you get bit-identical output across machines and CI runs.
Is it safe to commit to git? Yes, as long as no real PII slipped in. Purely synthetic rows raise no privacy concerns; if any values were derived from a real dataset, audit them before committing.
What about LGPD / GDPR? Fully synthetic data is generally not treated as personal data, because it does not refer to identifiable individuals. Be careful, though, with data synthesised from real records: quasi-identifiers such as combinations of (zip, age, gender) can still re-identify someone, so some synthetic-data tools add differential-privacy noise on top.
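A quick way to see the quasi-identifier risk is to count how many rows share each combination of those columns. The sketch below reports the smallest group size, which is the k in the k-anonymity sense; the function name smallestGroup and the sample rows are invented for illustration, and this check is a rough screening step rather than a privacy guarantee.

```javascript
// Group rows by a combination of quasi-identifier columns and return
// the size of the smallest group. A result of 1 means at least one
// row is unique on those columns and therefore easier to re-identify.
function smallestGroup(rows, columns) {
  const groups = new Map();
  for (const row of rows) {
    const key = JSON.stringify(columns.map((c) => row[c]));
    groups.set(key, (groups.get(key) || 0) + 1);
  }
  return groups.size === 0 ? 0 : Math.min(...groups.values());
}

const people = [
  { zip: "01310", age: 34, gender: "F" },
  { zip: "01310", age: 34, gender: "F" },
  { zip: "01310", age: 51, gender: "M" },
  { zip: "04538", age: 29, gender: "F" },
  { zip: "04538", age: 29, gender: "F" },
];
console.log(smallestGroup(people, ["zip", "age", "gender"])); // 1
console.log(smallestGroup(people, ["zip"])); // 2
```

Here the third row is the only one with its (zip, age, gender) combination, so it stands out even though no name or ID is present.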