Reading Guide
This primer introduces synthetic data, from basic concepts to advanced applications. Each chapter is designed to stand alone where possible, so you can jump straight to the topics that interest you most. This guide helps you navigate the content based on your background and interests.
Quick Start Paths
| Reader Type | Primary Focus | Recommended Reading Path |
|---|---|---|
| Beginners | Understanding fundamentals | What is Synthetic Data → Why Use Synthetic Data → Generation Methods → Quality Evaluation → Real World Deployments |
| Practitioners | Implementation & development | Why Use Synthetic Data → Generation Methods → Quality Evaluation → Privacy-preserving Synthesis → Practical Data Synthesis → LLM-driven Data Synthesis → Real World Deployments → Challenges and Risks |
| Decision Makers | Business value & strategy | Why Use Synthetic Data → Real World Deployments → Applications in Public Sector → Challenges and Risks → Outlook & Trends |
| Privacy Officers | Compliance & risk management | Privacy-preserving Synthesis → Real World Deployments → Applications in Public Sector → Challenges and Risks |
| Researchers | Latest developments & methods | LLM-driven Data Synthesis → Challenges and Risks → Outlook & Trends |
Key Concepts & Navigation Elements
Essential Terms
These fundamental concepts appear throughout the primer. For detailed definitions, see our Glossary of Terms.
- Synthetic Data - Artificial data generated by algorithms that imitates the patterns of real data, such as survey responses, financial transactions, or sensor readings collected from actual people, events, or systems
- Synthesizer - The AI system that creates synthetic data by learning from real examples
- Generative Models - AI models that can create new content (such as text, images, or tabular data) after learning from examples
- Training Dataset - The real data the synthesizer learns from, which determines the patterns it reproduces
- Held-out Dataset - Real data kept out of training and used to evaluate how well the synthetic data performs
- Fine-tuning - Further training an existing AI model to adapt it to a specific task or industry
- Prompt Engineering - Writing instructions that get an AI model to produce the output you want
- Data Modality - The format of data (spreadsheets, photos, text, audio recordings, etc.)
- Privacy Preservation - Managing the risk of exposing people’s identifiable or sensitive information
Callout Boxes Guide
Throughout the primer, you’ll encounter four types of highlighted information:
Getting Help
- Glossary of Terms - Definitions of technical concepts and terminology
- References - Cited research papers and additional resources
- Credits - Contributors, acknowledgments, and version history