My Journey Building Datter
Datter started from a simple frustration: data exploration is too often treated like prep work, when it’s actually where the most important questions (and the biggest mistakes) are discovered. I wanted a tool that could take a raw dataset, whether it’s a quick CSV export or a multi-table warehouse, and immediately surface what matters: quality issues, distributions, outliers, correlations, and “what should I look at next?”
The goal was never “another charting tool.” The goal was to build a product that compresses the first few hours of analysis into minutes, without requiring notebooks, boilerplate Pandas scripts, or context switching across five different tools.
The Brief
The brief I set for myself was to build an EDA platform that could:
- Run fast enough to feel instant, even on large datasets
- Surface statistical insights automatically (not just visuals)
- Offer a clean workflow from “import” → “understand” → “share” → “act”
- Take security seriously, especially for sensitive datasets
- Scale from solo exploration to teams and enterprise governance
In short: make deep exploratory analysis accessible, repeatable, and safe.
First Steps
Before committing to implementation, I spent time defining what “good” looks like for EDA:
- The product should guide the user without getting in their way.
- Insights should be explainable (why is this flagged? what does it mean?).
- Performance should be a feature, not an afterthought.
- Defaults must be strong, but customisation should always be possible.
That early focus shaped the architecture: the UI needed to feel lightweight and responsive, while the analysis engine needed to be rigorous and reliable.
Technology Stack
- WebAssembly - Powers in-browser data processing and analysis without server round-trips
- DuckDB-Wasm - Handles SQL queries and data transformations directly in the browser
- Apache Arrow - Enables efficient columnar data transfer and processing
- TypeScript - Provides type safety across the entire codebase
- React - Provides the fast, composable UI components and interactive visualisations
- Tailwind CSS - Enables rapid, responsive design implementation
- Brevo SMTP - Handles email delivery
- Docker - Enables easy containerisation and deployment
- Gemini API - Enables AI-powered insights and visualisations
- MinIO - Handles object storage for file uploads and downloads
- Caddy - Handles HTTPS and domain routing
- Hetzner Cloud - Handles the hosting of Datter backend services on a dedicated server
- PostgreSQL - Handles the database for the Datter platform
- Vercel - Handles the hosting of the Datter frontend
- BetterAuth - Handles the authentication for the Datter platform
- TensorFlow.js - Enables machine learning directly in the browser - user data is not stored by, or shared with, Datter or any other third parties
- Stripe - Handles subscriptions
Real-World Challenges: Performance, Correctness, and UX
As Datter evolved, a few recurring challenges shaped the final design.
Making “Fast” Feel Instant
EDA can involve heavy computation: profiling columns, running statistical tests, building correlation matrices, and generating interactive charts. The difficulty isn’t just raw speed: it’s keeping the UI responsive, streaming progress, and making the experience feel calm even when something complex is happening.
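One way to keep the main thread calm during heavy profiling is to process columns in small chunks and yield back to the event loop between them. The sketch below illustrates the idea only; the names (`profileColumn`, `Profile`) are illustrative, not Datter's actual API.

```typescript
// Incremental column profile built chunk-by-chunk so the UI never freezes.
type Profile = { count: number; nulls: number; min: number; max: number; sum: number };

const emptyProfile = (): Profile => ({ count: 0, nulls: 0, min: Infinity, max: -Infinity, sum: 0 });

function profileChunk(values: (number | null)[], p: Profile): Profile {
  for (const v of values) {
    if (v === null) { p.nulls++; continue; }
    p.count++;
    p.sum += v;
    if (v < p.min) p.min = v;
    if (v > p.max) p.max = v;
  }
  return p;
}

// In the browser this would await requestAnimationFrame (or hand the work to a
// Web Worker); here setTimeout(0) yields so UI events can run between chunks.
async function profileColumn(values: (number | null)[], chunkSize = 10_000): Promise<Profile> {
  const p = emptyProfile();
  for (let i = 0; i < values.length; i += chunkSize) {
    profileChunk(values.slice(i, i + chunkSize), p);
    await new Promise<void>((r) => setTimeout(r, 0));
  }
  return p;
}
```

Because the profile is an incremental accumulator, progress can also be streamed to the UI after each chunk rather than only at the end.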
Turning Raw Stats Into Useful Guidance
Numbers aren’t insights by themselves. Datter needed to translate outputs into actions:
- Flag quality issues (mixed types, constant columns, high cardinality)
- Explain why something matters (e.g., skewness, missingness patterns)
- Suggest next steps (imputation options, transformations, encodings)
The challenge was staying helpful without becoming noisy.
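Rule checks like these can be sketched as simple predicates over a column's values; the thresholds and suggestion wording below are illustrative, not Datter's actual rules.

```typescript
// Toy quality-flag rules: each flag pairs an issue with a suggested next step.
type Flag = { column: string; issue: string; suggestion: string };

function qualityFlags(name: string, values: unknown[]): Flag[] {
  const flags: Flag[] = [];
  const distinct = new Set(values.filter((v) => v !== null && v !== undefined));

  if (distinct.size <= 1) {
    flags.push({ column: name, issue: "constant column", suggestion: "drop before modelling" });
  } else if (distinct.size / values.length > 0.9 && typeof values[0] === "string") {
    flags.push({ column: name, issue: "high cardinality", suggestion: "consider hashing or target encoding" });
  }

  const nullShare = values.filter((v) => v === null || v === undefined).length / values.length;
  if (nullShare > 0.2) {
    flags.push({
      column: name,
      issue: `${Math.round(nullShare * 100)}% missing`,
      suggestion: "impute (mean/median/mode) or drop",
    });
  }
  return flags;
}
```

Keeping each rule independent makes it easy to tune or silence individual checks, which is how the "helpful without noisy" balance gets managed in practice.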
Designing for “Exploration” (Not “Configuration”)
Many data tools start with settings. Datter starts with outcomes. The UI needed to be opinionated enough to guide users, while still letting advanced users drill down, validate, and export what they find.
Features
Below are the main capabilities Datter is built around, starting from the local-first core, then expanding into Pro and Enterprise workflows.
Core Capabilities (Client-Side)
Powered by WebAssembly (DuckDB-Wasm & Apache Arrow) for secure, in-browser processing.
- Multi-format ingestion:
  - CSV (delimiter + encoding detection, header detection, preview before import)
  - Excel (multi-sheet selection, type preservation for dates/currencies/percentages)
  - JSON/JSONL (flatten nested objects)
  - Parquet (native DuckDB support)
- Automatic type detection: numeric, categorical, datetime, boolean inference
- Dataset overview: row counts, memory usage, duplicate detection
- Column profiling:
  - Distributions (histograms, box plots)
  - Summary stats (min/max/mean/median/mode)
  - Uniques + null percentage
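The automatic type detection above boils down to testing each candidate type against a sample of raw cells. This is a minimal sketch of the idea, not Datter's inference engine; the order of checks matters (booleans before numerics, numerics before datetimes) because the later parsers are more permissive.

```typescript
// Infer a column type from raw string cells by elimination.
type ColType = "numeric" | "boolean" | "datetime" | "categorical";

function inferType(samples: string[]): ColType {
  const nonEmpty = samples.filter((s) => s.trim() !== "");
  const all = (pred: (s: string) => boolean) => nonEmpty.length > 0 && nonEmpty.every(pred);

  if (all((s) => /^(true|false|yes|no)$/i.test(s))) return "boolean";
  if (all((s) => !Number.isNaN(Number(s)))) return "numeric";
  if (all((s) => !Number.isNaN(Date.parse(s)))) return "datetime";
  return "categorical"; // fallback when nothing stricter fits
}
```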
Automated Exploratory Data Analysis (EDA)
Automated statistical analysis that replaces repetitive “first notebook” scripts.
- Data quality checks: mixed types, constant columns, high cardinality issues
- Missing value analysis: patterns + imputation suggestions (mean/median/mode/drop)
- Outlier detection: IQR + Z-score methods with visual highlighting
- Correlation analysis:
  - Pearson & Spearman matrices
  - Categorical association (Cramér’s V / Chi-square)
  - Mutual information scores
- Transformation suggestions: e.g. log transforms for skew, encoding recommendations
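As one concrete example of what sits behind a correlation matrix, here is a minimal Pearson coefficient between two numeric columns; it is a sketch for illustration, not Datter's vectorised implementation (which runs over Arrow columns in DuckDB).

```typescript
// Pearson correlation: covariance normalised by the two standard deviations.
function pearson(x: number[], y: number[]): number {
  const n = x.length;
  const mx = x.reduce((a, b) => a + b, 0) / n;
  const my = y.reduce((a, b) => a + b, 0) / n;
  let cov = 0, vx = 0, vy = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - mx;
    const dy = y[i] - my;
    cov += dx * dy;
    vx += dx * dx;
    vy += dy * dy;
  }
  return cov / Math.sqrt(vx * vy); // in [-1, 1]
}
```

Spearman is the same computation applied to the ranks of each column, which is why the two matrices share most of their machinery.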
Visual AI Assistant
For exploration, the fastest interface is often language.
- Natural language visualisations: ask “show me sales by region” and get interactive charts
- Interactive Plotly outputs: zoom/pan + export as PNG/SVG
Pro Features
When workflows go beyond a single file, Datter scales up with connectors, collaboration, dashboards, and automation.
Data Connectors
- Databases: PostgreSQL + MySQL (SSL/TLS), table browser, schema inspection
- Cloud storage: AWS S3 (IAM role assumption), GCS, Azure Blob (SAS)
- REST APIs: pagination support + multiple auth methods (Basic/Bearer/API key/OAuth2)
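Cursor-style REST pagination generalises well to one small helper that keeps requesting pages until the API stops returning a next cursor. The shape below (`items`/`next`) is an illustrative assumption, not a fixed Datter contract; real connectors adapt per-API response shapes into it.

```typescript
// Walk a cursor-paginated endpoint and collect every item.
// getPage is called with null for the first page, then with each next cursor.
async function fetchAllPages<T>(
  getPage: (cursor: string | null) => Promise<{ items: T[]; next: string | null }>
): Promise<T[]> {
  const out: T[] = [];
  let cursor: string | null = null;
  do {
    const page = await getPage(cursor);
    out.push(...page.items);
    cursor = page.next;
  } while (cursor !== null);
  return out;
}
```

Because the helper only depends on the `getPage` function, the same loop works whether the page fetch uses Basic, Bearer, API-key, or OAuth2 auth underneath.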
ML-Powered Insights
- Time series forecasting: Prophet + ARIMA/SARIMAX, confidence intervals, MAE/MAPE/RMSE
- Anomaly detection: Isolation Forest, DBSCAN
- Clustering: K-means + elbow method, hierarchical clustering + dendrograms, silhouette optimisation
- Pattern detection: multicollinearity and non-linear relationship detection
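To make the elbow method concrete: K-means is run for several values of k, and the "inertia" (total squared distance to each point's assigned centre) is plotted against k; the bend in that curve suggests a good cluster count. The toy one-dimensional version below illustrates only the idea, with quantile initialisation as an assumption; it is nothing like the production clustering code.

```typescript
// Toy 1-D k-means returning centres plus inertia (the elbow-method metric).
function kmeans1d(data: number[], k: number, iters = 50): { centers: number[]; inertia: number } {
  // Initialise centres on evenly spaced quantiles of the sorted data.
  const sorted = [...data].sort((a, b) => a - b);
  let centers = Array.from({ length: k }, (_, i) => sorted[Math.floor(((i + 0.5) * sorted.length) / k)]);
  let assign: number[] = new Array(data.length).fill(0);

  for (let t = 0; t < iters; t++) {
    // Assignment step: each point joins its nearest centre.
    assign = data.map((v) =>
      centers.reduce((best, c, i) => (Math.abs(v - c) < Math.abs(v - centers[best]) ? i : best), 0)
    );
    // Update step: each centre moves to the mean of its cluster.
    centers = centers.map((c, i) => {
      const pts = data.filter((_, j) => assign[j] === i);
      return pts.length ? pts.reduce((a, b) => a + b, 0) / pts.length : c;
    });
  }

  const inertia = data.reduce((s, v, j) => s + (v - centers[assign[j]]) ** 2, 0);
  return { centers, inertia };
}
```

Running this for k = 1..n and watching where inertia stops dropping sharply is exactly the elbow heuristic the feature list refers to.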
Collaboration
- Annotations & comments: column-level + chart annotations, threads, reactions, resolution tracking
- Shared workspaces: sharing + permission levels (Viewer, Editor, Admin)
Dashboard Builder
- Drag-and-drop dashboards: responsive grid, resizable widgets
- Widgets: charts, stat cards, tables, text/markdown blocks
- Cross-filtering: interactive filtering across widgets
Automation
- Scheduled runs: cron schedules, timezone-aware, connected source sync
- Alerts: thresholds, anomalies, drift detection, notifications (email/Slack/webhook)
Enterprise Features
For regulated environments and larger orgs, Datter adds governance, pipelines, and deep integrations.
- Advanced connectors: Snowflake (OAuth/SSO), BigQuery, Redshift (IAM auth)
- Pipeline builder: visual steps, built-in transforms, custom SQL
- Version control: versioning, diffs, rollback, publish/draft workflows
- Execution tracking: per-step metrics, error policies (stop/skip/log), history + replay
- Governance: audit logging, SSO & RBAC, approval workflows
Lessons Learned
- Performance is product: If insight generation feels slow, users stop trusting it.
- Explainability wins: Surfacing an issue is only useful when the “why” is clear.
- Defaults matter more than settings: Great presets create momentum; customisation should come later.
- Security isn’t a feature toggle: Privacy-by-default reduces risk and builds trust.
What’s Next?
Datter already covers the core EDA workflow, but there’s always room to push it further. Some areas I’m excited to expand:
- More guided “next question” suggestions (deeper insight chaining)
- Stronger dataset comparison and drift tracking
- More connector coverage and enterprise deployment options
- Richer collaboration workflows (review/approval around findings)
- Expanded export and reporting formats for different stakeholders
Final Thoughts
Building Datter has been a very deliberate mix of product thinking and engineering craft: designing an experience that feels simple, while building an engine that’s statistically serious. The end result is a tool that helps you move from “I have data” to “I understand it” much faster, and with fewer blind spots.
If you’re interested in collaborating, trying Datter in a team setting, or sharing feedback, feel free to reach out.