Comprehensive technical documentation for databases—relational, NoSQL, analytical, and cloud-managed services. This repository contains hands-on notes, design patterns, operational procedures, and troubleshooting checklists to help you master database concepts across different engines.
This deep dive documentation provides detailed technical information for database administrators, engineers, and architects who need in-depth knowledge beyond high-level overviews. Each section includes configuration examples, operational procedures, performance tuning guides, and real-world best practices.
For high-level overviews and deployment strategies, see the MySQL Mastery Series and other database blog series.
This repository is aligned around 10 database/storage types. Each type has a clear role; use the overview below and the full reference to choose the right one.
| # | Type | Why you use it (use cases) | In this repo |
|---|---|---|---|
| 1 | Relational | Transactions, ACID, complex SQL, reporting, data integrity | relational/ — MySQL, PostgreSQL |
| 2 | Document Store | Flexible schema, document-centric data, rapid iteration | document/mongodb |
| 3 | Key-Value | Fast lookups by key, high throughput, simple get/put | key-value/redis, key-value/aerospike |
| 4 | Wide-Column | Massive reads/writes, large partitions, scale-out | wide-column/ |
| 5 | Graph | Relationships, traversals, recommendations, fraud | graph/ |
| 6 | Time-Series | Timestamped data, metrics, IoT, retention/aggregation | time-series/ |
| 7 | Search Engine | Full-text search, facets, log/search analytics | search-engine/elasticsearch |
| 8 | In-Memory Cache | Sub-ms latency, reduce DB load, sessions, counters | cache/ → key-value/redis |
| 9 | Blob/Object Storage | Files, media, backups, data lakes | blob-object/ |
| 10 | Vector | Similarity search, semantic search, RAG, AI retrieval | vector/ |
- Relational — Transactional applications (banking, e-commerce, inventory); structured reporting and dashboards; complex queries with joins; data integrity and referential constraints; legacy/enterprise systems. SQLite: embedded, single-file, or local-first apps (mobile, edge).
- Document Store — Flexible or evolving schemas; document-centric workloads (catalogs, content, configs); rapid iteration without migrations; horizontal scaling; semi-structured data (forms, API payloads).
- Key-Value — Fast lookups by key (sessions, preferences, feature flags); high throughput and simple get/put; leaderboards, counters, rate limiters; serverless/auto-scaling (e.g. DynamoDB).
- Wide-Column — Massive write throughput (events, IoT, clickstreams); very large partitions; multi-datacenter replication; sparse columns; no single point of failure, linear scale-out.
- Graph — Relationship-heavy data (social, followers); recommendations and “similar to”; fraud/identity and connected-account analysis; knowledge graphs; network/dependency analysis.
- Time-Series — Metrics and monitoring; IoT/sensor telemetry; financial tick data; event streams and audits; efficient retention, downsampling, and time-range aggregation.
- Search Engine — Full-text and fuzzy search; faceted search and autocomplete; log and security analytics (SIEM); relevance tuning and highlighting.
- In-Memory Cache — Reduce latency and load on primary store; cache DB/API results; session and short-lived state; rate limiting and counters; pub/sub and lightweight queues.
- Blob/Object Storage — Media and static assets; backups and archives; data lakes; unstructured data at scale; high durability and optional versioning.
- Vector — Semantic search (by meaning, not just keywords); RAG and AI retrieval; “similar items”/recommendations; deduplication and clustering; image/audio similarity.
Comprehensive guide to relational database management systems (RDBMS), focusing on SQL fundamentals, schema design, performance optimization, and operational best practices.
Contents:
- Core concepts: ACID properties, normalization, schema design, constraints
- Querying: SQL basics, joins, aggregates, window functions
- Indexing: B-tree, covering indexes, composite keys, partial indexes
- Transactions & locking: isolation levels, deadlocks, contention
- HA/DR: replication, failover, backups, PITR, verification
- Performance: execution plans, vacuum/analyze, connection pooling
- MySQL-specific guide: Complete MySQL architecture, configuration, and operations
- PostgreSQL-specific guide: Advanced features and optimization
Key Files:
README.md- Relational database fundamentalsmysql/README.md- Complete MySQL technical deep dive (organized into focused topics)postgresql/README.md- Complete PostgreSQL technical deep dive (organized into focused topics)
The repo is organized by the 10 database/storage types. Each type has its own folder; deep dives for specific engines live under that type.
| Type | Folder | Databases (folder each; ✅ covered, 📁 planned) |
|---|---|---|
| 1. Relational | relational/ |
mysql/ ✅, postgresql/ ✅, oracle/, sql-server/, sqlite/ 📁 |
| 2. Document Store | document/ |
mongodb/ ✅, couchdb/, firestore/ 📁 |
| 3. Key-Value | key-value/ |
redis/ ✅, aerospike/ ✅, dynamodb/, memcached/ 📁 |
| 4. Wide-Column | wide-column/ |
cassandra/, hbase/, scylladb/, bigtable/ 📁 |
| 5. Graph | graph/ |
neo4j/, neptune/, arangodb/ 📁 |
| 6. Time-Series | time-series/ |
influxdb/, timescaledb/, prometheus/ 📁 |
| 7. Search Engine | search-engine/ |
elasticsearch/ ✅, solr/, meilisearch/ 📁 |
| 8. In-Memory Cache | cache/ |
redis ✅, memcached/, hazelcast/ 📁 |
| 9. Blob/Object | blob-object/ |
s3/, gcs/, azure-blob/, minio/ 📁 |
| 10. Vector | vector/ |
pinecone/, weaviate/, milvus/, pgvector/ 📁 |
Also: nosql/README.md — index and redirect to the type folders above.
Fundamental concepts that apply across all database systems, regardless of type or vendor.
Contents:
- Storage & indexing: row vs columnar, LSM vs B-tree, compression
- Consistency & replication: quorum, leader/follower, multi-leader, eventual
- Sharding & partitioning: keys, rebalancing, hotspots, locality
- Transactions & durability: WAL/redo logs, checkpoints, fsync strategies
- Backup & restore: full/incremental, PITR, validation, drills
- Performance: connection management, caching, query planning, latency SLIs
- Observability: logs, metrics, traces, slow-query analysis
Guide to managed database services across major cloud providers, covering provisioning, operations, migrations, and cost optimization.
Contents:
- AWS services: RDS, Aurora, DynamoDB, ElastiCache, DocumentDB, Neptune
- Google Cloud services: Cloud SQL, Spanner, Bigtable, Firestore, Memorystore
- Azure services: Azure SQL Database, Cosmos DB, Azure Database for MySQL/PostgreSQL, Azure Cache for Redis
- Provisioning: sizing, storage classes, HA/DR settings, parameter groups
- Networking & security: VPC, IAM/roles, encryption, secrets management
- Migrations: dump/restore, DMS/Dataflow/Datastream, cutover strategies
- Operations: backups/PITR, maintenance windows, upgrades, monitoring/alerts
- Cost: storage vs IOPS, autoscaling, reserved vs on-demand vs serverless
Use when: transactional data, ACID, complex queries, reporting, strict consistency.
MySQL — Deep Dive · Blog Series
PostgreSQL — Deep Dive
Also: Oracle, SQL Server, SQLite (embedded, local; deep dive planned)
Use when: flexible schema, document-centric data, rapid iteration, horizontal scaling.
MongoDB — Deep Dive · Blog Series
Also: CouchDB, Firestore
Use when: fast lookups by key, high throughput, simple get/put, no complex queries.
Redis — Deep Dive · Blog Series
Aerospike — Deep Dive · Blog Series
Also: DynamoDB, Memcached
Use when: massive-scale reads/writes, large partitions, multi-datacenter, sparse columns.
Covered in: wide-column/
Examples: Cassandra, HBase, ScyllaDB, Bigtable
Use when: relationship-heavy data, traversals, recommendations, fraud, knowledge graphs.
Covered in: graph/
Examples: Neo4j, Amazon Neptune, ArangoDB
Use when: timestamped data, metrics, IoT, retention, time-range queries.
Covered in: time-series/
Examples: InfluxDB, TimescaleDB, Prometheus
Use when: full-text search, facets, log/search analytics, relevance tuning.
Elasticsearch — Deep Dive · Blog Series
Also: Apache Solr, Meilisearch
Use when: sub-millisecond latency, reduce DB load, sessions, rate limiting, pub/sub.
Covered in: cache/ → key-value/redis
Also: Memcached, Hazelcast
Use when: unstructured files, media, backups, data lakes, durability at scale.
Covered in: blob-object/
Examples: Amazon S3, Google Cloud Storage, MinIO, Azure Blob
Use when: similarity search, semantic search, RAG, AI-powered retrieval.
Covered in: vector/
Examples: Pinecone, Weaviate, Milvus, pgvector
Detailed use cases per database type are in the Use cases by type section above. The Database Types & Use Cases table maps each type to deep dives in this repo.
- Relational — Transactions, ACID, joins, reporting, integrity
- Document — Flexible schema, document-centric, fast iteration
- Key-Value — Fast lookups by key, high throughput
- Wide-Column — Massive scale, large partitions
- Graph — Relationships, traversals, recommendations
- Time-Series — Metrics, IoT, time-range queries
- Search — Full-text search, facets, log analytics
- In-Memory Cache — Low latency, offload DB, sessions
- Blob/Object — Files, media, backups, data lakes
- Vector — Similarity search, semantic search, RAG
- Start with
concepts/to understand fundamental database concepts - Choose
relational/ornosql/based on your use case - Read the specific database guide (e.g.,
mysql/README.mdormongodb/README.md) - Refer to
cloud-managed/if using managed services
- Jump directly to specific database guides for advanced topics
- Use
concepts/as a reference for cross-cutting concerns - Refer to
cloud-managed/for cloud-specific optimizations - Use operational checklists for day-to-day tasks
- Review
concepts/for architectural patterns - Compare database options in respective sections
- Evaluate cloud-managed vs self-managed in
cloud-managed/ - Consider blog series for deployment strategies and decision frameworks
This deep dive documentation complements the comprehensive blog series:
- Hub: MySQL Mastery Series
- Deep Dive:
relational/mysql/README.md - Coverage: Strategic decisions, cloud-managed, self-managed, Docker, Kubernetes, performance optimization
- Hub: MongoDB Mastery Series
- Deep Dive:
document/mongodb/README.md - Coverage: Deployment strategies, optimization, operations
- Hub: Redis Mastery Series
- Deep Dive:
key-value/redis/README.md - Coverage: Caching strategies, data structures, performance
- Hub: Aerospike Mastery Series
- Deep Dive:
key-value/aerospike/README.md - Coverage: High-performance deployments, hybrid memory architecture
- Hub: Elasticsearch Deployment Guide
- Deep Dive:
search-engine/elasticsearch/README.md - Coverage: Complete deployment strategies from local to production
- Prefer concise, copy/paste-ready commands and checklists
- Call out trade-offs and defaults that commonly surprise people
- Keep examples minimal and runnable
- Include links to official documentation
- Reference blog series for deployment strategies
- Add diagrams when they clarify concepts
- MySQL 8.0 Reference Manual
- MongoDB Documentation
- Redis Documentation
- Aerospike Documentation
- Elasticsearch Documentation
- Relational vs NoSQL Databases
- MySQL Mastery Series
- MongoDB Mastery Series
- Redis Mastery Series
- Aerospike Mastery Series
This documentation is maintained alongside the blog series. For deployment strategies and decision frameworks, refer to the respective blog series. For detailed technical implementation, refer to the specific database guides in this repository.