Technical Design Doc: Sample

Here is the technical design doc template that I use.

This technical design document outlines the architecture, components, algorithms, and technologies used in building this system.
Project Context
Term | Details | Notes
Problem Statement | Existing feature vendor is shutting down | Shutting down by ….
Goals & Objectives | Develop in-house services for this feature |
Non-Goals | Identify any related problems that this design intentionally does not address |
Platforms | Web, iOS, Android |
Term | Details | Notes
Data Sources | CloudSQL PostgreSQL | Existing ad data
Data Storage | PostgreSQL | No new data storage needed for this project
Data Preprocessing | Batch processing |
Data Pipeline | N/A |
Data Model | N/A |
Business Logic
Term | Details | Notes
Feature Logic | Details about feature logic |
Algorithms | Content-based filtering |
Model Training | N/A |
Evaluation | Manual evaluation by the data team; debug logs include the query and its results so accuracy can be checked | Checking accuracy of results
System Architecture
Term | Details | Notes
Data Ingestion Layer | No new data ingestion needed; this feature uses the existing data that the indexer ingests from the database into Elasticsearch |
Storage Layer | No new data storage needed, as we plan to query Elasticsearch in real time |
Logic Layer | Logic will be expressed as an Elasticsearch query and executed by a Golang service |
API Layer | API type: RESTful | API types: RESTful/gRPC/GraphQL/WebSocket/custom
 | Endpoints: a new GET endpoint that returns a list of data by item ID |
 | Endpoint mocks: |
 | Authentication: none (public endpoint) |
 | Technologies: Golang |
Cache Layer | N/A |
User Interface | Integration: web/iOS/Android |
 | Feature: no new feature; follows the existing UI |
Deployment and Scalability
Term | Details | Notes
Deployment | Service will be deployed in a Kubernetes cluster |
Scalability | This feature adds load to Elasticsearch; scale the ES cluster accordingly if needed. The Golang service needs to scale based on CPU/memory usage, and OOMKilled events need to be monitored |
Fault Tolerance
Term | Details | Notes
Data Replication | Elasticsearch: ensure data availability and resilience against node failures |
Service Redundancy | Elasticsearch: check the load balancing config to distribute traffic among redundant instances |
Failure Detection and Recovery | Automated recovery: Elasticsearch has the necessary mechanisms for automated recovery from node failure, as does Kubernetes for pod failure |
Data Transaction Management | N/A | ACID properties for critical operations
Fallback Mechanisms | No fallback, but handle service degradation gracefully |
Circuit Breakers | N/A. Clients should implement exponential backoff with jitter for retries | Temporarily halt requests to a failing service
Chaos Testing
Term | Details | Notes
Data Privacy | No personally identifiable information (PII) should be exposed in the endpoint response |
Monitoring and Alerting
Term | Details | Notes
Health Monitoring | Monitor system metrics such as CPU utilization, memory usage, and network latency to detect anomalies and potential failures | Use tools like Prometheus and Grafana for real-time monitoring and visualization
Alerting | Set up alerting for the endpoint and Elasticsearch |