Here is the technical design doc template that I use.
Introduction | ||
This technical design document outlines the architecture, components, algorithms, and technologies used in building this system. |
Project Context | ||
Term | Details | Notes |
Problem Statement: | Existing feature vendor shutting down | Shutting down by …. |
Goals & Objectives: | Develop in-house services for this feature | |
Non-Goals: | Identify any related problems that this design intentionally does not address. | |
Platforms: | Web, iOS, Android |
Data | ||
Term | Details | Notes |
Data sources | CloudSQL PostgreSQL | Existing ad data |
Data storage | PostgreSQL | No new data storage needed for this project |
Data Preprocessing | batch processing | |
Data Pipeline | N/A | |
Data Model | N/A |
Business Logic | ||
Term | Details | Notes |
Feature Logic | Details about feature logic | |
Algorithms | Content-based filtering | |
Model Training | N/A | |
Evaluation | Manual evaluation by the data team; debug logs will include each query and its results so that accuracy can be assessed | Checking accuracy of results |
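The Algorithms row names content-based filtering, and the logic is meant to run as an Elasticsearch query. One possible way to express that is Elasticsearch's `more_like_this` query, which finds documents whose text resembles a given document. The sketch below only builds the request body. The choice of `more_like_this`, the index name `ads`, the field names, and the helper `buildSimilarItemsQuery` are all assumptions for illustration, not part of this design.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildSimilarItemsQuery returns the JSON body of an Elasticsearch
// more_like_this query that looks for documents similar to itemID.
// The index and field names passed in are hypothetical placeholders.
func buildSimilarItemsQuery(index, itemID string, fields []string, size int) ([]byte, error) {
	body := map[string]any{
		"size": size,
		"query": map[string]any{
			"more_like_this": map[string]any{
				"fields": fields,
				// "like" points at an already indexed document instead of raw text.
				"like": []map[string]any{
					{"_index": index, "_id": itemID},
				},
				// Low thresholds so that short documents still yield query terms.
				"min_term_freq": 1,
				"min_doc_freq":  1,
			},
		},
	}
	return json.Marshal(body)
}

func main() {
	q, err := buildSimilarItemsQuery("ads", "12345", []string{"title", "description"}, 10)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(q))
}
```

Logging the generated body together with the returned hits would give the data team the query and results mentioned in the Evaluation row.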
System Architecture | ||
Term | Details | Notes |
Data Ingestion Layer | No new data ingestion needed; the existing data that the indexer ingests from the database into Elasticsearch will be used for this feature | |
Storage Layer | No new data storage needed, as we plan to query Elasticsearch in real time | |
Logic Layer | Logic will be expressed as an Elasticsearch query and executed by a Golang service | |
API Layer | API type: RESTful. Endpoints: a new GET endpoint that returns a list of data by item ID. Endpoint mocks: TBD. Authentication: none (public). Technologies: Golang. | API type options: RESTful/gRPC/GraphQL/WebSocket/custom |
Cache Layer | N/A | |
User Interface | Integration: web/iOS/Android. Feature: no new UI feature; follow the existing UI. | |
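As a rough illustration of the API Layer row, here is a minimal sketch of a public GET endpoint that returns a list of items for an item ID, written with Go's standard `net/http` package. The route `/similar`, the `item_id` query parameter, the `Item` fields, and the `SimilarFinder` abstraction are hypothetical; the real endpoint shape is still TBD in this design. The Elasticsearch call is hidden behind a function so the handler can be exercised without a cluster.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Item is a hypothetical response element.
type Item struct {
	ID    string `json:"id"`
	Title string `json:"title"`
}

// SimilarFinder abstracts the Elasticsearch lookup (assumed, not from the design).
type SimilarFinder func(itemID string) ([]Item, error)

// similarHandler serves GET /similar?item_id=... with no authentication.
func similarHandler(find SimilarFinder) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		id := r.URL.Query().Get("item_id")
		if id == "" {
			http.Error(w, "item_id is required", http.StatusBadRequest)
			return
		}
		items, err := find(id)
		if err != nil {
			// No fallback data source: report the degradation instead of failing hard.
			http.Error(w, "search backend unavailable", http.StatusServiceUnavailable)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(items)
	})
}

// callOnce runs one in-process request against h and returns status and body.
func callOnce(h http.Handler, method, target string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, target, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	// Stub standing in for the Elasticsearch query.
	stub := func(itemID string) ([]Item, error) {
		return []Item{{ID: "2", Title: "similar to " + itemID}}, nil
	}
	status, body := callOnce(similarHandler(stub), http.MethodGet, "/similar?item_id=1")
	fmt.Println(status, body)
}
```

In a real service the handler would be registered on an `http.ServeMux` and served with `http.ListenAndServe`; `callOnce` exists only to demonstrate the handler in-process.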
Deployment and Scalability | ||
Term | Details | Notes |
Deployment | Service will be deployed in Kubernetes cluster | |
Scalability | This feature adds load to Elasticsearch, so the ES cluster may need to be scaled accordingly. The Golang service needs to scale based on CPU/memory usage, and OOMKilled pod events need to be monitored. | |
Fault Tolerance | ||
Term | Details | Notes |
Data Replication: | Elasticsearch: Ensure data availability and resilience against node failures. | |
Service Redundancy | Elasticsearch: Check load balancing config to distribute traffic among redundant instances. | |
Failure Detection and Recovery | Automated recovery: Elasticsearch has built-in mechanisms for automated recovery from node failures, as does Kubernetes for pod failures. | |
Data Transaction Management | N/A | ACID properties for critical operations |
Fallback Mechanisms | No fallback, but service degradation must be handled gracefully. | |
Circuit Breakers | N/A | Clients to implement exponential backoff and jitter for retry. Temporarily halt requests for a failing service. |
Chaos Testing |
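The Circuit Breakers note asks clients to retry with exponential backoff and jitter. A minimal sketch of that retry policy follows, using the "full jitter" variant, where each wait is a random duration below an exponentially growing ceiling. The function names, the attempt limit, and the delay values are assumptions chosen for illustration; it does not implement a full circuit breaker that halts requests.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// errTemporary is a placeholder error used by the demo in main.
var errTemporary = errors.New("temporary failure")

// backoffDelay returns the wait before the next retry after a failed attempt
// (0-based): a random duration in [0, min(maxDelay, base*2^attempt)).
func backoffDelay(attempt int, base, maxDelay time.Duration) time.Duration {
	ceiling := base
	for i := 0; i < attempt && ceiling < maxDelay; i++ {
		ceiling *= 2
	}
	if ceiling > maxDelay {
		ceiling = maxDelay
	}
	if ceiling <= 0 {
		return 0
	}
	return time.Duration(rand.Int63n(int64(ceiling)))
}

// retry calls op up to maxAttempts times, sleeping with jittered backoff
// between failed attempts, and returns the last error if all attempts fail.
func retry(maxAttempts int, base, maxDelay time.Duration, op func() error) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err = op(); err == nil {
			return nil
		}
		if attempt < maxAttempts-1 {
			time.Sleep(backoffDelay(attempt, base, maxDelay))
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}

func main() {
	calls := 0
	err := retry(5, 10*time.Millisecond, 200*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errTemporary // fail twice, then succeed
		}
		return nil
	})
	fmt.Println(calls, err) // prints "3 <nil>"
}
```

The jitter spreads retries from many clients over time, so a recovering service is not hit by synchronized retry waves.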
Security | ||
Term | Details | Notes |
Data privacy | No Personally Identifiable Information (PII) should be exposed in the endpoint response | |
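One way to enforce the data privacy requirement in a Golang service is to serialize a dedicated response type that lists only the allowed fields, rather than the stored record itself. The sketch below assumes a hypothetical `adRecord` with two PII fields; all type and field names are illustrative and do not come from the actual data model.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// adRecord is a hypothetical internal record as loaded from storage.
type adRecord struct {
	ID          string
	Title       string
	SellerEmail string // PII, must never reach the response
	SellerPhone string // PII, must never reach the response
}

// publicAd is the allowlisted shape of the endpoint response.
type publicAd struct {
	ID    string `json:"id"`
	Title string `json:"title"`
}

// toPublic copies only the non-PII fields into the response type.
func toPublic(r adRecord) publicAd {
	return publicAd{ID: r.ID, Title: r.Title}
}

// renderPublic returns the JSON that would be sent to the client.
func renderPublic(r adRecord) (string, error) {
	b, err := json.Marshal(toPublic(r))
	return string(b), err
}

func main() {
	rec := adRecord{ID: "1", Title: "Bike", SellerEmail: "a@example.com", SellerPhone: "555-0100"}
	out, err := renderPublic(rec)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints {"id":"1","title":"Bike"}
}
```

Because the response type is an allowlist, a PII field added to the record later stays out of the response unless someone explicitly copies it over.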
Monitoring and Alerting | ||
Term | Details | Notes |
Health Monitoring | Monitor system metrics, such as CPU utilization, memory usage, and network latency, to detect anomalies and potential failures. Use tools like Prometheus and Grafana for real-time monitoring and visualization. |
|
Alerting | Set up alerting for Endpoint and Elasticsearch. |