Here is the technical design doc template that I use.
| Introduction | ||
| This technical design document outlines the architecture, components, algorithms, and technologies used in building this system. |
| Project Context | ||
| Term | Details | Notes |
| Problem Statement: | The existing vendor for this feature is shutting down | Shutting down by …. |
| Goals & Objectives: | Develop in-house services for this feature | |
| Non-Goals: | Identify any related problems that this design intentionally does not address. | |
| Platforms: | Web, iOS, Android | |
| Data | ||
| Term | Details | Notes |
| Data sources | CloudSQL PostgreSQL | Existing ad data |
| Data storage | PostgreSQL | No new data storage needed for this project |
| Data Preprocessing | batch processing | |
| Data Pipeline | N/A | |
| Data Model | N/A | |
| Business Logic | ||
| Term | Details | Notes |
| Feature Logic | Details about feature logic | |
| Algorithms | Content-based filtering | |
| Model Training | N/A | |
| Evaluation | Manual review by the data team; debug logs will include the query and its results so that accuracy can be evaluated | Checking accuracy of results |
| System Architecture | ||
| Term | Details | Notes |
| Data Ingestion Layer | No new data ingestion needed; this feature uses the existing data that the indexer ingests from the database into Elasticsearch | |
| Storage Layer | No new data storage needed, as we plan to query Elasticsearch in real time | |
| Logic Layer | Logic will be expressed as an Elasticsearch query and executed by a Golang service | |
| API Layer | API type: RESTful; Endpoints: a new GET endpoint to get a list of data by item ID; Endpoint mocks: TBD; Authentication: no-auth, public; Technologies: Golang | API types: RESTful/gRPC/GraphQL/WebSocket/custom |
| Cache Layer | N/A | |
| User Interface | Integration: integration in web/iOS/Android; Feature: no new feature, follow existing UI | |
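The Logic Layer row says the feature logic lives in an Elasticsearch query executed by a Golang service, and the Business Logic table names content-based filtering. The template does not give the query itself, so the following is only a sketch of what it might look like, assuming Elasticsearch's `more_like_this` query is an acceptable way to express content-based filtering here. The index name `ads`, the fields `title` and `description`, and the function `buildSimilarItemsQuery` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildSimilarItemsQuery returns the JSON body of an Elasticsearch
// more_like_this query that searches for documents similar to the
// already-indexed document identified by itemID.
// The index and field names passed in are assumptions for illustration.
func buildSimilarItemsQuery(index, itemID string, fields []string, size int) (string, error) {
	body := map[string]any{
		"size": size,
		"query": map[string]any{
			"more_like_this": map[string]any{
				"fields": fields,
				// Reference the source document by index and ID.
				"like": []map[string]any{
					{"_index": index, "_id": itemID},
				},
				"min_term_freq":   1,
				"max_query_terms": 12,
			},
		},
	}
	b, err := json.Marshal(body)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	q, err := buildSimilarItemsQuery("ads", "12345", []string{"title", "description"}, 10)
	if err != nil {
		panic(err)
	}
	fmt.Println(q)
}
```

The GET endpoint handler would take the item ID from the request, build this body, and send it to the search API of the index. Logging the body together with the returned hits would also cover the debug-log evaluation described under Business Logic.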
| Deployment and Scalability | ||
| Term | Details | Notes |
| Deployment | Service will be deployed in Kubernetes cluster | |
| Scalability | Additional load will be introduced on Elasticsearch; scale the ES cluster accordingly if needed. The Golang service needs to scale based on CPU/memory, and OOM kills (OOMKilled pods) need to be monitored. | |
| Fault Tolerance | ||
| Term | Details | Notes |
| Data Replication: | Elasticsearch: Ensure data availability and resilience against node failures. | |
| Service Redundancy | Elasticsearch: Check load balancing config to distribute traffic among redundant instances. | |
| Failure Detection and Recovery | Automated Recovery: Elasticsearch has the necessary mechanisms for automated recovery from node failure, as does Kubernetes for pod failure. | |
| Data Transaction Management | N/A | ACID properties for critical operations |
| Fallback Mechanisms | No fallback, but handle service degradation gracefully. | |
| Circuit Breakers | N/A | Clients to implement exponential backoff and jitter for retry. Temporarily halt requests for a failing service. |
| Chaos Testing | ||
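The Circuit Breakers note asks clients to retry with exponential backoff and jitter. As an illustration only, here is a minimal sketch of that retry policy using the "full jitter" variant, where each delay is drawn uniformly between zero and an exponentially growing ceiling. The function names, the attempt limit, and the base and maximum delays are all assumptions, not part of the template.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// errTransient is a hypothetical error standing in for a failed request.
var errTransient = errors.New("transient failure")

// backoffCeiling returns base * 2^attempt, capped at max.
func backoffCeiling(attempt int, base, max time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt; i++ {
		if d >= max/2 {
			return max // doubling again would reach or exceed the cap
		}
		d *= 2
	}
	if d > max {
		return max
	}
	return d
}

// fullJitter picks a delay in [0, ceiling] using the supplied random source,
// which must behave like rand.Int63n (return a value in [0, n)).
func fullJitter(ceiling time.Duration, randInt63n func(n int64) int64) time.Duration {
	if ceiling <= 0 {
		return 0
	}
	return time.Duration(randInt63n(int64(ceiling) + 1))
}

// retry calls op up to maxAttempts times and sleeps for a jittered,
// exponentially growing delay after each failure except the last.
func retry(maxAttempts int, base, max time.Duration, op func() error) error {
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err = op(); err == nil {
			return nil
		}
		if attempt < maxAttempts-1 {
			time.Sleep(fullJitter(backoffCeiling(attempt, base, max), rand.Int63n))
		}
	}
	return err
}

func main() {
	calls := 0
	err := retry(5, 100*time.Millisecond, 2*time.Second, func() error {
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

Jitter spreads retries from many clients over time so they do not hit a recovering service in lockstep. Temporarily halting requests to a failing service, as the note also mentions, would need a separate circuit-breaker state on top of this.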
| Security | ||
| Term | Details | Notes |
| Data privacy | No Personally Identifiable Information (PII) should be exposed in the endpoint response | |
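One way a Golang service might enforce the data privacy row is to serialize a dedicated response type that contains only allowlisted fields, instead of marshaling the internal record directly. The sketch below assumes a hypothetical internal record with `Email` and `Phone` as PII; none of these type or field names come from the template.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// adRecord is a hypothetical internal record. Email and Phone are PII
// and must never reach the public endpoint.
type adRecord struct {
	ID    string
	Title string
	Email string
	Phone string
}

// adResponse is the public response shape. A field is exposed only if
// it is explicitly declared here.
type adResponse struct {
	ID    string `json:"id"`
	Title string `json:"title"`
}

// toResponse copies only the allowlisted fields from an internal record.
func toResponse(r adRecord) adResponse {
	return adResponse{ID: r.ID, Title: r.Title}
}

// encodeResponse converts internal records to the public shape and
// returns the JSON that the endpoint would send.
func encodeResponse(records []adRecord) (string, error) {
	out := make([]adResponse, 0, len(records))
	for _, r := range records {
		out = append(out, toResponse(r))
	}
	b, err := json.Marshal(out)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := encodeResponse([]adRecord{
		{ID: "1", Title: "Bike", Email: "seller@example.com", Phone: "555-0100"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

With an allowlist, a new PII column added to the internal record stays out of the response unless someone deliberately adds it to the response type.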
| Monitoring and Alerting | ||
| Term | Details | Notes |
| Health Monitoring | Monitor system metrics, such as CPU utilization, memory usage, and network latency, to detect anomalies and potential failures. Use tools like Prometheus and Grafana for real-time monitoring and visualization. | |
| Alerting | Set up alerting for the endpoint and Elasticsearch. | |