Technical Design Doc: Sample

Here is a template of the technical design doc that I use.

Introduction
This technical design document outlines the architecture, components, algorithms, and technologies used in building this system.
Project Context
Term | Details | Notes
Problem Statement | Existing feature vendor is shutting down | Shutting down by ….
Goals & Objectives | Develop in-house services for this feature
Non-Goals | Identify any related problems that this design intentionally does not address.
Platforms | Web, iOS, Android
Data
Term | Details | Notes
Data Sources | CloudSQL PostgreSQL | Existing ad data
Data Storage | PostgreSQL | No new data storage needed for this project
Data Preprocessing | Batch processing
Data Pipeline | N/A
Data Model | N/A
Business Logic
Term | Details | Notes
Feature Logic | Details about feature logic
Algorithms | Content-based filtering
Model Training | N/A
Evaluation | Manual review, plus debug logs that include the query and its results so the data team can evaluate accuracy | Checking accuracy of results
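The template does not say which Elasticsearch query implements the content-based filtering. One common option is the `more_like_this` query, which finds documents whose text resembles a given document. The sketch below builds such a query body in Go. The index name `ads`, the field names, and the tuning values are assumptions for illustration only, not part of the design.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildSimilarItemsQuery returns the JSON body of an Elasticsearch
// more_like_this query that looks for documents similar to the document
// identified by itemID. The index, fields, and tuning values passed in or
// hard-coded here are hypothetical.
func buildSimilarItemsQuery(index, itemID string, fields []string, size int) (string, error) {
	body := map[string]any{
		"size": size,
		"query": map[string]any{
			"more_like_this": map[string]any{
				"fields": fields,
				// "like" points at an already indexed document, so the
				// service does not need to load the item text itself.
				"like": []map[string]any{
					{"_index": index, "_id": itemID},
				},
				"min_term_freq":   1,  // assumed tuning value
				"max_query_terms": 25, // assumed tuning value
			},
		},
	}
	b, err := json.Marshal(body)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	q, err := buildSimilarItemsQuery("ads", "123", []string{"title", "description"}, 10)
	if err != nil {
		panic(err)
	}
	// This is also the kind of string that could go into the debug log
	// next to the results for the data team's accuracy review.
	fmt.Println(q)
}
```

Keeping the query construction in one small function makes it easy to log the exact query alongside its results, which is what the Evaluation row asks for.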
System Architecture
Term | Details | Notes
Data Ingestion Layer | No new data ingestion needed. Existing data, already ingested by the indexer from the database into Elasticsearch, will be used for this feature.
Storage Layer | No new data storage needed, as we plan to query Elasticsearch in real time.
Logic Layer | Logic will be expressed as an Elasticsearch query and executed by a Golang service.
API Layer | API type: RESTful; Endpoints: a new GET endpoint to get a list of data by item ID; Endpoint mocks: TBD; Authentication: no-auth, public; Technologies: Golang | API types: RESTful/gRPC/GraphQL/WebSocket/custom
Cache Layer | N/A
User Interface | Integration: integration in web/iOS/Android; Feature: no new feature, follow the existing UI
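The endpoint mocks are still TBD, so the following is only a rough sketch of how the public GET endpoint could look in a Go service using `net/http`. The path `/v1/similar-items`, the `item_id` query parameter, the `Item` response shape, and the `Finder` abstraction over the Elasticsearch lookup are all hypothetical names chosen for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Item is a hypothetical response shape; the real fields are not defined yet.
type Item struct {
	ID    string `json:"id"`
	Title string `json:"title"`
}

// Finder stands in for the Elasticsearch lookup so the handler can be
// exercised without a cluster.
type Finder func(itemID string) ([]Item, error)

// similarItemsHandler serves GET requests and returns a JSON list of items
// related to the item_id query parameter. No authentication is applied,
// matching the "no-auth, public" row above.
func similarItemsHandler(find Finder) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			w.Header().Set("Allow", http.MethodGet)
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		itemID := r.URL.Query().Get("item_id")
		if itemID == "" {
			http.Error(w, "item_id is required", http.StatusBadRequest)
			return
		}
		items, err := find(itemID)
		if err != nil {
			http.Error(w, "upstream search failed", http.StatusBadGateway)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(map[string]any{"items": items})
	}
}

// call runs one in-memory request against the handler and returns the
// status code and response body.
func call(h http.Handler, method, target string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, target, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	// Stub lookup used only for this demonstration.
	stub := func(itemID string) ([]Item, error) {
		return []Item{{ID: "456", Title: "similar to " + itemID}}, nil
	}
	code, body := call(similarItemsHandler(stub), http.MethodGet, "/v1/similar-items?item_id=123")
	fmt.Println(code, body)
}
```

Passing the lookup in as a function keeps the HTTP layer separate from the query logic, so the handler can be tested in memory while the Elasticsearch integration is still being worked out.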
Deployment and Scalability
Term | Details | Notes
Deployment | Service will be deployed in a Kubernetes cluster
Scalability | Additional load will be introduced on Elasticsearch; scale the ES cluster accordingly if needed. The Golang service needs to scale based on CPU/memory, and OOM kills need to be monitored.
Fault Tolerance
Term | Details | Notes
Data Replication | Elasticsearch: ensure data availability and resilience against node failures.
Service Redundancy | Elasticsearch: check the load balancing config to distribute traffic among redundant instances.
Failure Detection and Recovery | Automated recovery: Elasticsearch has the necessary mechanism for automated recovery from node failure, as does Kubernetes for pod failure.
Data Transaction Management | N/A | ACID properties for critical operations
Fallback Mechanisms | No fallback, but handle service degradation gracefully.
Circuit Breakers | N/A. Clients to implement exponential backoff and jitter for retries. | Temporarily halt requests for a failing service.
Chaos Testing
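The Circuit Breakers row asks clients to retry with exponential backoff and jitter. A minimal sketch of that client-side behaviour in Go follows, using the "full jitter" variant (a random wait between zero and an exponentially growing ceiling). The function names, the base and maximum delays, and the attempt count are assumptions, not values from this design.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// errTemporary is a placeholder error used by the demonstration below.
var errTemporary = errors.New("temporary failure")

// noSleep lets callers skip real waiting, for example in tests.
func noSleep(time.Duration) {}

// backoffCeiling returns min(max, base * 2^attempt) for a 0-based attempt.
func backoffCeiling(attempt int, base, max time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt && d < max; i++ {
		d *= 2
	}
	if d > max {
		d = max
	}
	return d
}

// backoffDelay applies full jitter: a uniformly random duration between
// zero and the exponential ceiling, so clients do not retry in lockstep.
func backoffDelay(attempt int, base, max time.Duration) time.Duration {
	ceiling := backoffCeiling(attempt, base, max)
	return time.Duration(rand.Int63n(int64(ceiling) + 1))
}

// retry calls op up to maxAttempts times, sleeping with jittered backoff
// between failed attempts. It returns nil on the first success.
func retry(maxAttempts int, base, max time.Duration, sleep func(time.Duration), op func() error) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err = op(); err == nil {
			return nil
		}
		if attempt < maxAttempts-1 {
			sleep(backoffDelay(attempt, base, max))
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", maxAttempts, err)
}

func main() {
	calls := 0
	err := retry(5, 100*time.Millisecond, 2*time.Second, time.Sleep, func() error {
		calls++
		if calls < 3 {
			return errTemporary // simulate two failed calls
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

Capping the delay keeps the worst-case wait bounded, and the jitter spreads retries out so that a recovering Elasticsearch cluster or service is not hit by all clients at the same moment.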
Security
Term | Details | Notes
Data Privacy | No personally identifiable information (PII) should be exposed in the endpoint response
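One way to enforce the data privacy rule in a Go service is to serialize a dedicated response type that lists only the allowed fields, instead of marshalling the internal record directly. The sketch below illustrates the idea. The record and field names (`adRecord`, `SellerEmail`, `SellerPhone`) are hypothetical and not taken from the actual data model.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// adRecord is a hypothetical internal record that happens to carry PII.
type adRecord struct {
	ID          string
	Title       string
	SellerEmail string // PII, must never reach the response
	SellerPhone string // PII, must never reach the response
}

// publicAd is the allow-list of fields the public endpoint may return.
type publicAd struct {
	ID    string `json:"id"`
	Title string `json:"title"`
}

// toPublic copies only the non-PII fields into the response type.
func toPublic(r adRecord) publicAd {
	return publicAd{ID: r.ID, Title: r.Title}
}

// publicJSON returns the JSON that would be sent to clients for a record.
func publicJSON(r adRecord) (string, error) {
	b, err := json.Marshal(toPublic(r))
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	rec := adRecord{
		ID:          "123",
		Title:       "Road bike",
		SellerEmail: "seller@example.com",
		SellerPhone: "555-0100",
	}
	out, err := publicJSON(rec)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

With an explicit allow-list, a new PII column added to the internal record later stays out of the response unless someone deliberately adds it to the public type.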
Monitoring and Alerting
Term | Details | Notes
Health Monitoring | Monitor system metrics, such as CPU utilization, memory usage, and network latency, to detect anomalies and potential failures. Use tools like Prometheus and Grafana for real-time monitoring and visualization.
Alerting | Set up alerting for the endpoint and Elasticsearch.