<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[The Engineer's Digest: ML & AI Systems Design]]></title><description><![CDATA[Here we are dissecting the ML and AI systems and engineering]]></description><link>https://harshuljain.substack.com/s/ml-ai-system-design</link><image><url>https://substackcdn.com/image/fetch/$s_!Tssn!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a21435-8c5b-4975-9db7-20292a727543_1280x1280.png</url><title>The Engineer&apos;s Digest: ML &amp; AI Systems Design</title><link>https://harshuljain.substack.com/s/ml-ai-system-design</link></image><generator>Substack</generator><lastBuildDate>Mon, 22 Jun 2026 12:54:28 GMT</lastBuildDate><atom:link href="https://harshuljain.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Harshul Jain & Tanya Sah]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[harshuljain@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[harshuljain@substack.com]]></itunes:email><itunes:name><![CDATA[Harshul Jain]]></itunes:name></itunes:owner><itunes:author><![CDATA[Harshul Jain]]></itunes:author><googleplay:owner><![CDATA[harshuljain@substack.com]]></googleplay:owner><googleplay:email><![CDATA[harshuljain@substack.com]]></googleplay:email><googleplay:author><![CDATA[Harshul Jain]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[AI-Powered File Insights - Part 3 (Architecting AI Service Layer) ]]></title><description><![CDATA[The hardest part of building AI-powered file insights isn&#8217;t choosing GPT-4 or Claude, it&#8217;s architecting a service that 
handles multimodal inputs reliably, validates AI outputs rigorously, and scales cost-effectively.]]></description><link>https://harshuljain.substack.com/p/ai-powered-file-insights-part-3-architecting</link><guid isPermaLink="false">https://harshuljain.substack.com/p/ai-powered-file-insights-part-3-architecting</guid><dc:creator><![CDATA[Harshul Jain]]></dc:creator><pubDate>Sat, 06 Dec 2025 17:14:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!rQnk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfa16c81-a33e-41d1-b42e-be16b44ae0b4_1024x559.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>The hardest part of building AI-powered file insights isn&#8217;t choosing GPT-4 or Claude; it&#8217;s architecting a service that handles multimodal inputs reliably, validates AI outputs rigorously, and scales cost-effectively. Part 3 of this series explores how to build such a service.</em></p><h2>Introduction</h2><p>In Parts 1 and 2, we built the foundation: the Upload Layer securely ingests files at scale, and the Processing Layer uses SQS queues and worker services to decouple uploads from processing. These layers handle the &#8220;plumbing&#8221;, moving files reliably from users to processing pipelines.</p><p>Now we reach the heart of the system: <strong>the AI Service that transforms raw files into actionable insights.</strong> This is where architectural complexity peaks. 
Unlike the Upload and Processing layers, which follow well-established patterns (pre-signed URLs, message queues, workers), the AI Service must navigate murky territory: multimodal inputs, unpredictable AI outputs, rate limits, and cost optimization.</p><p>This part explores how to architect a production-ready AI service: not just a wrapper around OpenAI&#8217;s API, but a resilient, cost-effective intelligence layer that handles the chaos of real-world AI integration.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rQnk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfa16c81-a33e-41d1-b42e-be16b44ae0b4_1024x559.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rQnk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfa16c81-a33e-41d1-b42e-be16b44ae0b4_1024x559.png 424w, https://substackcdn.com/image/fetch/$s_!rQnk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfa16c81-a33e-41d1-b42e-be16b44ae0b4_1024x559.png 848w, https://substackcdn.com/image/fetch/$s_!rQnk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfa16c81-a33e-41d1-b42e-be16b44ae0b4_1024x559.png 1272w, https://substackcdn.com/image/fetch/$s_!rQnk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfa16c81-a33e-41d1-b42e-be16b44ae0b4_1024x559.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!rQnk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfa16c81-a33e-41d1-b42e-be16b44ae0b4_1024x559.png" width="582" height="317.712890625" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dfa16c81-a33e-41d1-b42e-be16b44ae0b4_1024x559.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:559,&quot;width&quot;:1024,&quot;resizeWidth&quot;:582,&quot;bytes&quot;:790870,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://harshuljain.substack.com/i/180874538?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfa16c81-a33e-41d1-b42e-be16b44ae0b4_1024x559.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rQnk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfa16c81-a33e-41d1-b42e-be16b44ae0b4_1024x559.png 424w, https://substackcdn.com/image/fetch/$s_!rQnk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfa16c81-a33e-41d1-b42e-be16b44ae0b4_1024x559.png 848w, https://substackcdn.com/image/fetch/$s_!rQnk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfa16c81-a33e-41d1-b42e-be16b44ae0b4_1024x559.png 1272w, https://substackcdn.com/image/fetch/$s_!rQnk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfa16c81-a33e-41d1-b42e-be16b44ae0b4_1024x559.png 
1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p></p><h2>High-Level Architecture</h2><p>The AI Service follows an <strong>agent-based architecture</strong> where the Worker Service invokes an orchestrated pipeline of specialized components before reaching the core intelligence layer. 
This architecture wraps Google&#8217;s Agent Development Kit (ADK) with production-grade infrastructure for caching, validation, observability, and memory management.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wjix!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fd79ee6-5c5a-40ba-a64e-eb29e3f32fb6_5991x1462.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wjix!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fd79ee6-5c5a-40ba-a64e-eb29e3f32fb6_5991x1462.png 424w, https://substackcdn.com/image/fetch/$s_!wjix!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fd79ee6-5c5a-40ba-a64e-eb29e3f32fb6_5991x1462.png 848w, https://substackcdn.com/image/fetch/$s_!wjix!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fd79ee6-5c5a-40ba-a64e-eb29e3f32fb6_5991x1462.png 1272w, https://substackcdn.com/image/fetch/$s_!wjix!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fd79ee6-5c5a-40ba-a64e-eb29e3f32fb6_5991x1462.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wjix!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fd79ee6-5c5a-40ba-a64e-eb29e3f32fb6_5991x1462.png" width="1456" height="355" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4fd79ee6-5c5a-40ba-a64e-eb29e3f32fb6_5991x1462.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:355,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:627572,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://harshuljain.substack.com/i/180874538?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fd79ee6-5c5a-40ba-a64e-eb29e3f32fb6_5991x1462.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wjix!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fd79ee6-5c5a-40ba-a64e-eb29e3f32fb6_5991x1462.png 424w, https://substackcdn.com/image/fetch/$s_!wjix!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fd79ee6-5c5a-40ba-a64e-eb29e3f32fb6_5991x1462.png 848w, https://substackcdn.com/image/fetch/$s_!wjix!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fd79ee6-5c5a-40ba-a64e-eb29e3f32fb6_5991x1462.png 1272w, https://substackcdn.com/image/fetch/$s_!wjix!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fd79ee6-5c5a-40ba-a64e-eb29e3f32fb6_5991x1462.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p><strong>Components</strong>:</p><ol><li><p><strong>API Entry Point</strong> - Orchestrates request lifecycle and coordinates all services</p></li><li><p><strong>Prompt 
Management</strong> - Version-controlled, A/B-tested prompts from a database</p></li><li><p><strong>Guardrails</strong> - Validates inputs for injection attacks and outputs for hallucinations</p></li><li><p><strong>Google ADK</strong> - Core intelligence layer with integrated context caching, reasoning, and tool orchestration</p></li><li><p><strong>Tool &amp; Artifact Service</strong> - File operations (read_file, extract_pdf_text, analyze_image)</p></li><li><p><strong>Memory &amp; Observability</strong> - Context persistence and cost/latency tracking</p></li></ol><p></p><h2>Component Deep Dive</h2><p>This section examines each component in the agent-based architecture, detailing its specific responsibilities, implementation patterns, and interaction flows. Each component includes a focused sequence diagram showing its role in the processing pipeline.</p><h3>1. API Entry Point</h3><p>A central orchestration hub that coordinates all downstream services and manages the request lifecycle.</p><p>The API Entry Point serves as the single interface between the Worker Service and the entire AI intelligence pipeline. 
It orchestrates the flow through prompt management, guardrails, and the agent framework while handling errors gracefully and tracking request context throughout the pipeline.</p><p><strong>Key Functions</strong>:</p><ul><li><p><strong>Request Coordination</strong>: Routes requests through the pipeline in the correct order (prompts &#8594; guardrails &#8594; agent &#8594; guardrails &#8594; response)</p></li><li><p><strong>Context Enrichment</strong>: Transforms file paths into file URLs and adds extra context (metadata, processing options) before validation</p></li><li><p><strong>Error Handling</strong>: Catches and categorizes errors from downstream services, decides whether to retry or fail fast</p></li><li><p><strong>Context Management</strong>: Maintains request context (file_id, user_id, trace_id) as it flows through components</p></li><li><p><strong>Circuit Breaking</strong>: Protects downstream services by failing fast when they&#8217;re unhealthy</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Mu2J!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3068ee6-fe42-4bdd-9271-06313a22d3e3_2820x3444.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Mu2J!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3068ee6-fe42-4bdd-9271-06313a22d3e3_2820x3444.png 424w, https://substackcdn.com/image/fetch/$s_!Mu2J!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3068ee6-fe42-4bdd-9271-06313a22d3e3_2820x3444.png 848w, 
https://substackcdn.com/image/fetch/$s_!Mu2J!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3068ee6-fe42-4bdd-9271-06313a22d3e3_2820x3444.png 1272w, https://substackcdn.com/image/fetch/$s_!Mu2J!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3068ee6-fe42-4bdd-9271-06313a22d3e3_2820x3444.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Mu2J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3068ee6-fe42-4bdd-9271-06313a22d3e3_2820x3444.png" width="1456" height="1778" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f3068ee6-fe42-4bdd-9271-06313a22d3e3_2820x3444.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1778,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:649200,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://harshuljain.substack.com/i/180874538?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3068ee6-fe42-4bdd-9271-06313a22d3e3_2820x3444.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Mu2J!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3068ee6-fe42-4bdd-9271-06313a22d3e3_2820x3444.png 424w, 
https://substackcdn.com/image/fetch/$s_!Mu2J!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3068ee6-fe42-4bdd-9271-06313a22d3e3_2820x3444.png 848w, https://substackcdn.com/image/fetch/$s_!Mu2J!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3068ee6-fe42-4bdd-9271-06313a22d3e3_2820x3444.png 1272w, https://substackcdn.com/image/fetch/$s_!Mu2J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3068ee6-fe42-4bdd-9271-06313a22d3e3_2820x3444.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p><h3>2. Guardrails</h3><p>A two-way validation layer that checks inputs for prompt injection and malware, and validates outputs for hallucinations and schema compliance.</p><p>Guardrails act as the security and quality gates for the AI pipeline. Input guardrails prevent malicious content from reaching expensive AI services and protect against prompt injection attacks. Output guardrails validate AI responses for correctness, hallucinations, and schema compliance before returning results to users.</p><p><strong>Key Functions</strong>:</p><ul><li><p><strong>Input Validation</strong>: Detects prompt injection, malware, and PII leakage attempts</p></li><li><p><strong>Output Validation</strong>: Checks for hallucinations, schema compliance, and confidence thresholds</p></li><li><p><strong>Policy Enforcement</strong>: Blocks requests that violate safety or compliance policies</p></li><li><p><strong>Fail-Fast Pattern</strong>: Rejects invalid requests before consuming AI credits</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xe27!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc358cbab-eeff-4229-8b43-c732cda9eeec_2030x3294.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xe27!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc358cbab-eeff-4229-8b43-c732cda9eeec_2030x3294.png 424w, https://substackcdn.com/image/fetch/$s_!xe27!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc358cbab-eeff-4229-8b43-c732cda9eeec_2030x3294.png 848w, 
https://substackcdn.com/image/fetch/$s_!xe27!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc358cbab-eeff-4229-8b43-c732cda9eeec_2030x3294.png 1272w, https://substackcdn.com/image/fetch/$s_!xe27!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc358cbab-eeff-4229-8b43-c732cda9eeec_2030x3294.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xe27!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc358cbab-eeff-4229-8b43-c732cda9eeec_2030x3294.png" width="1456" height="2363" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c358cbab-eeff-4229-8b43-c732cda9eeec_2030x3294.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2363,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:472693,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://harshuljain.substack.com/i/180874538?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc358cbab-eeff-4229-8b43-c732cda9eeec_2030x3294.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xe27!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc358cbab-eeff-4229-8b43-c732cda9eeec_2030x3294.png 424w, 
https://substackcdn.com/image/fetch/$s_!xe27!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc358cbab-eeff-4229-8b43-c732cda9eeec_2030x3294.png 848w, https://substackcdn.com/image/fetch/$s_!xe27!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc358cbab-eeff-4229-8b43-c732cda9eeec_2030x3294.png 1272w, https://substackcdn.com/image/fetch/$s_!xe27!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc358cbab-eeff-4229-8b43-c732cda9eeec_2030x3294.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3>3. Google Agent Development Kit (ADK)</h3><p><strong>Responsibility</strong>: The core intelligence orchestrator that reasons about tasks, invokes tools, manages context caching, and coordinates with LLMs through multi-step workflows.</p><p>The Google Agent Development Kit (ADK) is the heart of the intelligence layer. Unlike simple API wrappers, ADK treats AI as an agentic system that can plan multi-step workflows, invoke tools dynamically, and maintain context across reasoning steps. Critically, ADK now includes <strong>integrated context caching</strong> that automatically caches prompts, system instructions, and file artifacts, eliminating 30-40% of redundant token processing costs.</p><p><strong>Key Capabilities</strong>:</p><ul><li><p><strong>Context Caching</strong>: Automatically caches static content (system prompts, file artifacts) for reuse across requests, reducing token costs by 30-40%</p></li><li><p><strong>Multi-Step Reasoning</strong>: Breaks complex tasks into subtasks (&#8220;extract text from PDF, then analyze sentiment&#8221;)</p></li><li><p><strong>Tool Orchestration</strong>: Dynamically invokes file readers, PDF extractors, and image analyzers based on file type</p></li><li><p><strong>Context Management</strong>: Maintains conversation history and intermediate results across tool calls</p></li><li><p><strong>Error Recovery</strong>: Retries failed tool invocations with adjusted parameters</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Y_Yi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33e86479-2a09-4163-9a19-06ceed588c93_3678x4821.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!Y_Yi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33e86479-2a09-4163-9a19-06ceed588c93_3678x4821.png 424w, https://substackcdn.com/image/fetch/$s_!Y_Yi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33e86479-2a09-4163-9a19-06ceed588c93_3678x4821.png 848w, https://substackcdn.com/image/fetch/$s_!Y_Yi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33e86479-2a09-4163-9a19-06ceed588c93_3678x4821.png 1272w, https://substackcdn.com/image/fetch/$s_!Y_Yi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33e86479-2a09-4163-9a19-06ceed588c93_3678x4821.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Y_Yi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33e86479-2a09-4163-9a19-06ceed588c93_3678x4821.png" width="1456" height="1908" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/33e86479-2a09-4163-9a19-06ceed588c93_3678x4821.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1908,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:969320,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://harshuljain.substack.com/i/180874538?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33e86479-2a09-4163-9a19-06ceed588c93_3678x4821.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Y_Yi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33e86479-2a09-4163-9a19-06ceed588c93_3678x4821.png 424w, https://substackcdn.com/image/fetch/$s_!Y_Yi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33e86479-2a09-4163-9a19-06ceed588c93_3678x4821.png 848w, https://substackcdn.com/image/fetch/$s_!Y_Yi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33e86479-2a09-4163-9a19-06ceed588c93_3678x4821.png 1272w, https://substackcdn.com/image/fetch/$s_!Y_Yi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33e86479-2a09-4163-9a19-06ceed588c93_3678x4821.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p><h4><strong>How Context Caching Works</strong>:</h4><ol><li><p><strong>First Request</strong>: ADK sends the full prompt, system instructions, and file artifact to the LLM. Google caches this context.</p></li><li><p><strong>Subsequent Requests</strong>: For identical system prompts/files, ADK references the cached context, sending only the new user query.</p></li><li><p><strong>Cost Savings</strong>: Cached tokens cost one-tenth as much as fresh tokens ($0.01 vs $0.10 per 1M tokens).</p></li><li><p><strong>Automatic Management</strong>: ADK handles cache invalidation when prompts or tools change.</p></li></ol><h4><strong>Tool &amp; Artifact Service:</strong></h4><p>Provides file reading and processing capabilities that the agent can invoke (read_file, extract_pdf_text, analyze_image).</p><p>The Tool &amp; Artifact Service is the agent&#8217;s &#8220;hands&#8221;: it performs the actual file manipulation that the agent needs but cannot do directly. Tools are registered with the agent framework and invoked dynamically based on the agent&#8217;s reasoning. Each tool is designed to be idempotent and atomic.</p><h4><strong>Memory:</strong></h4><p>Memory maintains context across sessions using vector databases and Redis, while Observability tracks costs, latency, and tool usage. ADK supports an in-memory MemoryService as well as external memory services such as Vertex AI and PostgreSQL.</p><p>You can even build your own memory layer.</p><h4><strong>Observability:</strong></h4><p>For observability and evals, one can decorate the Google ADK calls with Opik. 
</p><p></p><h2>&#128640; Implementation</h2><p><strong>The architecture in this series is now available as an open source framework.</strong> It implements all three layers (secure uploads, async processing, and AI integration with guardrails) and is ready to deploy. Check it out on <a href="https://github.com/harshuljain13/ship-ai-agents">GitHub</a>.</p><p><em>Built something with it? Share your story in the comments. I&#8217;d love to hear about it.</em></p><p></p><p></p>]]></content:encoded></item><item><title><![CDATA[AI-Powered File Insights - Part 2 (Processing Layer) ]]></title><description><![CDATA[A continuation to the multi-part deep dive into designing AI-powered file insight systems - this part is about processing layer]]></description><link>https://harshuljain.substack.com/p/ai-powered-file-insights-part-2-processing</link><guid isPermaLink="false">https://harshuljain.substack.com/p/ai-powered-file-insights-part-2-processing</guid><dc:creator><![CDATA[Harshul Jain]]></dc:creator><pubDate>Sat, 22 Nov 2025 14:30:36 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-hMg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a02ae26-efc6-4200-ac91-da0f1f969564_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Note: This is a continuation of a multipart system design series on AI-powered file insights. 
This is Part 2, which focuses on the file processing layer and making it scalable.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://harshuljain.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://harshuljain.substack.com/subscribe?"><span>Subscribe now</span></a></p><h2><strong>Introduction</strong></h2><p>In <strong>Part 1</strong>, we built the foundation: a secure, scalable upload layer that handles 10,000 concurrent file uploads with pre-signed URLs, encrypted storage, and proper rate limiting. Files are safely sitting in S3, metadata is tracked in PostgreSQL, and users have received their upload confirmations. &#128079;</p><p>But now comes the hard part: <strong>turning those raw files into actionable AI insights</strong>.</p><p>Uploading a file is synchronous and straightforward, but <strong>processing the files and fetching insights through AI is not.</strong> It&#8217;s variable (5-30 seconds), expensive, rate-limited, and dependent on external services that can fail. You can&#8217;t block your API server waiting for AI responses, because your connection pool would be exhausted under load.</p><p>This is where <strong>Layer 2: Processing Layer</strong> comes in. This layer transforms the system into an intelligent processing pipeline that queues thousands of jobs, scales workers independently, retries failed requests, and handles failures gracefully.</p><p>In this part, we&#8217;ll architect the <strong>asynchronous processing pipeline</strong>. We&#8217;ll explore message queues, worker deployment strategies, AI service integration, and error handling.</p><p>By the end, you&#8217;ll have a complete blueprint for <strong>Layer 2</strong>, the asynchronous processing engine that reliably transforms uploaded files into AI-generated insights at scale.</p><p>Let&#8217;s dive in. 
&#128640;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-hMg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a02ae26-efc6-4200-ac91-da0f1f969564_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-hMg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a02ae26-efc6-4200-ac91-da0f1f969564_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!-hMg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a02ae26-efc6-4200-ac91-da0f1f969564_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!-hMg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a02ae26-efc6-4200-ac91-da0f1f969564_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!-hMg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a02ae26-efc6-4200-ac91-da0f1f969564_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-hMg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a02ae26-efc6-4200-ac91-da0f1f969564_1024x1024.png" width="595" height="595" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4a02ae26-efc6-4200-ac91-da0f1f969564_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:595,&quot;bytes&quot;:1391466,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://harshuljain.substack.com/i/179551132?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a02ae26-efc6-4200-ac91-da0f1f969564_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-hMg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a02ae26-efc6-4200-ac91-da0f1f969564_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!-hMg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a02ae26-efc6-4200-ac91-da0f1f969564_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!-hMg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a02ae26-efc6-4200-ac91-da0f1f969564_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!-hMg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a02ae26-efc6-4200-ac91-da0f1f969564_1024x1024.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2>High Level Overview</h2><p>The <strong>Processing Layer</strong> is the asynchronous engine that transforms uploaded files into AI-powered insights. 
Unlike the synchronous upload Layer, this layer handles <strong>variable processing times (5-30 seconds)</strong>, manages <strong>expensive AI API calls</strong>, and gracefully <strong>handles failures</strong> at scale.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6RmZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3835459-1b94-4eca-83f5-1d8279591153_7612x2896.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6RmZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3835459-1b94-4eca-83f5-1d8279591153_7612x2896.png 424w, https://substackcdn.com/image/fetch/$s_!6RmZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3835459-1b94-4eca-83f5-1d8279591153_7612x2896.png 848w, https://substackcdn.com/image/fetch/$s_!6RmZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3835459-1b94-4eca-83f5-1d8279591153_7612x2896.png 1272w, https://substackcdn.com/image/fetch/$s_!6RmZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3835459-1b94-4eca-83f5-1d8279591153_7612x2896.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6RmZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3835459-1b94-4eca-83f5-1d8279591153_7612x2896.png" width="1456" height="554" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b3835459-1b94-4eca-83f5-1d8279591153_7612x2896.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:554,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:905298,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://harshuljain.substack.com/i/179551132?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3835459-1b94-4eca-83f5-1d8279591153_7612x2896.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6RmZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3835459-1b94-4eca-83f5-1d8279591153_7612x2896.png 424w, https://substackcdn.com/image/fetch/$s_!6RmZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3835459-1b94-4eca-83f5-1d8279591153_7612x2896.png 848w, https://substackcdn.com/image/fetch/$s_!6RmZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3835459-1b94-4eca-83f5-1d8279591153_7612x2896.png 1272w, https://substackcdn.com/image/fetch/$s_!6RmZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3835459-1b94-4eca-83f5-1d8279591153_7612x2896.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">High Level Overview - By the Engineer&#8217;s Digest </figcaption></figure></div><p></p><p><strong>Core Components</strong>:</p><ul><li><p><strong>Backend Layer (Service):</strong> When a file is uploaded, the backend enqueues a job with metadata (<code>{file_id, user_id, s3_key}</code>) to the message queue so that insights can be generated for the file.</p></li><li><p><strong>Message Queue (SQS/Kafka)</strong>: Acts as a buffer between the upload layer and workers. The queue decouples traffic spikes from processing capacity, allowing the system to absorb bursts of 10,000 uploads while workers process them at their own pace.</p></li><li><p><strong>Dead-Letter Queue (DLQ)</strong>: Captures failed jobs after exhausting retries (typically 3 attempts). 
Failed jobs are isolated for manual investigation, preventing them from blocking the main queue. DLQ messages include error details, stack traces, and the original job payload for debugging.</p></li><li><p><strong>Worker Service (Lambda)</strong>: The brain of the processing layer. Workers poll the queue, fetch job metadata, generate the pre-signed URL, and handle rate limiting, retries with exponential backoff, and timeout management. Workers auto-scale based on queue depth, spinning up when jobs pile up and scaling down during idle periods.</p></li><li><p><strong>AI Service Integration</strong>: The worker sends the S3 file reference (pre-signed URL) to the AI service, which reads the file directly from S3 and processes it as an artifact.</p></li><li><p><strong>Results Storage (PostgreSQL)</strong>: Stores AI-generated insights in a structured schema linked to the original file record. Results include extracted text, metadata, confidence scores, and processing timestamps. This enables fast retrieval, search, and audit trails.</p><p></p></li></ul><h2>Low level overview for each component</h2><p>In this section, we will mainly dissect the <strong>Message Queue</strong> and the <strong>worker</strong> logic to understand how they fit into the bigger picture of generating AI insights. 
The AI Service will be covered in Part 3.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zmmr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe664eb51-46bd-478b-a8ad-28545b815e9d_3040x2768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zmmr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe664eb51-46bd-478b-a8ad-28545b815e9d_3040x2768.png 424w, https://substackcdn.com/image/fetch/$s_!zmmr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe664eb51-46bd-478b-a8ad-28545b815e9d_3040x2768.png 848w, https://substackcdn.com/image/fetch/$s_!zmmr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe664eb51-46bd-478b-a8ad-28545b815e9d_3040x2768.png 1272w, https://substackcdn.com/image/fetch/$s_!zmmr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe664eb51-46bd-478b-a8ad-28545b815e9d_3040x2768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zmmr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe664eb51-46bd-478b-a8ad-28545b815e9d_3040x2768.png" width="1456" height="1326" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e664eb51-46bd-478b-a8ad-28545b815e9d_3040x2768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1326,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:464076,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://harshuljain.substack.com/i/179551132?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe664eb51-46bd-478b-a8ad-28545b815e9d_3040x2768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zmmr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe664eb51-46bd-478b-a8ad-28545b815e9d_3040x2768.png 424w, https://substackcdn.com/image/fetch/$s_!zmmr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe664eb51-46bd-478b-a8ad-28545b815e9d_3040x2768.png 848w, https://substackcdn.com/image/fetch/$s_!zmmr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe664eb51-46bd-478b-a8ad-28545b815e9d_3040x2768.png 1272w, https://substackcdn.com/image/fetch/$s_!zmmr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe664eb51-46bd-478b-a8ad-28545b815e9d_3040x2768.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Low level overview - By The Engineer&#8217;s Digest</figcaption></figure></div><p></p><h3>Message Queue</h3><p>When a file successfully uploads to S3, the backend immediately enqueues a processing job to the message queue. This job contains all necessary metadata for workers to process the file independently without querying the API server.</p><pre><code>{
  "job_id": "uuid-v4",
  "file_id": "123e4567-e89b-12d3-a456-426614174000",
  "user_id": "user-789",
  "s3_bucket": "ai-uploads-prod",
  "s3_key": "uploads/user-789/file-123/document.pdf",
  "file_type": "application/pdf",
  "file_size_bytes": 2457600,
  "uploaded_at": "2025-11-22T13:29:00Z",
  "processing_options": {
    "extract_text": true,
    "analyze_sentiment": false,
    "language": "en"
  }
}</code></pre><p><strong>Why Use a Queue</strong>:</p><ul><li><p><strong>Traffic spike absorption</strong>: The queue buffers 10,000+ uploads during peak times while workers process at their own pace</p></li><li><p><strong>Independent scaling</strong>: Workers scale based on queue depth, not API server capacity</p></li><li><p><strong>Reliability</strong>: Jobs persist in the queue until successfully processed, with automatic retries on failure</p></li><li><p><strong>Observability</strong>: Queue depth metrics help identify bottlenecks and trigger auto-scaling</p></li></ul><p></p><h3>Worker Process:</h3><p>The Worker Service orchestrates the complex flow between message queues, file storage, AI services, and databases. Here&#8217;s the complete processing pipeline:</p><p>The 9-Step Processing Pipeline:</p><pre><code>1. Poll SQS queue (long polling, 20s waits)
2. Download file from S3 to /tmp
3. Deep validation (magic bytes, malware scan, integrity check)
4. Preprocessing (resize images, convert PDFs, chunk large docs)
5. Call AI service with retry logic (exponential backoff)
6. Validate AI response schema
7. Store results in database
8. Cleanup (delete /tmp files, ACK message, update status)
9. Publish status update (WebSocket/polling)</code></pre><p><strong>Polling and Visibility</strong>: Workers use long polling (20s) to cut down on empty receives and therefore cost. The queue&#8217;s visibility timeout (5 min) hides an in-flight message from other workers to prevent duplicate processing. If a worker crashes before acknowledging the message, it becomes visible in the queue again once the timeout expires.</p><p><strong>Deep Validation</strong>: Beyond upload-time checks, workers verify magic bytes (true file type), scan for malware (ClamAV), and validate integrity. This catches malicious executables renamed as PDFs.</p><p><strong>AI Integration Patterns</strong>: Three options exist: (A) pass a pre-signed S3 URL to the AI service (recommended, since it reduces worker memory use), (B) upload file bytes directly (more secure), or (C) use a streaming SDK for large files.</p><p><strong>Retry Strategy</strong>: Exponential backoff with jitter for rate limits and timeouts. The base delay doubles on each attempt: 1s before the first retry, 2s before the second, 4s before the third. Jitter prevents a thundering herd when many workers hit limits simultaneously.</p><p><strong>Error Handling</strong>: After max retries, SQS moves messages to the Dead-Letter Queue (DLQ). Workers log errors with full context, update the database status to <code>failed</code>, and trigger alerts for investigation.</p><p><strong>Cleanup Process</strong>: Workers delete local <code>/tmp</code> files, ACK the SQS message, update the database status to <code>completed</code>, and publish status updates. The original S3 file remains for audit trails and reprocessing.</p><p>This pipeline handles the complexity of reliable AI processing at scale, including validation, preprocessing, retry logic, and cleanup, while maintaining observability and fault tolerance.</p><p></p><h2><strong>Conclusion</strong></h2><p>The Processing Layer is the backbone of reliable AI file processing, transforming chaotic concurrent uploads into an ordered, scalable pipeline. 
Three architectural decisions define its success:</p><ul><li><p><strong>Queue-based decoupling</strong>: SQS buffers traffic spikes, enables independent scaling, and provides built-in retry logic, which becomes essential beyond 100 requests/minute.</p></li><li><p><strong>Worker reliability</strong>: The 9-step pipeline (poll &#8594; validate &#8594; preprocess &#8594; AI call &#8594; cleanup) handles the unglamorous but critical work that separates production systems from demos.</p></li><li><p><strong>Start serverless, scale smart</strong>: Lambda&#8217;s zero idle cost and instant scaling work well until you hit 15-minute timeouts or 10GB files; at that point, migrate to containers.</p></li></ul><p>This architecture proves that <em>great systems aren&#8217;t built on perfect components, but on reliable communication between imperfect ones.</em></p><p></p><h2>Coming Up in Part 3: AI Service Integration Deep Dive</h2><p>In the next part, we&#8217;ll explore the intelligence layer that transforms raw files into structured insights:</p><ul><li><p><strong>Prompt engineering strategies</strong>: How to construct prompts that consistently return valid, structured JSON</p></li><li><p><strong>Response validation patterns</strong>: Using Pydantic schemas and JSON validators to catch AI hallucinations before they reach users</p></li><li><p><strong>Cost optimization techniques</strong>: Token tracking, model selection strategies, and caching patterns that cut API costs by 60%</p></li><li><p><strong>Managed vs. self-hosted decision</strong>: The $3000/month tipping point where self-hosting becomes economical</p></li><li><p><strong>Rate limiting and backoff strategies</strong>: Handling OpenAI&#8217;s 3,500 req/min limits without thundering herd failures</p><p></p></li></ul><p><em>Stay tuned for <strong>Part 3:</strong> <strong>Building the AI Service for preparing file insights</strong>, where we turn files into intelligence. 
&#128640; </em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://harshuljain.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://harshuljain.substack.com/subscribe?"><span>Subscribe now</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[AI-Powered File Insights - Part 1 (Handling File Uploads)]]></title><description><![CDATA[A multi-part deep dive into designing AI-powered file insight systems - starting with secure, scalable file uploads.]]></description><link>https://harshuljain.substack.com/p/ai-powered-file-insights-part-1-handling</link><guid isPermaLink="false">https://harshuljain.substack.com/p/ai-powered-file-insights-part-1-handling</guid><dc:creator><![CDATA[Harshul Jain]]></dc:creator><pubDate>Sat, 15 Nov 2025 14:30:34 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Und5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f424ea-3992-49ea-b0a1-8ecb99a13fb5_1024x559.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Note: This will be a multipart series for the system design on AI-Powered file insights. 
This is Part 1, which focuses on handling file uploads in a secure and scalable way to generate AI-powered insights.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://harshuljain.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://harshuljain.substack.com/subscribe?"><span>Subscribe now</span></a></p><h2>Introduction</h2><p>Imagine building a system where users upload medical reports and instantly get diagnosis summaries, or financial teams drop invoices and automatically extract line items, or recruiters upload resumes and get candidate insights in seconds. <strong>AI has made extracting insights from unstructured files trivial, but the hard part is building the production system around it.</strong></p><p>AI introduces a new architectural component with unique operational characteristics such as variable latency (5-30 seconds), external API dependencies, rate limits, per-request costs, and data residency concerns.</p><p>This isn&#8217;t just about calling <code>google_adk.analyze(file)</code>. It&#8217;s a <strong>system design problem</strong> that forces you to rethink traditional patterns:</p><ul><li><p>How do you handle 10,000 concurrent uploads without overwhelming AI rate limits?</p></li><li><p>Should files go through your backend or directly to S3? What about validation and security?</p></li><li><p>Synchronous processing or async queues? What happens when the AI service is down? 
</p></li><li><p>How do you run backfills for missing data when GPUs are scarce?</p></li><li><p>Do you delete files after processing, or retain them for compliance and reprocessing?</p></li><li><p>Should you use managed AI services or self-host models on GPUs?</p></li></ul><p>This post architects a production-grade system for AI-powered file insights, treating AI services as first-class distributed components and examining the design decisions that balance <strong>security, cost, scalability, and reliability</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Und5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f424ea-3992-49ea-b0a1-8ecb99a13fb5_1024x559.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Und5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f424ea-3992-49ea-b0a1-8ecb99a13fb5_1024x559.png 424w, https://substackcdn.com/image/fetch/$s_!Und5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f424ea-3992-49ea-b0a1-8ecb99a13fb5_1024x559.png 848w, https://substackcdn.com/image/fetch/$s_!Und5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f424ea-3992-49ea-b0a1-8ecb99a13fb5_1024x559.png 1272w, https://substackcdn.com/image/fetch/$s_!Und5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f424ea-3992-49ea-b0a1-8ecb99a13fb5_1024x559.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Und5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f424ea-3992-49ea-b0a1-8ecb99a13fb5_1024x559.png" width="1024" height="559" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c4f424ea-3992-49ea-b0a1-8ecb99a13fb5_1024x559.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:559,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:776096,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://harshuljain.substack.com/i/178845874?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f424ea-3992-49ea-b0a1-8ecb99a13fb5_1024x559.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Und5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f424ea-3992-49ea-b0a1-8ecb99a13fb5_1024x559.png 424w, https://substackcdn.com/image/fetch/$s_!Und5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f424ea-3992-49ea-b0a1-8ecb99a13fb5_1024x559.png 848w, https://substackcdn.com/image/fetch/$s_!Und5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f424ea-3992-49ea-b0a1-8ecb99a13fb5_1024x559.png 1272w, https://substackcdn.com/image/fetch/$s_!Und5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4f424ea-3992-49ea-b0a1-8ecb99a13fb5_1024x559.png 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p></p><h2>Problem Statement</h2><p>Design a system that allows users to securely upload files (documents, images, PDFs) and have them processed by an AI service to generate insights, with the following requirements:</p><h4>Functional Requirements</h4><ol><li><p><strong>Upload</strong>: Users can upload files up to 50MB.</p></li><li><p><strong>Validation</strong>: The system validates file type, size, and content before processing.</p></li><li><p><strong>AI Processing</strong>: Files are sent to an AI service (wrappers on ADK, GPT-4 Vision, Claude) for 
analysis.</p></li><li><p><strong>Result Storage</strong>: AI-generated results are stored and retrievable.</p></li><li><p><strong>File Retention</strong>: Files are retained for audit and compliance purposes with configurable retention policies.</p></li><li><p><strong>Multi-user</strong>: Support concurrent uploads from thousands of users.</p></li></ol><h4>Non-Functional Requirements</h4><ol><li><p><strong>Security</strong>: Files must be encrypted in transit and at rest, with no unauthorized access.</p></li><li><p><strong>Privacy</strong>: Comply with GDPR/HIPAA, with no data leakage to unauthorized parties.</p></li><li><p><strong>Performance</strong>: File upload and processing should complete within 30 seconds at the 95th percentile.</p></li><li><p><strong>Scalability</strong>: Handle 10,000 concurrent uploads.</p></li><li><p><strong>Cost</strong>: Minimize storage and compute costs.</p></li><li><p><strong>Reliability</strong>: 99.9% uptime, with proper error handling and retries.</p></li></ol><p></p><h2>System Architecture - Layer View</h2><p>The system can be organized into <strong>three architectural layers</strong>, each responsible for a distinct phase of the file processing lifecycle. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0IpJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57fe1a50-9387-4929-8c68-5e9cc9fa6711_4915x3704.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0IpJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57fe1a50-9387-4929-8c68-5e9cc9fa6711_4915x3704.png 424w, https://substackcdn.com/image/fetch/$s_!0IpJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57fe1a50-9387-4929-8c68-5e9cc9fa6711_4915x3704.png 848w, https://substackcdn.com/image/fetch/$s_!0IpJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57fe1a50-9387-4929-8c68-5e9cc9fa6711_4915x3704.png 1272w, https://substackcdn.com/image/fetch/$s_!0IpJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57fe1a50-9387-4929-8c68-5e9cc9fa6711_4915x3704.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0IpJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57fe1a50-9387-4929-8c68-5e9cc9fa6711_4915x3704.png" width="1456" height="1097" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/57fe1a50-9387-4929-8c68-5e9cc9fa6711_4915x3704.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1097,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:630923,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://harshuljain.substack.com/i/178845874?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57fe1a50-9387-4929-8c68-5e9cc9fa6711_4915x3704.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0IpJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57fe1a50-9387-4929-8c68-5e9cc9fa6711_4915x3704.png 424w, https://substackcdn.com/image/fetch/$s_!0IpJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57fe1a50-9387-4929-8c68-5e9cc9fa6711_4915x3704.png 848w, https://substackcdn.com/image/fetch/$s_!0IpJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57fe1a50-9387-4929-8c68-5e9cc9fa6711_4915x3704.png 1272w, https://substackcdn.com/image/fetch/$s_!0IpJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57fe1a50-9387-4929-8c68-5e9cc9fa6711_4915x3704.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Layer overview - By The Engineer&#8217;s Digest</em></figcaption></figure></div><p>This layered architecture improves clarity and maintainability by separating concerns: upload handles secure file acceptance, processing manages AI analysis asynchronously, and the data layer handles storage and lifecycle management.</p><p>Below is a <strong>breakdown of each layer&#8217;s purpose</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8_yG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcc8d8f9-b1fa-4a65-8a76-0d6401ec3826_2118x850.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!8_yG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcc8d8f9-b1fa-4a65-8a76-0d6401ec3826_2118x850.png 424w, https://substackcdn.com/image/fetch/$s_!8_yG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcc8d8f9-b1fa-4a65-8a76-0d6401ec3826_2118x850.png 848w, https://substackcdn.com/image/fetch/$s_!8_yG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcc8d8f9-b1fa-4a65-8a76-0d6401ec3826_2118x850.png 1272w, https://substackcdn.com/image/fetch/$s_!8_yG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcc8d8f9-b1fa-4a65-8a76-0d6401ec3826_2118x850.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8_yG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcc8d8f9-b1fa-4a65-8a76-0d6401ec3826_2118x850.png" width="1456" height="584" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dcc8d8f9-b1fa-4a65-8a76-0d6401ec3826_2118x850.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:584,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:319846,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://harshuljain.substack.com/i/178845874?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcc8d8f9-b1fa-4a65-8a76-0d6401ec3826_2118x850.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8_yG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcc8d8f9-b1fa-4a65-8a76-0d6401ec3826_2118x850.png 424w, https://substackcdn.com/image/fetch/$s_!8_yG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcc8d8f9-b1fa-4a65-8a76-0d6401ec3826_2118x850.png 848w, https://substackcdn.com/image/fetch/$s_!8_yG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcc8d8f9-b1fa-4a65-8a76-0d6401ec3826_2118x850.png 1272w, https://substackcdn.com/image/fetch/$s_!8_yG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcc8d8f9-b1fa-4a65-8a76-0d6401ec3826_2118x850.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Description of different layers involved - By The Engineer&#8217;s Digest</em></figcaption></figure></div><p></p><h2>Layer 1: Upload Layer</h2><p>This layer acts as the entry point to the system, ensuring that only authenticated users can upload files, that files meet security and format requirements, and that they are stored in a secure, encrypted manner before being queued for AI processing. The upload layer is designed to handle high concurrency while minimizing backend load through direct client-to-storage uploads.</p><p>The table below compares <strong>two implementation approaches for this layer</strong>. 
We&#8217;ll use the <strong>pre-signed URL approach</strong> in the sections that follow.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!L-n_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcca193cb-4ba4-4b4b-9da7-96a7cadd023e_2342x794.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!L-n_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcca193cb-4ba4-4b4b-9da7-96a7cadd023e_2342x794.png 424w, https://substackcdn.com/image/fetch/$s_!L-n_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcca193cb-4ba4-4b4b-9da7-96a7cadd023e_2342x794.png 848w, https://substackcdn.com/image/fetch/$s_!L-n_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcca193cb-4ba4-4b4b-9da7-96a7cadd023e_2342x794.png 1272w, https://substackcdn.com/image/fetch/$s_!L-n_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcca193cb-4ba4-4b4b-9da7-96a7cadd023e_2342x794.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!L-n_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcca193cb-4ba4-4b4b-9da7-96a7cadd023e_2342x794.png" width="1456" height="494" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cca193cb-4ba4-4b4b-9da7-96a7cadd023e_2342x794.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:494,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:357928,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://harshuljain.substack.com/i/178845874?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcca193cb-4ba4-4b4b-9da7-96a7cadd023e_2342x794.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!L-n_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcca193cb-4ba4-4b4b-9da7-96a7cadd023e_2342x794.png 424w, https://substackcdn.com/image/fetch/$s_!L-n_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcca193cb-4ba4-4b4b-9da7-96a7cadd023e_2342x794.png 848w, https://substackcdn.com/image/fetch/$s_!L-n_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcca193cb-4ba4-4b4b-9da7-96a7cadd023e_2342x794.png 1272w, https://substackcdn.com/image/fetch/$s_!L-n_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcca193cb-4ba4-4b4b-9da7-96a7cadd023e_2342x794.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Comparison of two implementation approaches for handling file uploads - By The Engineer&#8217;s Digest</em></figcaption></figure></div><p></p><h2>High Level Design Overview:</h2><p>Before diving into implementation details, let&#8217;s understand the key components and their interactions in the upload layer:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lpLO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f44cb53-824d-456a-8931-ad2a6901ceaf_4797x2092.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!lpLO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f44cb53-824d-456a-8931-ad2a6901ceaf_4797x2092.png 424w, https://substackcdn.com/image/fetch/$s_!lpLO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f44cb53-824d-456a-8931-ad2a6901ceaf_4797x2092.png 848w, https://substackcdn.com/image/fetch/$s_!lpLO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f44cb53-824d-456a-8931-ad2a6901ceaf_4797x2092.png 1272w, https://substackcdn.com/image/fetch/$s_!lpLO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f44cb53-824d-456a-8931-ad2a6901ceaf_4797x2092.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lpLO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f44cb53-824d-456a-8931-ad2a6901ceaf_4797x2092.png" width="1456" height="635" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2f44cb53-824d-456a-8931-ad2a6901ceaf_4797x2092.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:635,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:617267,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://harshuljain.substack.com/i/178845874?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f44cb53-824d-456a-8931-ad2a6901ceaf_4797x2092.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lpLO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f44cb53-824d-456a-8931-ad2a6901ceaf_4797x2092.png 424w, https://substackcdn.com/image/fetch/$s_!lpLO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f44cb53-824d-456a-8931-ad2a6901ceaf_4797x2092.png 848w, https://substackcdn.com/image/fetch/$s_!lpLO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f44cb53-824d-456a-8931-ad2a6901ceaf_4797x2092.png 1272w, https://substackcdn.com/image/fetch/$s_!lpLO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f44cb53-824d-456a-8931-ad2a6901ceaf_4797x2092.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>High level Component interaction - By The Engineer&#8217;s Digest</em></figcaption></figure></div><p>The upload layer consists of six key components that work together to securely accept and store files:</p><ol><li><p>The <strong>Frontend Client</strong> (built with React, Vue, or as a mobile app) initiates the upload process by requesting pre-signed URLs, uploading files directly to S3, and notifying the backend when uploads complete.</p></li><li><p>The <strong>Load Balancer</strong> (AWS ALB or NGINX) sits at the entry point, distributing incoming requests across multiple backend instances while handling SSL termination and performing health checks to ensure system availability.</p></li><li><p>The <strong>API Gateway</strong> (Kong or AWS API Gateway) acts as the security layer, authenticating users via JWTs, enforcing rate limits (10 uploads per minute per user), and logging all requests for audit purposes.</p></li><li><p>The <strong>Backend API</strong> (FastAPI or Node.js) serves as the orchestration hub, generating time-limited pre-signed URLs with encryption requirements, validating file metadata, and storing upload records in the database.</p></li><li><p>The <strong>Database</strong> (DynamoDB or PostgreSQL) handles metadata storage, maintaining records of file information, tracking upload status throughout the lifecycle, and providing a complete audit trail of all operations.</p></li><li><p>Finally, <strong>Object Storage</strong> (AWS S3) provides the actual file storage, keeping uploaded files encrypted at rest and organized into separate buckets for different 
processing stages.</p><p></p></li></ol><h2>Component Deep-Dive</h2><p>This section examines each component in the upload layer, detailing their specific responsibilities, implementation patterns, and security considerations. We&#8217;ll explore how these components work together to create a robust, scalable file upload system.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BRx5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb30fbe93-1515-4fa2-85c6-0d317f9ba1df_2878x1439.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BRx5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb30fbe93-1515-4fa2-85c6-0d317f9ba1df_2878x1439.png 424w, https://substackcdn.com/image/fetch/$s_!BRx5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb30fbe93-1515-4fa2-85c6-0d317f9ba1df_2878x1439.png 848w, https://substackcdn.com/image/fetch/$s_!BRx5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb30fbe93-1515-4fa2-85c6-0d317f9ba1df_2878x1439.png 1272w, https://substackcdn.com/image/fetch/$s_!BRx5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb30fbe93-1515-4fa2-85c6-0d317f9ba1df_2878x1439.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BRx5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb30fbe93-1515-4fa2-85c6-0d317f9ba1df_2878x1439.png" width="1456" height="728" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b30fbe93-1515-4fa2-85c6-0d317f9ba1df_2878x1439.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:728,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:365186,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://harshuljain.substack.com/i/178845874?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb30fbe93-1515-4fa2-85c6-0d317f9ba1df_2878x1439.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BRx5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb30fbe93-1515-4fa2-85c6-0d317f9ba1df_2878x1439.png 424w, https://substackcdn.com/image/fetch/$s_!BRx5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb30fbe93-1515-4fa2-85c6-0d317f9ba1df_2878x1439.png 848w, https://substackcdn.com/image/fetch/$s_!BRx5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb30fbe93-1515-4fa2-85c6-0d317f9ba1df_2878x1439.png 1272w, https://substackcdn.com/image/fetch/$s_!BRx5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb30fbe93-1515-4fa2-85c6-0d317f9ba1df_2878x1439.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Sequence diagram for component interaction - By The Engineer&#8217;s Digest</em></figcaption></figure></div><h3><strong>A.</strong> <strong>Frontend Client (Web/Mobile Application)</strong></h3><p>The frontend client serves as the user-facing interface where file uploads begin. 
It handles initial validation, requests secure upload URLs from the backend, and manages the complete upload lifecycle from file selection to processing confirmation.</p><p><strong>Some Key Points</strong>:</p><ul><li><p><strong>File Selection &amp; Pre-Upload Validation</strong>: Validates file size (max 50MB) and type before requesting upload URLs to prevent unnecessary backend calls</p></li><li><p><strong>Pre-Signed URL Request</strong>: Sends file metadata (name, size, type) to backend via authenticated API call to receive temporary S3 upload URL</p></li><li><p><strong>Direct S3 Upload</strong>: Uploads file directly to S3 using pre-signed URL, bypassing backend to reduce server load and bandwidth costs</p></li><li><p><strong>Upload Completion Notification</strong>: Notifies backend after successful S3 upload to trigger AI processing pipeline and update file status</p></li></ul><pre><code>// Validation and upload flow
function validateFile(file) {
  const MAX_SIZE = 50 * 1024 * 1024; // 50MB
  const ALLOWED_TYPES = ['application/pdf', 'image/jpeg', 'image/png'];
  
  if (file.size &gt; MAX_SIZE || !ALLOWED_TYPES.includes(file.type)) {
    throw new Error('Invalid file');
  }
}
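
// Hypothetical helpers (not part of the original flow): a sketch of how the
// client might wait for the processing result. It assumes the status endpoint
// returns {status, result, error}, that 'completed' and 'failed' are the
// terminal states, and that jwtToken is in scope. The delays and the attempt
// cap are illustrative values, not recommendations.
function isTerminalStatus(status) {
  return status === 'completed' || status === 'failed';
}

// Exponential backoff: 1s, 2s, 4s, ... capped at 15s
function pollDelayMs(attempt) {
  return Math.min(1000 * Math.pow(2, attempt), 15000);
}

async function waitForResult(fileId, maxAttempts = 10) {
  for (let attempt = 0; attempt !== maxAttempts; attempt += 1) {
    const res = await fetch(`/api/v1/files/${fileId}/status`, {
      headers: { 'Authorization': `Bearer ${jwtToken}` }
    });
    const body = await res.json();
    if (isTerminalStatus(body.status)) {
      return body;
    }
    // Wait before the next poll
    await new Promise(function (resolve) {
      setTimeout(resolve, pollDelayMs(attempt));
    });
  }
  throw new Error('Timed out waiting for processing');
}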

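async function uploadFile(file) {
  // Validate locally first so invalid files never reach the backend
  validateFile(file);

  // Get pre-signed URL (jwtToken is assumed to be available in scope)
  const urlResponse = await fetch('/api/v1/files/upload-url', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${jwtToken}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ file_name: file.name, file_size: file.size, file_type: file.type })
  });
  if (!urlResponse.ok) throw new Error('Could not get upload URL');
  const { upload_url, file_id } = await urlResponse.json();
  
  // Upload to S3. The Content-Type must match the one the URL was signed with;
  // other signed headers (e.g. x-amz-server-side-encryption) may be required too.
  const uploadResponse = await fetch(upload_url, {
    method: 'PUT',
    headers: { 'Content-Type': file.type },
    body: file
  });
  if (!uploadResponse.ok) throw new Error('Upload to S3 failed');
  
  // Trigger processing
  const processResponse = await fetch(`/api/v1/files/${file_id}/process`, { 
    method: 'POST', 
    headers: { 'Authorization': `Bearer ${jwtToken}` } 
  });
  if (!processResponse.ok) throw new Error('Could not start processing');
  
  return file_id;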
}</code></pre><h3><strong>B.</strong> <strong>Load Balancer (Application Load Balancer)</strong></h3><p>The load balancer distributes incoming traffic across backend servers, ensures high availability through health monitoring, and handles SSL/TLS termination for secure communication.</p><p><strong>Some Key Points</strong>:</p><ul><li><p><strong>Traffic Distribution</strong>: Routes requests across multiple backend instances using the configured routing algorithm to prevent server overload</p></li><li><p><strong>Health Monitoring</strong>: Continuously checks backend server health and automatically removes unhealthy instances from rotation</p></li><li><p><strong>SSL/TLS Termination</strong>: Handles encryption/decryption at the edge, reducing computational burden on backend servers</p></li><li><p><strong>Connection Management</strong>: Maintains persistent connections and optimizes request routing for better performance</p></li></ul><p><strong>Algorithm Selection for the Load Balancer:</strong></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KF23!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb285533b-1676-48d8-88e6-5a481645393e_1192x232.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KF23!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb285533b-1676-48d8-88e6-5a481645393e_1192x232.png 424w, https://substackcdn.com/image/fetch/$s_!KF23!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb285533b-1676-48d8-88e6-5a481645393e_1192x232.png 848w, 
https://substackcdn.com/image/fetch/$s_!KF23!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb285533b-1676-48d8-88e6-5a481645393e_1192x232.png 1272w, https://substackcdn.com/image/fetch/$s_!KF23!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb285533b-1676-48d8-88e6-5a481645393e_1192x232.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KF23!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb285533b-1676-48d8-88e6-5a481645393e_1192x232.png" width="1192" height="232" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b285533b-1676-48d8-88e6-5a481645393e_1192x232.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:232,&quot;width&quot;:1192,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:67063,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://harshuljain.substack.com/i/178845874?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb285533b-1676-48d8-88e6-5a481645393e_1192x232.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KF23!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb285533b-1676-48d8-88e6-5a481645393e_1192x232.png 424w, 
https://substackcdn.com/image/fetch/$s_!KF23!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb285533b-1676-48d8-88e6-5a481645393e_1192x232.png 848w, https://substackcdn.com/image/fetch/$s_!KF23!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb285533b-1676-48d8-88e6-5a481645393e_1192x232.png 1272w, https://substackcdn.com/image/fetch/$s_!KF23!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb285533b-1676-48d8-88e6-5a481645393e_1192x232.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption"><em>Load balancer algorithms - By The Engineer&#8217;s Digest</em></figcaption></figure></div><p><strong>Why are we selecting Least Outstanding Requests?</strong> This is because our system handles both fast requests (pre-signed URL generation ~50ms) and slower requests (processing triggers ~200ms). 
Least Outstanding Requests routes each new request to the instance with the fewest in-flight requests, so instances busy with slow requests receive less new traffic and fast requests are not queued behind them.</p><h3><strong>C.</strong> <strong>API Gateway + Authentication Layer</strong></h3><p>This is the first line of defense. It authenticates users, enforces rate limits, and protects backend infrastructure from unauthorized access.</p><p><strong>Some Key Points</strong>:</p><ul><li><p><strong>JWT Authentication</strong>: Validates bearer tokens to verify user identity and permissions before allowing any file operations</p></li><li><p><strong>Rate Limiting</strong>: Implements a token bucket algorithm via Redis to prevent abuse by enforcing a limit of 10 uploads per user per minute</p></li><li><p><strong>DDoS Protection</strong>: Blocks malicious traffic and unauthorized requests before they consume backend resources</p></li><li><p><strong>Token Expiry Validation</strong>: Ensures expired JWTs are rejected to maintain security posture</p></li></ul><p><strong>Why This Matters</strong>: Stopping abusive traffic and unauthorized uploads before they reach your backend reduces costs and protects infrastructure.</p><h3><strong>D.</strong> <strong>Backend API Server</strong></h3><p>The Backend API Server acts as the central orchestration layer that validates requests, generates pre-signed URLs, and triggers the processing that produces AI insights.</p><p><strong>Some Key Points</strong>:</p><ul><li><p><strong>Request Validation</strong>: Ensures all incoming file metadata (name, size, type) meets system requirements before proceeding with upload flow</p></li><li><p><strong>Pre-Signed URL Generation</strong>: Creates cryptographically signed, time-bound URLs (15-minute expiry) that grant temporary S3 access with enforced encryption and content-type restrictions</p></li><li><p><strong>Metadata Management</strong>: Stores comprehensive file records in PostgreSQL including upload status, S3 location, timestamps, and user associations for complete lifecycle tracking</p></li><li><p><strong>Processing 
Orchestration</strong>: Triggers asynchronous AI processing workflows after upload completion and provides status endpoints for monitoring</p></li><li><p><strong>Security Enforcement</strong>: Implements multiple security layers including encryption requirements, size limits, type restrictions, and user-scoped object keys to prevent unauthorized access</p></li></ul><p><strong>Key APIs</strong>:</p><pre><code>POST /api/v1/files/upload-url
- Input: {file_name, file_size, file_type, user_id}
- Output: {upload_url, file_id, expires_in}

POST /api/v1/files/{file_id}/process
- Input: {file_id}
- Output: {job_id, status}

GET /api/v1/files/{file_id}/status
- Input: {file_id}
- Output: {status, result, error}</code></pre><h3><strong>E.</strong> <strong>Pre-Signed URL Generation - Deep Dive</strong></h3><p>Pre-signed URLs are time-limited, cryptographically signed URLs that grant temporary access to S3 objects.</p><p>A simple version of the service logic could look like this:</p><pre><code>import uuid
from datetime import datetime, timedelta

import boto3


class FileUploadService:
    def __init__(self):
        self.s3_client = boto3.client('s3')
        self.bucket_name = 'ai-uploads-prod'
    
    async def generate_upload_url(self, user_id: str, file_metadata: dict):
        """
        Generate pre-signed URL with multiple security layers:
        1. Time-bound: 15-minute expiry
        2. Scoped: Only PUT to specific object key
        3. Type-restricted: Enforces Content-Type
        4. Size-restricted: Max 50MB (checked by the API before signing;
           a pre-signed PUT URL cannot itself cap the upload size)
        5. Encryption-enforced: Must use AES256
        """
        
        file_id = str(uuid.uuid4())
        object_key = f"uploads/{user_id}/{file_id}/{file_metadata['file_name']}"
        
        # Security conditions: these parameters are signed into the URL,
        # so the client must send matching headers with the PUT request
        presigned_url = self.s3_client.generate_presigned_url(
            ClientMethod='put_object',
            Params={
                'Bucket': self.bucket_name,
                'Key': object_key,
                'ContentType': file_metadata['file_type'],
                'ServerSideEncryption': 'AES256',  # Enforce encryption
                'Metadata': {
                    'user-id': user_id,
                    'upload-timestamp': datetime.utcnow().isoformat()
                }
            },
            ExpiresIn=900,  # 15 minutes
            HttpMethod='PUT'
        )
        
        # Store metadata in database
        # (store_file_metadata is assumed to be implemented elsewhere in the service)
        await self.store_file_metadata(
            file_id=file_id,
            user_id=user_id,
            s3_key=object_key,
            status='pending_upload',
            url_expires_at=datetime.utcnow() + timedelta(seconds=900)
        )
        
        return {
            "upload_url": presigned_url,
            "file_id": file_id,
            "expires_in": 900
        }</code></pre><p><strong>Why 15-Minute TTL for URL Expiry?</strong></p><ul><li><p><strong>Security</strong>: Minimizes risk if the URL is leaked (an attacker has limited time to use it)</p></li><li><p><strong>User Experience</strong>: Sufficient for 50MB uploads even on slow connections (~2-5 minutes typical)</p></li><li><p><strong>Cost Optimization</strong>: Prevents stale URLs from being used to upload unwanted files</p></li></ul><p><strong>Encryption at Multiple Layers</strong>:</p><ul><li><p><strong>In Transit</strong>: HTTPS (TLS) during upload, since the pre-signed URL uses an https:// endpoint</p></li><li><p><strong>At Rest</strong>: Server-Side Encryption (SSE-S3 or SSE-KMS for HIPAA compliance)</p></li><li><p><strong>Metadata</strong>: Database encryption for sensitive fields</p></li></ul><h3><strong>F. Database (PostgreSQL)</strong></h3><p>PostgreSQL serves as the central metadata store that tracks the complete lifecycle of every file upload, from initial URL generation through processing completion. It enables audit trails, supports concurrent access through connection pooling, and provides indexed queries for fast lookups by user, status, and timestamp.</p><p><strong>Some Key Points</strong>:</p><ul><li><p><strong>File Metadata Storage</strong>: Maintains comprehensive records including file details (name, size, type), S3 location, and user associations for complete file tracking</p></li><li><p><strong>Lifecycle Tracking</strong>: Monitors upload status progression through states (pending_upload &#8594; uploaded &#8594; queued &#8594; processing &#8594; completed or failed) with timestamps at each stage</p></li><li><p><strong>Audit Trail</strong>: Records creation and update timestamps for compliance and debugging, enabling full traceability of file operations</p></li><li><p><strong>High Concurrency Support</strong>: Uses connection pooling with 20 base connections and 10 overflow connections to handle concurrent requests efficiently</p></li><li><p><strong>Optimized 
Queries</strong>: Implements indexes on user_id, status, and created_at fields to enable fast lookups and filtering operations</p></li></ul><p><strong>Schema Diagram:</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VkY8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe349b98e-3850-49b9-a8ab-7114b4778c1c_914x786.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VkY8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe349b98e-3850-49b9-a8ab-7114b4778c1c_914x786.png 424w, https://substackcdn.com/image/fetch/$s_!VkY8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe349b98e-3850-49b9-a8ab-7114b4778c1c_914x786.png 848w, https://substackcdn.com/image/fetch/$s_!VkY8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe349b98e-3850-49b9-a8ab-7114b4778c1c_914x786.png 1272w, https://substackcdn.com/image/fetch/$s_!VkY8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe349b98e-3850-49b9-a8ab-7114b4778c1c_914x786.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VkY8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe349b98e-3850-49b9-a8ab-7114b4778c1c_914x786.png" width="414" height="356.0218818380744" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e349b98e-3850-49b9-a8ab-7114b4778c1c_914x786.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:786,&quot;width&quot;:914,&quot;resizeWidth&quot;:414,&quot;bytes&quot;:91950,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://harshuljain.substack.com/i/178845874?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe349b98e-3850-49b9-a8ab-7114b4778c1c_914x786.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VkY8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe349b98e-3850-49b9-a8ab-7114b4778c1c_914x786.png 424w, https://substackcdn.com/image/fetch/$s_!VkY8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe349b98e-3850-49b9-a8ab-7114b4778c1c_914x786.png 848w, https://substackcdn.com/image/fetch/$s_!VkY8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe349b98e-3850-49b9-a8ab-7114b4778c1c_914x786.png 1272w, https://substackcdn.com/image/fetch/$s_!VkY8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe349b98e-3850-49b9-a8ab-7114b4778c1c_914x786.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Schema diagram for the file metadata - By The Engineer&#8217;s Digest</em></figcaption></figure></div><p><strong>Indexes: </strong>We will need few indices to speed up the query performance on the postgres. 
Some candidates are listed below; where queries filter on several columns together, a composite index may serve better:</p><ul><li><p><code>idx_user_id</code> on user_id - for filtering by user</p></li><li><p><code>idx_status</code> on status - for filtering by upload state</p></li><li><p><code>idx_created_at</code> on created_at (DESC) - for chronological queries</p></li></ul><p><strong>Connection Pooling: </strong>Connection pooling maintains a pool of reusable database connections, avoiding the overhead of creating new connections for each request.</p><ul><li><p><strong>Prevents Connection Exhaustion</strong>: Without pooling, each request creates a new database connection, which is expensive (handshake overhead ~10-50ms). Pooling reuses existing connections, reducing latency.</p></li><li><p><strong>Handles Concurrent Requests</strong>: During traffic spikes (e.g., 100 simultaneous uploads), pooling ensures connections are shared efficiently instead of opening 100+ database connections that could overwhelm PostgreSQL.</p></li><li><p><strong>Database Protection</strong>: PostgreSQL has a maximum connection limit (default 100). Connection pooling enforces a cap per application instance, preventing one service from consuming all database connections.</p></li><li><p><strong>Resource Efficiency</strong>: Idle connections are kept warm and reused, avoiding the overhead of repeatedly opening and closing connections.</p></li><li><p><strong>Graceful Degradation</strong>: When all connections are busy, new requests wait in a queue (pool_timeout=30s) rather than failing immediately, improving reliability during load spikes.</p></li></ul><h3><strong>G. 
Object Storage (S3)</strong></h3><p>S3 serves as the secure, encrypted storage layer with strict access controls and compliance-ready configuration</p><p><strong>Some Key Points</strong>:</p><ul><li><p><strong>Public Access Prevention</strong>: All public access blocked at bucket level</p></li><li><p><strong>Encryption at Rest</strong>: SSE-S3 or SSE-KMS for HIPAA compliance</p></li><li><p><strong>Version Control</strong>: Versioning enabled for complete audit trail</p></li><li><p><strong>Access Control</strong>: IAM roles restrict read access to Worker Service only</p></li></ul><p><strong>Design Decision: Bucket Organization Strategy</strong></p><p>There are multiple ways to organize the bucket while storing the files. On a high level there could be two approaches here:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EHhO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5db83a-41cb-4277-b804-ad6b8aaf59d7_1151x431.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EHhO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5db83a-41cb-4277-b804-ad6b8aaf59d7_1151x431.png 424w, https://substackcdn.com/image/fetch/$s_!EHhO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5db83a-41cb-4277-b804-ad6b8aaf59d7_1151x431.png 848w, https://substackcdn.com/image/fetch/$s_!EHhO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5db83a-41cb-4277-b804-ad6b8aaf59d7_1151x431.png 1272w, 
https://substackcdn.com/image/fetch/$s_!EHhO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5db83a-41cb-4277-b804-ad6b8aaf59d7_1151x431.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EHhO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5db83a-41cb-4277-b804-ad6b8aaf59d7_1151x431.png" width="1151" height="431" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5f5db83a-41cb-4277-b804-ad6b8aaf59d7_1151x431.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:431,&quot;width&quot;:1151,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:130300,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://harshuljain.substack.com/i/178845874?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5db83a-41cb-4277-b804-ad6b8aaf59d7_1151x431.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EHhO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5db83a-41cb-4277-b804-ad6b8aaf59d7_1151x431.png 424w, https://substackcdn.com/image/fetch/$s_!EHhO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5db83a-41cb-4277-b804-ad6b8aaf59d7_1151x431.png 848w, 
https://substackcdn.com/image/fetch/$s_!EHhO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5db83a-41cb-4277-b804-ad6b8aaf59d7_1151x431.png 1272w, https://substackcdn.com/image/fetch/$s_!EHhO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f5db83a-41cb-4277-b804-ad6b8aaf59d7_1151x431.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>S3 bucket organization - By The Engineer&#8217;s Digest</em></figcaption></figure></div><p>As we can see, we need 
to set up buckets for different stages and store each user&#8217;s uploaded data in an upload directory. The security benefits of this separation outweigh the added operational complexity.</p><h2>Conclusion</h2><p>In this first part of our AI-Powered File Insights series, we&#8217;ve explored the foundational <strong>Layer 1: Handling Uploads</strong>, the critical infrastructure that handles secure file uploads at scale. We&#8217;ve covered:</p><ul><li><p><strong>Client-side validation and chunking</strong> for large files</p></li><li><p><strong>Backend architecture</strong> with pre-signed URLs for direct-to-S3 uploads</p></li><li><p><strong>Database schema design</strong> with proper indexing and connection pooling</p></li><li><p><strong>S3 bucket security</strong> with encryption, versioning, and least-privilege IAM policies</p></li><li><p><strong>Rate limiting and abuse prevention</strong> to protect against malicious actors</p></li></ul><p>But uploading files is only half the story. The real magic happens in <strong>Layer 2: Processing Layer</strong>, where we transform raw files into actionable AI-powered insights.</p><p>In the next post, we&#8217;ll dive deep into:</p><ul><li><p><strong>Message queue architecture</strong> : why async processing beats synchronous, and how to choose between SQS, Kafka, and RabbitMQ</p></li><li><p><strong>Worker service design</strong> : Lambda vs. Kubernetes, handling long-running AI calls, and implementing robust retry mechanisms</p></li><li><p><strong>AI service integration patterns</strong> : managed services (OpenAI, Claude) vs. 
self-hosted models, cost optimization, and rate limit handling</p></li><li><p><strong>Results storage and retrieval</strong> : schema design for AI outputs, search capabilities, and real-time status updates</p></li><li><p><strong>Error handling and observability</strong> : dead-letter queues, monitoring queue depth, and debugging failed processing jobs</p></li></ul><p>Stay tuned for <strong>Part 2: Building the AI Processing Pipeline</strong>, where we&#8217;ll complete the journey from uploaded file to extracted insights! &#128640;</p><div><hr></div><p><em>&#128073; Follow along as we build production-ready AI systems. Next up: asynchronous processing, worker orchestration, and turning files into intelligence.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://harshuljain.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://harshuljain.substack.com/subscribe?"><span>Subscribe now</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[How does LLM streaming works]]></title><description><![CDATA[A behind-the-scenes look at how large language models and backend systems deliver responses token by token&#8212;creating real-time, conversational experiences.]]></description><link>https://harshuljain.substack.com/p/how-does-llm-streaming-works</link><guid isPermaLink="false">https://harshuljain.substack.com/p/how-does-llm-streaming-works</guid><dc:creator><![CDATA[Harshul Jain]]></dc:creator><pubDate>Sat, 07 Jun 2025 13:01:41 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!YBVo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7f2a2d8-ee91-4ebd-b24d-8e2943bba41f_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2><strong>What is LLM Streaming?</strong></h2><p>LLM streaming is a 
technique that enables incremental reception of data as it's generated by a large language model, rather than waiting for the entire response to be completed before sending it to the client.</p><p>Think of it like watching a live broadcast versus waiting for a pre-recorded message - you see the content as it's being created.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YBVo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7f2a2d8-ee91-4ebd-b24d-8e2943bba41f_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YBVo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7f2a2d8-ee91-4ebd-b24d-8e2943bba41f_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!YBVo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7f2a2d8-ee91-4ebd-b24d-8e2943bba41f_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!YBVo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7f2a2d8-ee91-4ebd-b24d-8e2943bba41f_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!YBVo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7f2a2d8-ee91-4ebd-b24d-8e2943bba41f_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YBVo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7f2a2d8-ee91-4ebd-b24d-8e2943bba41f_1024x1024.png" width="496" height="496" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b7f2a2d8-ee91-4ebd-b24d-8e2943bba41f_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:496,&quot;bytes&quot;:1234749,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://harshuljain.substack.com/i/165399539?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7f2a2d8-ee91-4ebd-b24d-8e2943bba41f_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YBVo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7f2a2d8-ee91-4ebd-b24d-8e2943bba41f_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!YBVo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7f2a2d8-ee91-4ebd-b24d-8e2943bba41f_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!YBVo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7f2a2d8-ee91-4ebd-b24d-8e2943bba41f_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!YBVo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7f2a2d8-ee91-4ebd-b24d-8e2943bba41f_1024x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://harshuljain.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://harshuljain.substack.com/subscribe?"><span>Subscribe now</span></a></p><p></p><h5><strong>Real-World Example: ChatGPT</strong></h5><p>ChatGPT is a prime example of LLM streaming in action, where users can watch responses being typed out word by word, creating a more interactive and engaging experience. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BtyO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93bde866-ca08-4f53-8a92-e6709c140bec_600x279.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BtyO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93bde866-ca08-4f53-8a92-e6709c140bec_600x279.gif 424w, https://substackcdn.com/image/fetch/$s_!BtyO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93bde866-ca08-4f53-8a92-e6709c140bec_600x279.gif 848w, https://substackcdn.com/image/fetch/$s_!BtyO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93bde866-ca08-4f53-8a92-e6709c140bec_600x279.gif 1272w, https://substackcdn.com/image/fetch/$s_!BtyO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93bde866-ca08-4f53-8a92-e6709c140bec_600x279.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BtyO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93bde866-ca08-4f53-8a92-e6709c140bec_600x279.gif" width="600" height="279" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/93bde866-ca08-4f53-8a92-e6709c140bec_600x279.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:279,&quot;width&quot;:600,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:191984,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://harshuljain.substack.com/i/165399539?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93bde866-ca08-4f53-8a92-e6709c140bec_600x279.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BtyO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93bde866-ca08-4f53-8a92-e6709c140bec_600x279.gif 424w, https://substackcdn.com/image/fetch/$s_!BtyO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93bde866-ca08-4f53-8a92-e6709c140bec_600x279.gif 848w, https://substackcdn.com/image/fetch/$s_!BtyO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93bde866-ca08-4f53-8a92-e6709c140bec_600x279.gif 1272w, https://substackcdn.com/image/fetch/$s_!BtyO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93bde866-ca08-4f53-8a92-e6709c140bec_600x279.gif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Source: https://www.vellum.ai/llm-parameters/llm-streaming</figcaption></figure></div><h2><strong>Why is Streaming Important?</strong></h2><p>Streaming improves the user experience by:</p><ul><li><p>Reducing perceived wait time: the first tokens appear as soon as the LLM produces them, not after the whole response is finished.</p></li><li><p>Allowing users to see responses being generated in real time.</p></li><li><p>Enabling early interruption if the AI isn't heading in the desired direction.</p></li><li><p>Potentially lowering cost: if generation is cancelled early, the tokens that would have followed are never produced.</p></li></ul><p></p><h2>Dissecting LLM Streaming from First Principles</h2><p>To understand LLM streaming, we need to dissect it at three levels:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!cIN6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf1222f9-b915-4b51-83ed-65b39afad03e_654x95.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cIN6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf1222f9-b915-4b51-83ed-65b39afad03e_654x95.png 424w, https://substackcdn.com/image/fetch/$s_!cIN6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf1222f9-b915-4b51-83ed-65b39afad03e_654x95.png 848w, https://substackcdn.com/image/fetch/$s_!cIN6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf1222f9-b915-4b51-83ed-65b39afad03e_654x95.png 1272w, https://substackcdn.com/image/fetch/$s_!cIN6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf1222f9-b915-4b51-83ed-65b39afad03e_654x95.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cIN6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf1222f9-b915-4b51-83ed-65b39afad03e_654x95.png" width="654" height="95" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bf1222f9-b915-4b51-83ed-65b39afad03e_654x95.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:95,&quot;width&quot;:654,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:10801,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://harshuljain.substack.com/i/165399539?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf1222f9-b915-4b51-83ed-65b39afad03e_654x95.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cIN6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf1222f9-b915-4b51-83ed-65b39afad03e_654x95.png 424w, https://substackcdn.com/image/fetch/$s_!cIN6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf1222f9-b915-4b51-83ed-65b39afad03e_654x95.png 848w, https://substackcdn.com/image/fetch/$s_!cIN6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf1222f9-b915-4b51-83ed-65b39afad03e_654x95.png 1272w, https://substackcdn.com/image/fetch/$s_!cIN6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf1222f9-b915-4b51-83ed-65b39afad03e_654x95.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><ul><li><p><strong>Frontend Perspective:</strong> How does the ChatGPT UI stream content to the user?</p></li><li><p><strong>Backend Perspective:</strong> How does the backend support LLM 
streaming?</p></li><li><p><strong>AI Perspective:</strong> How does the LLM generate tokens in a way that can be streamed?</p></li></ul><p>Let&#8217;s dive in&#8230;</p><p></p><h3>How does the backend support LLM streaming? &#129300;</h3><p>On the backend, everything boils down to the client-server architecture: we need to understand how content can be streamed at the request-response level. </p><p>There are three main approaches to implementing streaming in a client-server architecture:</p><ol><li><p>Chunk Encoding</p></li><li><p>Server-Sent Events (SSE)</p></li><li><p>WebSockets</p></li></ol><p>For each approach, we will look at the networking protocol, how a FastAPI server implements it, and how the data flows back to the client (for example, a React frontend).</p><p></p><h4><strong>Understanding Chunk Encoding &#128640;</strong></h4><p>Imagine you're sending a very long letter, but instead of waiting to write the entire letter, you send it page by page as you write it. That's essentially what chunk encoding (formally, HTTP chunked transfer encoding) does in the digital world. 
It's a method introduced in HTTP/1.1 that allows data to be sent in pieces (chunks) rather than all at once.</p><p>Let's break down the process step by step:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MlmO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad2f16eb-242f-4833-b101-2c4413276d7d_2370x1687.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MlmO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad2f16eb-242f-4833-b101-2c4413276d7d_2370x1687.png 424w, https://substackcdn.com/image/fetch/$s_!MlmO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad2f16eb-242f-4833-b101-2c4413276d7d_2370x1687.png 848w, https://substackcdn.com/image/fetch/$s_!MlmO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad2f16eb-242f-4833-b101-2c4413276d7d_2370x1687.png 1272w, https://substackcdn.com/image/fetch/$s_!MlmO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad2f16eb-242f-4833-b101-2c4413276d7d_2370x1687.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MlmO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad2f16eb-242f-4833-b101-2c4413276d7d_2370x1687.png" width="1456" height="1036" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ad2f16eb-242f-4833-b101-2c4413276d7d_2370x1687.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1036,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:189374,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://harshuljain.substack.com/i/165399539?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad2f16eb-242f-4833-b101-2c4413276d7d_2370x1687.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MlmO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad2f16eb-242f-4833-b101-2c4413276d7d_2370x1687.png 424w, https://substackcdn.com/image/fetch/$s_!MlmO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad2f16eb-242f-4833-b101-2c4413276d7d_2370x1687.png 848w, https://substackcdn.com/image/fetch/$s_!MlmO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad2f16eb-242f-4833-b101-2c4413276d7d_2370x1687.png 1272w, https://substackcdn.com/image/fetch/$s_!MlmO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad2f16eb-242f-4833-b101-2c4413276d7d_2370x1687.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p><strong>Initial Setup: </strong>The server (FastAPI in the diagram above) first sends the response headers below to tell the client <em>"Hey, I'm going to send you data in chunks!"</em>. Because no Content-Length is given, the client keeps reading chunks until a terminating zero-length chunk arrives, so it does not treat the response as complete after the first chunk.</p><pre><code><code>HTTP/1.1 200 OK
Content-Type: text/plain; charset=utf-8
Transfer-Encoding: chunked </code></code></pre><p><strong>Chunk Format</strong> : Each chunk follows this structure:</p><pre><code>[Size of chunk in hexadecimal]\r\n
[Actual chunk data]\r\n

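Example:
11\r\n
Developer Network\r\n
7\r\n
Mozilla\r\n
0\r\n
\r\n</code></pre><p>The size is the number of bytes in hexadecimal, so <code>11</code> means 17 bytes ("Developer Network"). A zero-length chunk marks the end of the body.</p><p><strong>FastAPI Implementation:</strong> </p><p>FastAPI has a response class, <strong>StreamingResponse</strong>, that sends the body piece by piece as a generator yields it. It sets no Content-Length, so over HTTP/1.1 the server underneath (for example Uvicorn) applies the chunked framing shown above.</p><pre><code>from fastapi import FastAPI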
from fastapi.responses import StreamingResponse
import time

app = FastAPI()

def number_generator():
    for i in range(10):
        time.sleep(1)  # Simulate processing time
        yield f"Number {i}\n"
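
# Illustrative aside (not used by the endpoint below): a hypothetical helper
# sketching how one yielded string is framed on the wire as an HTTP/1.1 chunk.
# In a real deployment the HTTP server applies this framing for you.
def encode_chunk(text):
    body = text.encode("utf-8")
    size_line = format(len(body), "X").encode("ascii")  # byte length in hex
    return size_line + b"\r\n" + body + b"\r\n"

LAST_CHUNK = b"0\r\n\r\n"  # a zero-length chunk ends the response body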

@app.get("/stream")
def stream_numbers():
    return StreamingResponse(number_generator(), media_type="text/plain")</code></pre><p><strong>Curl Response:</strong> the <code>-N</code> flag turns off curl's output buffering, so each line prints as it arrives.</p><pre><code>curl -N http://localhost:8000/stream

Number 0
Number 1
Number 2
...
Number 9</code></pre><p><em>Did you know? Amazon SageMaker uses chunk encoding to expose streaming responses for ML endpoints. Check out this blog post: <a href="https://aws.amazon.com/blogs/machine-learning/elevating-the-generative-ai-experience-introducing-streaming-support-in-amazon-sagemaker-hosting/">https://aws.amazon.com/blogs/machine-learning/elevating-the-generative-ai-experience-introducing-streaming-support-in-amazon-sagemaker-hosting/</a></em></p><p></p><h4><strong>Understanding Server-Sent Events (SSE) &#128640;</strong></h4><p>Server-Sent Events (SSE) is a technology that lets a server push real-time updates to a client over a single, long-lived HTTP connection. Think of it as a one-way channel: once the client has opened the connection, only the server sends data.</p><p>Let's break down the process step by step:</p><p><strong>Client Request:</strong> The client first sends a request saying that it wants the events at <code>/events</code> and is ready to accept the data as an event stream.</p><pre><code><code>// Client Request
GET /events HTTP/1.1
Accept: text/event-stream</code></code></pre><p><strong>Server Response:</strong> The server (FastAPI in this example) first sends the response headers below to tell the client <em>"Hey, I'll send you data as a stream of events"</em>. The <code>text/event-stream</code> content type tells the client to keep the connection open and read events as they arrive, so the request is not treated as complete after the first event.</p><pre><code>
HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
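Connection: keep-alive</code></pre><p><strong>FastAPI implementation:</strong></p><p>This example uses <code>EventSourceResponse</code> from the third-party <code>sse-starlette</code> package, which turns each yielded item into one event on the stream.</p><pre><code>from fastapi import FastAPI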
from sse_starlette.sse import EventSourceResponse
import asyncio

app = FastAPI()
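
# Illustrative aside (not used by the endpoint below): a hypothetical helper
# sketching roughly what one event looks like in a text/event-stream body.
# The real serialization is done by sse-starlette.
def format_sse(data, event=None):
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # A multi-line payload becomes one "data:" field per line.
    lines.extend(f"data: {part}" for part in data.split("\n"))
    return "\n".join(lines) + "\n\n"  # a blank line ends the event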

@app.get("/events")
async def stream():
    async def event_generator():
        while True:
            # Each yielded dict is sent to the client as one SSE event
            yield {"data": "Hello! This is a server update"}
            await asyncio.sleep(2)  # Send update every 2 seconds
    return EventSourceResponse(event_generator())</code></pre><p><strong>Curl Response:</strong></p><pre><code>curl -N -H "Accept: text/event-stream" http://localhost:8000/events

data: Hello! This is a server update

data: Hello! This is a server update

data: Hello! This is a server update</code></pre><p></p><h4><strong>Understanding WebSockets &#128640;</strong></h4><p>A WebSocket connection is like a phone conversation where both parties can speak at any time: after an HTTP upgrade handshake, the client and server share a single full-duplex connection. It is more complex to set up and operate than the two HTTP-based approaches above, but it offers greater flexibility. It is commonly used in chat applications. </p><p></p><h4>Comparison of the three approaches:</h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FtO3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcb8da49-c416-4e1c-beb0-fd6f42c4a573_946x522.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FtO3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcb8da49-c416-4e1c-beb0-fd6f42c4a573_946x522.png 424w, https://substackcdn.com/image/fetch/$s_!FtO3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcb8da49-c416-4e1c-beb0-fd6f42c4a573_946x522.png 848w, https://substackcdn.com/image/fetch/$s_!FtO3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcb8da49-c416-4e1c-beb0-fd6f42c4a573_946x522.png 1272w, https://substackcdn.com/image/fetch/$s_!FtO3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcb8da49-c416-4e1c-beb0-fd6f42c4a573_946x522.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FtO3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcb8da49-c416-4e1c-beb0-fd6f42c4a573_946x522.png" width="946" height="522" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bcb8da49-c416-4e1c-beb0-fd6f42c4a573_946x522.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:522,&quot;width&quot;:946,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:208029,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://harshuljain.substack.com/i/165399539?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcb8da49-c416-4e1c-beb0-fd6f42c4a573_946x522.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FtO3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcb8da49-c416-4e1c-beb0-fd6f42c4a573_946x522.png 424w, https://substackcdn.com/image/fetch/$s_!FtO3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcb8da49-c416-4e1c-beb0-fd6f42c4a573_946x522.png 848w, https://substackcdn.com/image/fetch/$s_!FtO3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcb8da49-c416-4e1c-beb0-fd6f42c4a573_946x522.png 1272w, https://substackcdn.com/image/fetch/$s_!FtO3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcb8da49-c416-4e1c-beb0-fd6f42c4a573_946x522.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h3><strong>Dance of tokens: How LLMs stream their thoughts</strong> &#129300;</h3><p>When you see an AI chatbot responding to you word by word, what's happening behind the scenes is a fascinating interplay between the LLM's token generation and the backend's streaming capabilities. We have already covered how the backend streams data to the client. </p><p>Now let's understand the dance of tokens generated by the LLM that makes real-time AI interactions possible.</p><p></p><h4><strong>The Token Generation Process</strong></h4><p>At its heart, an LLM thinks in tokens, not words. When you send a prompt, the model begins what's called autoregressive generation &#8211; a fancy term for predicting one token at a time, each prediction influenced by what came before it. 
Imagine someone writing a story, but instead of thinking about the whole sentence at once, they're deciding each word based on what they've written so far.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Sumy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae8fc9cb-429a-4ccf-aa02-6ff5b2acd75d_1432x501.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Sumy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae8fc9cb-429a-4ccf-aa02-6ff5b2acd75d_1432x501.gif 424w, https://substackcdn.com/image/fetch/$s_!Sumy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae8fc9cb-429a-4ccf-aa02-6ff5b2acd75d_1432x501.gif 848w, https://substackcdn.com/image/fetch/$s_!Sumy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae8fc9cb-429a-4ccf-aa02-6ff5b2acd75d_1432x501.gif 1272w, https://substackcdn.com/image/fetch/$s_!Sumy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae8fc9cb-429a-4ccf-aa02-6ff5b2acd75d_1432x501.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Sumy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae8fc9cb-429a-4ccf-aa02-6ff5b2acd75d_1432x501.gif" width="1432" height="501" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ae8fc9cb-429a-4ccf-aa02-6ff5b2acd75d_1432x501.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:501,&quot;width&quot;:1432,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1325837,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://harshuljain.substack.com/i/165399539?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae8fc9cb-429a-4ccf-aa02-6ff5b2acd75d_1432x501.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Sumy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae8fc9cb-429a-4ccf-aa02-6ff5b2acd75d_1432x501.gif 424w, https://substackcdn.com/image/fetch/$s_!Sumy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae8fc9cb-429a-4ccf-aa02-6ff5b2acd75d_1432x501.gif 848w, https://substackcdn.com/image/fetch/$s_!Sumy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae8fc9cb-429a-4ccf-aa02-6ff5b2acd75d_1432x501.gif 1272w, https://substackcdn.com/image/fetch/$s_!Sumy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae8fc9cb-429a-4ccf-aa02-6ff5b2acd75d_1432x501.gif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Source: https://medium.com/@YanAIx/step-by-step-into-gpt-70bc4a5d8714</figcaption></figure></div><p>The process is sequential and methodical. The model looks at your input plus everything generated so far, produces a probability distribution over its entire vocabulary, and picks the next token from it: either the most likely one (greedy decoding) or one sampled according to settings such as temperature. A token could be a full word like "hello" or just part of a word like "ing" or "pre". The chosen token is appended to the sequence, and the step repeats until the model emits an end-of-sequence token or reaches a length limit. Each token is a building block that contributes to the final response.</p><p>These tokens are what the APIs from providers such as OpenAI and Anthropic stream to you. Their backends use one of the three approaches described above; the public streaming APIs of both are documented as using server-sent events. 
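</p><p>The loop described above can be sketched in a few lines. This is only a toy under stated assumptions: <code>TOY_MODEL</code> is a made-up lookup table standing in for a real model, and <code>generate</code> and its parameters are illustrative names, not any provider's API. The point is that the generator yields each token the moment it is chosen, which is exactly what a streaming backend can forward to the client.</p><pre><code>import random

# Hypothetical toy "model": maps the previous token to a probability
# distribution over possible next tokens. A real LLM conditions on the
# whole sequence and has a vocabulary of many thousands of tokens.
TOY_MODEL = {
    "[START]": {"Stream": 0.9, "ing": 0.1},
    "Stream": {"ing": 0.8, " is": 0.2},
    "ing": {" is": 0.7, "[END]": 0.3},
    " is": {" fun": 0.6, "[END]": 0.4},
    " fun": {"[END]": 1.0},
}

def generate(greedy=True, max_tokens=10, seed=0):
    """Yield tokens one at a time, the way a streaming API hands them out."""
    rng = random.Random(seed)
    token = "[START]"
    for _ in range(max_tokens):
        dist = TOY_MODEL[token]
        if greedy:
            token = max(dist, key=dist.get)  # most likely next token
        else:
            # sample in proportion to each token's probability
            token = rng.choices(list(dist), weights=list(dist.values()))[0]
        if token == "[END]":
            return
        yield token  # a backend could forward this to the client right away

for tok in generate():
    print(tok, end="", flush=True)
print()</code></pre><p>With the greedy default, the loop prints <code>Streaming is fun</code> one token at a time. Passing <code>greedy=False</code> samples from the distribution instead, so the output can vary with the seed. 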
</p><p></p><h2>References: &#128214;</h2><p><a href="https://medium.com/@YanAIx/step-by-step-into-gpt-70bc4a5d8714">Diving deep into the LLM</a></p><p><a href="https://community.openai.com/t/streaming-completion-in-python/22227">Open AI Streaming completion in Python</a></p><p><a href="https://github.com/ggml-org/llama.cpp/blob/0974ad7a7cd4bca846b15c484ff3be890135a52c/examples/gritlm/gritlm.cpp#L120-L150">Possible Llama Opensource implementation for handling streaming of tokens</a></p>]]></content:encoded></item></channel></rss>