<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Sneh’s Substack]]></title><description><![CDATA[Sharing insights from my journey in machine learning, data engineering, and software development—covering real-world projects with Python, AWS, and modern frameworks. Expect stories from building ETL pipelines, training ML models, etc.]]></description><link>https://www.snehvora.me</link><image><url>https://substackcdn.com/image/fetch/$s_!9m8J!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9157932e-985e-4b66-8b94-e3258376ea5c_1280x1280.png</url><title>Sneh’s Substack</title><link>https://www.snehvora.me</link></image><generator>Substack</generator><lastBuildDate>Thu, 18 Jun 2026 08:07:22 GMT</lastBuildDate><atom:link href="https://www.snehvora.me/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Sneh Vora]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[snehvora@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[snehvora@substack.com]]></itunes:email><itunes:name><![CDATA[Sneh Vora]]></itunes:name></itunes:owner><itunes:author><![CDATA[Sneh Vora]]></itunes:author><googleplay:owner><![CDATA[snehvora@substack.com]]></googleplay:owner><googleplay:email><![CDATA[snehvora@substack.com]]></googleplay:email><googleplay:author><![CDATA[Sneh Vora]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Molina Healthcare]]></title><description><![CDATA[AI/ML Engineer &#8211; Gen AI]]></description><link>https://www.snehvora.me/p/molina-healthcare</link><guid 
isPermaLink="false">https://www.snehvora.me/p/molina-healthcare</guid><dc:creator><![CDATA[Sneh Vora]]></dc:creator><pubDate>Wed, 17 Jun 2026 15:34:54 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/f55a80af-57eb-4021-a7ae-a184f98a1688_400x400.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I didn&#8217;t expect my first real GenAI project to scare me.</p><p>Not the technical parts &#8212; those were hard, but manageable. What scared me was the realization, about two months in, that a wrong answer from our system could delay a patient&#8217;s authorization. That a hallucinated policy reference could mean someone waiting longer than they should for care.</p><p>That changed everything about how I built.</p><div><hr></div><h2>The Problem We Were Actually Solving</h2><p>When I joined Molina Healthcare as an AI/ML Engineer in early 2025, the ask sounded clean: build a GenAI assistant to help staff search through healthcare policy documents.</p><p>But when I sat down with the people actually doing this work, the picture got sharper and messier.</p><p>A prior authorization specialist might need to cross-reference three different policy documents &#8212; coverage guidelines, authorization criteria, and clinical SOPs &#8212; just to answer one question. These weren&#8217;t short documents. We&#8217;re talking 8,000+ pages across hundreds of files, updated regularly, full of domain-specific language that doesn&#8217;t map cleanly to how anyone naturally asks a question.</p><p>They weren&#8217;t looking for a chatbot. They were looking for a trusted colleague who had read every policy ever written and could give them the right paragraph, right now.</p><p>That&#8217;s a very different thing to build.</p><div><hr></div><h2>What I Thought RAG Was vs. What It Actually Is</h2><p>I&#8217;d worked with RAG before. 
The concept is straightforward: retrieve relevant chunks from a document store, feed them to an LLM, get a grounded answer. Simple enough on paper.</p><p>What I hadn&#8217;t fully reckoned with was how much the quality of your retrieval determines the quality of everything downstream.</p><p>We ingested all 8,000+ pages using PyMuPDF &#8212; extracting text, cleaning formatting artifacts, splitting documents into chunks that were small enough to be precise but large enough to hold context. That chunking strategy alone took two weeks of iteration. Too small and you lose the surrounding context that makes a clause meaningful. Too large and you&#8217;re feeding the model noise.</p><p>Then we indexed everything into Azure AI Search and layered FAISS on top for embedding-based similarity scoring. We added metadata filters &#8212; document type, policy category, effective date &#8212; so retrieval wasn&#8217;t just semantic; it was structured.</p><p>That combination improved our retrieval accuracy by 22% in internal testing. What the number doesn&#8217;t capture is why: we stopped treating retrieval as a search problem and started treating it as a comprehension problem. The system needed to understand what the staff member was really asking, not just match keywords.</p><div><hr></div><h2>The Lesson I Didn&#8217;t Expect: Guardrails Are the Product</h2><p>Early on, I thought of PHI masking and RBAC as compliance checkboxes: things you add at the end so legal signs off.</p><p>I was wrong.</p><p>In a healthcare context, the guardrails are what make the product usable. A system that gives a fast, accurate answer but leaks protected health information in the process isn&#8217;t a product &#8212; it&#8217;s a liability. 
A system that can be queried by anyone regardless of role isn&#8217;t a tool &#8212; it&#8217;s a risk.</p><p>When we implemented PHI masking, few-shot prompting for consistency, and role-based access controls, we weren&#8217;t adding friction. We were building the foundation of trust that makes anyone willing to rely on the system in the first place.</p><p>The 30% reduction in manual lookup time is the metric I put on my resume. But what I&#8217;m actually proud of is that the team started using it without being told to &#8212; because they trusted it.</p><div><hr></div><h2>What I&#8217;d Tell Anyone Building AI in a Regulated Industry</h2><p><strong>Accuracy is table stakes, not the goal.</strong> Every RAG system can surface relevant text. The question is whether it surfaces the <em>right</em> text with enough context for a human to act on it responsibly.</p><p><strong>Build with the person doing the job, not just for them.</strong> The specialists who used our system daily spotted retrieval failures I never would have caught in testing. Their feedback shaped the metadata filters, the prompt structure, and the confidence thresholds we used.</p><p><strong>A wrong answer at speed is worse than a slow right one.</strong> We deliberately added latency to certain query paths by routing high-stakes authorization decisions to human review. That was the right call.</p><div><hr></div><h2>Where This Is Going</h2><p>Healthcare AI is at an interesting inflection point. The technology is good enough to be genuinely useful. The harder problem is institutional trust &#8212; getting clinicians, administrators, and compliance teams to believe that an AI-assisted workflow is safer and more consistent than a manual one.</p><p>That trust isn&#8217;t built through demos. It&#8217;s built through guardrails, evaluation, transparency about what the system doesn&#8217;t know, and a long track record of getting it right.</p><p>I&#8217;m still building that track record. 
But I&#8217;m more convinced than ever that the engineers who will matter most in this space aren&#8217;t the ones who can build the fastest model &#8212; they&#8217;re the ones who understand what it means when that model is wrong.</p>]]></content:encoded></item><item><title><![CDATA[Accenture]]></title><description><![CDATA[Machine Learning Engineer]]></description><link>https://www.snehvora.me/p/accenture</link><guid isPermaLink="false">https://www.snehvora.me/p/accenture</guid><dc:creator><![CDATA[Sneh Vora]]></dc:creator><pubDate>Wed, 17 Jun 2026 15:31:21 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/b82eb617-e636-4da3-8139-835caa03292d_800x457.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The first time I looked at our confusion matrix, I thought we had a great model.</p><p>96% accuracy. Clean numbers. My manager nodded. I felt good.</p><p>Then a senior data scientist on the team asked me one question: <em>&#8220;What&#8217;s the base rate of fraud in the dataset?&#8221;</em></p><p>About 0.4%.</p><p>She didn&#8217;t say anything else. She didn&#8217;t need to. A model that predicted &#8220;not fraud&#8221; for every single transaction would have been 99.6% accurate &#8212; and completely useless.</p><p>That was week three at Accenture. I had a lot more to learn.</p><div><hr></div><h2>What the Project Actually Was</h2><p>I joined Accenture as a Machine Learning Engineer in mid-2021, working on a fraud detection and risk scoring platform for a large BFSI client. Banking, financial services, insurance &#8212; an industry where the cost of getting something wrong isn&#8217;t abstract. It&#8217;s real money, real customers, real consequences.</p><p>Our task: analyze over a million transaction records and build a system that could identify fraudulent activity, flag anomalies, and generate risk scores for review.</p><p>On paper, it sounded like a classic ML project. 
Train a classifier, evaluate on a test set, deploy.</p><p>In reality, it was one of the most humbling experiences of my career.</p><div><hr></div><h2>The Data Was the Actual Job</h2><p>Before I wrote a single line of model code, I spent weeks just understanding the data.</p><p>Fraud data has a problem that most textbook ML datasets don&#8217;t: it&#8217;s almost entirely negative. Fraud is rare by design &#8212; bad actors try hard not to look like bad actors. In our dataset, fraudulent transactions were a tiny fraction of the total. If your model learns to just say &#8220;not fraud&#8221; every time, it will score well on accuracy and fail completely at the one thing it&#8217;s supposed to do.</p><p>So we got to work on the feature engineering side &#8212; and this is where most of the real value was created.</p><p>We built over 40 features: transaction frequency patterns, amount deviation from a customer&#8217;s historical baseline, merchant category mismatches, device fingerprints, geographic anomalies, account age, payment channel behavior, and historical fraud indicators. Each of these was a hypothesis about what a fraudulent transaction looks like, encoded into something a model could learn from.</p><p>Then came the imbalance problem. We used SMOTE to generate synthetic minority-class samples, combined with undersampling and class-weight tuning. That combination improved our fraud recall by nearly 20% &#8212; meaning we caught significantly more actual fraud &#8212; while keeping false positives at a level the client&#8217;s review team could actually handle.</p><p>That balance matters more than people realize. Flag too much and your human reviewers drown. Flag too little and the fraud slips through. The model isn&#8217;t making that decision alone &#8212; it&#8217;s making it in partnership with the people downstream.</p><div><hr></div><h2>The Model Wasn&#8217;t the Hard Part</h2><p>We landed on XGBoost for the core classifier. 
It handled the tabular data well, gave us interpretable feature importances, and responded well to hyperparameter tuning. After cross-validation and threshold optimization, we were sitting at roughly 0.86 ROC-AUC &#8212; an 18&#8211;22% improvement over the baseline the client had been using.</p><p>We also layered in Isolation Forest for anomaly detection. The XGBoost model was trained on historical fraud patterns &#8212; it was good at catching things that looked like fraud it had seen before. Isolation Forest was good at catching things that just looked <em>weird</em>, regardless of whether they matched a known pattern. Together they covered more surface area.</p><p>But here&#8217;s what I actually learned from all of this: the model performance metrics were almost never the most important conversation.</p><p>The most important conversations were about thresholds.</p><p>At what score do you flag a transaction for human review? At what score do you block it automatically? What&#8217;s the cost of a false positive to a legitimate customer who gets their card declined? What&#8217;s the cost of a false negative to the client when a fraudulent transaction goes through?</p><p>These aren&#8217;t ML questions. They&#8217;re business questions. And the answers changed depending on who you asked &#8212; the risk team, the product team, the compliance team, the executive sponsor. 
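</p><p>The post doesn&#8217;t describe how the flag and block thresholds were actually set. One common way to make the asymmetric-cost question concrete is to sweep candidate cut-offs over labeled validation scores and keep the one with the lowest total cost. The sketch below assumes a single review threshold and fixed per-error costs; <code>total_cost</code>, <code>best_threshold</code>, and every number in it are hypothetical, not the client&#8217;s figures.</p>

```python
# Hypothetical sketch: choose a review threshold by minimizing total
# misclassification cost on labeled validation data. Function names and
# cost values are illustrative assumptions, not taken from the project.


def total_cost(scores, labels, threshold, cost_fp, cost_fn):
    """Total cost of flagging every transaction scored at or above threshold."""
    cost = 0.0
    for score, is_fraud in zip(scores, labels):
        flagged = score >= threshold
        if flagged and not is_fraud:
            cost += cost_fp  # false positive: a legitimate customer is flagged
        elif is_fraud and not flagged:
            cost += cost_fn  # false negative: fraud goes through
    return cost


def best_threshold(scores, labels, cost_fp, cost_fn):
    """Try each observed score (plus 'flag nothing') and keep the cheapest."""
    candidates = sorted(set(scores)) + [float("inf")]
    return min(
        candidates,
        key=lambda t: total_cost(scores, labels, t, cost_fp, cost_fn),
    )


scores = [0.05, 0.2, 0.3, 0.6, 0.9]  # model scores on five validation rows
labels = [0, 1, 0, 1, 1]             # 1 = confirmed fraud
print(best_threshold(scores, labels, cost_fp=1, cost_fn=10))  # 0.2
print(best_threshold(scores, labels, cost_fp=10, cost_fn=1))  # 0.6
```

<p>Neither the model nor its scores change between the two calls; only the cost inputs do. Making a missed fraud ten times as expensive as a false alarm pulls the threshold down to 0.2, and reversing the ratio pushes it up to 0.6.</p><p>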
Part of my job was translating between what the model could do and what those conversations actually required.</p><div><hr></div><h2>The Lesson That Has Stayed With Me</h2><p>I came into Accenture thinking the job was to build a good model.</p><p>I left understanding that the job was to build a good decision system &#8212; one where the model was one component, the features were another, the threshold logic was another, the human review process was another, and the monitoring pipeline that caught when the distribution shifted was another.</p><p>A model that performs well in evaluation and degrades quietly in production without anyone noticing isn&#8217;t a success. It&#8217;s a slow failure.</p><p>We implemented monitoring and batch scoring automation partly to speed up the review workflow, but also to give the team a way to see when something was changing &#8212; when fraud patterns were evolving, when a new attack vector was emerging, when the model&#8217;s assumptions were starting to drift from reality.</p><p>That&#8217;s the part of ML that doesn&#8217;t make it into blog posts about model architectures. It&#8217;s less exciting than a good ROC curve. But it&#8217;s the difference between a project and a product.</p><div><hr></div><h2>What I&#8217;d Tell a Junior ML Engineer Starting Out</h2><p><strong>Fall in love with the features, not the model.</strong> Anyone can throw XGBoost at a dataset. The people who create real value understand the domain well enough to know which signals matter and why.</p><p><strong>Learn to think in thresholds, not accuracy.</strong> Almost every real ML decision involves a tradeoff between precision and recall, between false positives and false negatives. Understand the asymmetric costs in your domain before you touch a single hyperparameter.</p><p><strong>The model is not the system.</strong> Data pipelines, feature stores, monitoring, human-in-the-loop workflows &#8212; these are not afterthoughts. 
They are the product.</p><p><strong>Ask the dumb question.</strong> The confusion matrix lesson in week three came because someone asked a question I should have asked myself. In every project since, I&#8217;ve tried to be the person who asks it first.</p><div><hr></div><p>Fraud detection taught me that machine learning in the real world is less about elegance and more about understanding what happens when you&#8217;re wrong &#8212; and building something resilient enough to handle it.</p><p>I&#8217;m still building on that foundation.</p>]]></content:encoded></item><item><title><![CDATA[Real Waste ML Classification ]]></title><description><![CDATA[Teaching Machines to See Trash: Our Journey with Deep Learning and Waste Classification]]></description><link>https://www.snehvora.me/p/real-waste-ml-classification</link><guid isPermaLink="false">https://www.snehvora.me/p/real-waste-ml-classification</guid><dc:creator><![CDATA[Sneh Vora]]></dc:creator><pubDate>Sun, 21 Sep 2025 16:56:47 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/d5db1cd1-d5c0-4396-98ec-138b88126806_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>It started with a simple yet daunting question: <em>what if machines could look at trash and know exactly where it belongs?</em></p><p>Every day, millions of tons of waste are produced worldwide. Most of it ends up in landfills, even though much of it is recyclable and some of it is biodegradable. The problem isn&#8217;t just the waste&#8212;it&#8217;s the sorting. Traditional methods of waste management rely heavily on human labor: tedious, error-prone, and often inefficient. 
And that&#8217;s where our project began.</p><p>We wanted to build something that could make waste classification faster, smarter, and&#8212;most importantly&#8212;automatic.</p><div><hr></div><h3>The Beginning: Choosing the Dataset</h3><p>Our journey started with the <strong>RealWaste dataset</strong> from the UCI Machine Learning Repository, a collection of <strong>4,752 images</strong> divided into <strong>9 waste categories</strong>:</p><ul><li><p><strong>Cardboard</strong> &#8211; 461 images</p></li><li><p><strong>Food Organics</strong> &#8211; 411 images</p></li><li><p><strong>Glass</strong> &#8211; 420 images</p></li><li><p><strong>Metal</strong> &#8211; 790 images</p></li><li><p><strong>Miscellaneous Trash</strong> &#8211; 495 images</p></li><li><p><strong>Paper</strong> &#8211; 500 images</p></li><li><p><strong>Plastic</strong> &#8211; 921 images</p></li><li><p><strong>Textile Trash</strong> &#8211; 318 images</p></li><li><p><strong>Vegetation</strong> &#8211; 436 images</p></li></ul><p>Each image measured <strong>524 &#215; 524 pixels with 3 RGB channels</strong>, or about 824,000 input values per image, which meant our neural networks had to process high-dimensional data. Before feeding the images into the models, we <strong>standard-scaled</strong> the inputs (zero mean, unit variance) to help training converge faster.</p><div><hr></div><h3>The Metrics that Mattered</h3><p>We didn&#8217;t want just accuracy&#8212;we wanted a balanced picture.</p><p>So we evaluated our models using:</p><ul><li><p><strong>Categorical Cross-Entropy Loss</strong>: to measure how far the predicted class probabilities were from the true labels.</p></li><li><p><strong>F1 Score</strong>: the harmonic mean of precision and recall. Tracked per class, it told us whether we were good at some classes while failing miserably at others.</p></li></ul><div><hr></div><h3>The Experiments: Building &amp; Training Models</h3><p>We split the dataset: <strong>80% for training and 20% for validation</strong>. 
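</p><p>The post doesn&#8217;t say whether the 80/20 split was stratified or how F1 was averaged across the nine classes. As a minimal sketch assuming a per-class split and macro averaging, the pure-Python helpers below are hypothetical: <code>stratified_split</code>, <code>macro_f1</code>, the seed, and the rounding rule are illustrative, not the project&#8217;s code. Macro-F1 averages per-class scores without weighting by class size, so a strong Plastic score (921 images) cannot hide a weak Textile Trash score (318 images).</p>

```python
import random
from collections import defaultdict

# Class counts from the post; used here only to build a toy label list.
CLASS_COUNTS = {
    "Cardboard": 461, "Food Organics": 411, "Glass": 420,
    "Metal": 790, "Miscellaneous Trash": 495, "Paper": 500,
    "Plastic": 921, "Textile Trash": 318, "Vegetation": 436,
}


def stratified_split(labels, val_fraction=0.2, seed=42):
    """Hypothetical helper: hold out val_fraction of each class for validation."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_val = round(len(idxs) * val_fraction)
        val_idx.extend(idxs[:n_val])
        train_idx.extend(idxs[n_val:])
    return train_idx, val_idx


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    pairs = list(zip(y_true, y_pred))
    per_class = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


labels = [name for name, n in CLASS_COUNTS.items() for _ in range(n)]
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))  # 3802 950
print(round(macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]), 4))  # 0.7333
```

<p>A stratified split keeps every class at roughly the same 80/20 ratio, which matters here because the largest class (Plastic) has nearly three times as many images as the smallest (Textile Trash).</p><p>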
Then came the real grind&#8212;training not one, but <strong>four different CNN architectures</strong>:</p><ol><li><p><strong>Custom CNN (Our Own Model)</strong></p><ul><li><p>Built from scratch.</p></li><li><p>Served as a baseline for comparison.</p></li></ul></li><li><p><strong>VGG 19</strong></p><ul><li><p>A deep, well-known architecture, but surprisingly under-explored in Kaggle waste classification notebooks.</p></li></ul></li><li><p><strong>Inception V3</strong></p><ul><li><p>Transfer learning in two phases:</p><ul><li><p>First, we trained only the top layer for <strong>10 epochs</strong>.</p></li><li><p>Then, we fine-tuned by unfreezing some of the pre-trained layers and continuing for another <strong>10 epochs</strong> with a reduced learning rate.</p></li></ul></li></ul></li><li><p><strong>MobileNets V1.0</strong></p><ul><li><p>Lightweight, fast, and efficient.</p></li><li><p>Also not previously explored in Kaggle notebooks for this dataset.</p></li></ul></li></ol><p>To make training more robust, we added:</p><ul><li><p><strong>Step Learning Rate Scheduling</strong> &#8211; reducing the learning rate by a fixed factor at set epoch intervals.</p></li><li><p><strong>Model Checkpoints</strong> &#8211; automatically saving the best-performing weights whenever the validation loss improved.</p></li></ul><p>All models were trained using the <strong>Adamax optimizer</strong> with <strong>categorical cross-entropy loss</strong>.</p><div><hr></div><h3>The Results: Which Model Won?</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7UFG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5623998c-629c-46c7-8dd3-274b40f8b346_1242x1090.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!7UFG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5623998c-629c-46c7-8dd3-274b40f8b346_1242x1090.png 424w, https://substackcdn.com/image/fetch/$s_!7UFG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5623998c-629c-46c7-8dd3-274b40f8b346_1242x1090.png 848w, https://substackcdn.com/image/fetch/$s_!7UFG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5623998c-629c-46c7-8dd3-274b40f8b346_1242x1090.png 1272w, https://substackcdn.com/image/fetch/$s_!7UFG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5623998c-629c-46c7-8dd3-274b40f8b346_1242x1090.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7UFG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5623998c-629c-46c7-8dd3-274b40f8b346_1242x1090.png" width="1242" height="1090" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5623998c-629c-46c7-8dd3-274b40f8b346_1242x1090.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1090,&quot;width&quot;:1242,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:303337,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.snehvora.me/i/174177702?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5623998c-629c-46c7-8dd3-274b40f8b346_1242x1090.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7UFG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5623998c-629c-46c7-8dd3-274b40f8b346_1242x1090.png 424w, https://substackcdn.com/image/fetch/$s_!7UFG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5623998c-629c-46c7-8dd3-274b40f8b346_1242x1090.png 848w, https://substackcdn.com/image/fetch/$s_!7UFG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5623998c-629c-46c7-8dd3-274b40f8b346_1242x1090.png 1272w, https://substackcdn.com/image/fetch/$s_!7UFG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5623998c-629c-46c7-8dd3-274b40f8b346_1242x1090.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>Custom CNN Model Confusion Matrix</em></figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NxXL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5bc8e2a-3ac5-48f4-b81e-b21d0c59f681_1262x1080.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NxXL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5bc8e2a-3ac5-48f4-b81e-b21d0c59f681_1262x1080.png 424w, https://substackcdn.com/image/fetch/$s_!NxXL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5bc8e2a-3ac5-48f4-b81e-b21d0c59f681_1262x1080.png 848w, https://substackcdn.com/image/fetch/$s_!NxXL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5bc8e2a-3ac5-48f4-b81e-b21d0c59f681_1262x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!NxXL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5bc8e2a-3ac5-48f4-b81e-b21d0c59f681_1262x1080.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!NxXL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5bc8e2a-3ac5-48f4-b81e-b21d0c59f681_1262x1080.png" width="1262" height="1080" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f5bc8e2a-3ac5-48f4-b81e-b21d0c59f681_1262x1080.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1080,&quot;width&quot;:1262,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:292559,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.snehvora.me/i/174177702?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5bc8e2a-3ac5-48f4-b81e-b21d0c59f681_1262x1080.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NxXL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5bc8e2a-3ac5-48f4-b81e-b21d0c59f681_1262x1080.png 424w, https://substackcdn.com/image/fetch/$s_!NxXL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5bc8e2a-3ac5-48f4-b81e-b21d0c59f681_1262x1080.png 848w, https://substackcdn.com/image/fetch/$s_!NxXL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5bc8e2a-3ac5-48f4-b81e-b21d0c59f681_1262x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!NxXL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5bc8e2a-3ac5-48f4-b81e-b21d0c59f681_1262x1080.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>Inception V3 Confusion Matrix</em></figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rHz3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a8f86ae-670b-4ce2-8a11-175dc819a44c_1240x1102.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!rHz3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a8f86ae-670b-4ce2-8a11-175dc819a44c_1240x1102.png 424w, https://substackcdn.com/image/fetch/$s_!rHz3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a8f86ae-670b-4ce2-8a11-175dc819a44c_1240x1102.png 848w, https://substackcdn.com/image/fetch/$s_!rHz3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a8f86ae-670b-4ce2-8a11-175dc819a44c_1240x1102.png 1272w, https://substackcdn.com/image/fetch/$s_!rHz3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a8f86ae-670b-4ce2-8a11-175dc819a44c_1240x1102.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rHz3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a8f86ae-670b-4ce2-8a11-175dc819a44c_1240x1102.png" width="1240" height="1102" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7a8f86ae-670b-4ce2-8a11-175dc819a44c_1240x1102.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1102,&quot;width&quot;:1240,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:276754,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.snehvora.me/i/174177702?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a8f86ae-670b-4ce2-8a11-175dc819a44c_1240x1102.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rHz3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a8f86ae-670b-4ce2-8a11-175dc819a44c_1240x1102.png 424w, https://substackcdn.com/image/fetch/$s_!rHz3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a8f86ae-670b-4ce2-8a11-175dc819a44c_1240x1102.png 848w, https://substackcdn.com/image/fetch/$s_!rHz3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a8f86ae-670b-4ce2-8a11-175dc819a44c_1240x1102.png 1272w, https://substackcdn.com/image/fetch/$s_!rHz3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a8f86ae-670b-4ce2-8a11-175dc819a44c_1240x1102.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>MobileNets Confusion Matrix</em></figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9YWB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c49dac0-bac1-46c4-9e44-ca37c716ff55_1208x1074.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9YWB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c49dac0-bac1-46c4-9e44-ca37c716ff55_1208x1074.png 424w, https://substackcdn.com/image/fetch/$s_!9YWB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c49dac0-bac1-46c4-9e44-ca37c716ff55_1208x1074.png 848w, https://substackcdn.com/image/fetch/$s_!9YWB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c49dac0-bac1-46c4-9e44-ca37c716ff55_1208x1074.png 1272w, https://substackcdn.com/image/fetch/$s_!9YWB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c49dac0-bac1-46c4-9e44-ca37c716ff55_1208x1074.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!9YWB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c49dac0-bac1-46c4-9e44-ca37c716ff55_1208x1074.png" width="1208" height="1074" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6c49dac0-bac1-46c4-9e44-ca37c716ff55_1208x1074.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1074,&quot;width&quot;:1208,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:285036,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.snehvora.me/i/174177702?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c49dac0-bac1-46c4-9e44-ca37c716ff55_1208x1074.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9YWB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c49dac0-bac1-46c4-9e44-ca37c716ff55_1208x1074.png 424w, https://substackcdn.com/image/fetch/$s_!9YWB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c49dac0-bac1-46c4-9e44-ca37c716ff55_1208x1074.png 848w, https://substackcdn.com/image/fetch/$s_!9YWB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c49dac0-bac1-46c4-9e44-ca37c716ff55_1208x1074.png 1272w, https://substackcdn.com/image/fetch/$s_!9YWB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c49dac0-bac1-46c4-9e44-ca37c716ff55_1208x1074.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>VGG 19 Confusion Matrix</em></figcaption></figure></div><p>After weeks of experimentation, here&#8217;s how the models stacked up:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!a0Ep!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bdea17b-6704-4a85-a2d2-ea27cdff26d5_1212x432.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!a0Ep!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bdea17b-6704-4a85-a2d2-ea27cdff26d5_1212x432.png 424w, https://substackcdn.com/image/fetch/$s_!a0Ep!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bdea17b-6704-4a85-a2d2-ea27cdff26d5_1212x432.png 848w, https://substackcdn.com/image/fetch/$s_!a0Ep!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bdea17b-6704-4a85-a2d2-ea27cdff26d5_1212x432.png 1272w, https://substackcdn.com/image/fetch/$s_!a0Ep!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bdea17b-6704-4a85-a2d2-ea27cdff26d5_1212x432.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!a0Ep!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bdea17b-6704-4a85-a2d2-ea27cdff26d5_1212x432.png" width="1212" height="432" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6bdea17b-6704-4a85-a2d2-ea27cdff26d5_1212x432.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:432,&quot;width&quot;:1212,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:48244,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.snehvora.me/i/174177702?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bdea17b-6704-4a85-a2d2-ea27cdff26d5_1212x432.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" 
alt="" srcset="https://substackcdn.com/image/fetch/$s_!a0Ep!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bdea17b-6704-4a85-a2d2-ea27cdff26d5_1212x432.png 424w, https://substackcdn.com/image/fetch/$s_!a0Ep!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bdea17b-6704-4a85-a2d2-ea27cdff26d5_1212x432.png 848w, https://substackcdn.com/image/fetch/$s_!a0Ep!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bdea17b-6704-4a85-a2d2-ea27cdff26d5_1212x432.png 1272w, https://substackcdn.com/image/fetch/$s_!a0Ep!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bdea17b-6704-4a85-a2d2-ea27cdff26d5_1212x432.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The clear winner was <strong>Inception V3</strong>, which achieved the highest F1-score. But there was a catch: it was also the slowest model. <strong>MobileNets V1.0</strong>, on the other hand, struck an impressive balance: high accuracy while running significantly faster, thanks to its <strong>depthwise separable convolution layers</strong>.</p><div><hr></div><h3>What We Learned</h3><p>In the end, our project showed that deep learning can substantially improve waste classification. While <strong>Inception V3</strong> shone in accuracy, <strong>MobileNets V1.0</strong> showed us that speed and efficiency can&#8217;t be overlooked, especially if such a system is ever to be deployed in real-world recycling plants.</p><p>This wasn&#8217;t just an experiment&#8212;it was a step toward rethinking how technology can tackle one of the world&#8217;s oldest problems: <em>trash.</em></p><div><hr></div><h3>The Team Behind the Work</h3><ul><li><p><strong>Amish Faldu (af557)</strong></p></li><li><p><strong>Sneh Vora (sv992)</strong></p></li><li><p><strong>Palak Pabani (pp872)</strong></p></li></ul><div><hr></div><h3>Want to Explore More?</h3><ul><li><p>&#128194; Dataset: <a href="https://archive.ics.uci.edu/dataset/908/realwaste">UCI RealWaste Dataset</a></p></li><li><p>&#128187; Code: <a href="https://drive.google.com/drive/folders/1UnGzeVwuwI4ap-v0p4m4ro1qXug3mnD_?usp=share_link">Google Drive Repository</a></p></li><li><p>&#127909; Project Video: <a href="https://drive.google.com/file/d/1DwDitx8-5ZPQMI9rPKuNIy034M6QPGFj/view?usp=sharing">Link</a></p></li></ul><div><hr></div><p>&#128073; This is just the beginning. 
Imagine a world where waste bins have eyes&#8212;powered by AI&#8212;to sort your trash for you. That&#8217;s not science fiction anymore. It&#8217;s where we&#8217;re headed.</p>]]></content:encoded></item><item><title><![CDATA[Multi-Agent Reinforcement Learning (MultiCarRacing-v0)]]></title><description><![CDATA[Teaching Cars to Think: My Reinforcement Learning Racing Journey]]></description><link>https://www.snehvora.me/p/multi-agent-reinforcement-learning</link><guid isPermaLink="false">https://www.snehvora.me/p/multi-agent-reinforcement-learning</guid><dc:creator><![CDATA[Sneh Vora]]></dc:creator><pubDate>Sun, 14 Sep 2025 19:48:06 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/14d1ad09-c2ea-4cb0-8e63-d97ca97cec41_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;5d95577f-07a8-41cb-9eef-50475d1a867f&quot;,&quot;duration&quot;:null}"></div><p><strong>GitHub Link</strong> : <a href="https://github.com/snehvora/Multi-Car-Racing">Multi-Agent Reinforcement Learning (MultiCarRacing-v0)</a></p><p>It started with frustration, not inspiration.</p><p>I was deep into my machine learning coursework, scrolling through Kaggle competitions and research papers, when I noticed a common theme: most reinforcement learning (RL) tutorials stop at simple benchmarks, such as Atari games like Pong and Breakout, or classic-control tasks like CartPole.</p><p>Cool examples, sure&#8212;but they didn&#8217;t feel <em>real</em>.</p><p>I wanted something messy, dynamic, and closer to the real world. Something where the agent couldn&#8217;t just memorize screen pixels but actually had to <strong>react, adapt, and survive</strong>.</p><p>That&#8217;s when I found the <strong>CarRacing environment</strong>.</p><div><hr></div><h2>Why Racing Cars?</h2><p>Car racing is chaotic and unforgiving. 
Every turn is a test:</p><ul><li><p>Brake too late, and you&#8217;re off the track.</p></li><li><p>Accelerate too much, and you spin out.</p></li><li><p>Turn too little, and you never finish the lap.</p></li></ul><p>It perfectly embodies what makes RL so powerful: learning to make a sequence of split-second decisions under uncertainty.</p><p>This was the challenge I wanted to tackle:<br>&#128073; <em>Could I train AI agents, using RL, to learn racing strategies without any rules handed to them?</em></p><div><hr></div><h2>Setting Up the Track</h2><p>I worked with two environments:</p><ul><li><p><strong>Single-agent mode</strong> (CarRacing-v2, Gymnasium): One car, one brain, grayscale vision.</p></li><li><p><strong>Multi-agent mode</strong> (MultiCarRacing-v0, custom repo): Two cars racing on the same track, each acting independently.</p></li></ul><p>If single-agent racing was about teaching one car to survive, multi-agent racing felt more like <strong>refereeing a duel</strong>.</p><p>When I first put two cars on the same track, something strange happened: either they both crawled along cautiously (to avoid penalties) or they crashed headlong into chaos. Clearly, they needed better incentives.</p><p>So I reshaped the reward system to act like a race official:</p><ul><li><p><strong>&#9201;&#65039; Time Penalty</strong>: Every frame cost them -0.1 points. No stalling at the start line.</p></li><li><p><strong>&#127942; Progress Reward</strong>:</p><ul><li><p>The <strong>leading car</strong> earned <strong>+1000/N per tile</strong> visited.</p></li><li><p>The <strong>trailing car</strong> earned <strong>+500/N per tile</strong> visited.</p></li></ul><p>Here <em>N</em> is the total number of tiles on the track. This way, both cars stayed motivated, but the leader always had an edge, just like in real racing.</p></li><li><p><strong>&#128683; Off-Track Penalty</strong>: Going off track meant <strong>-100 points</strong>. A harsh but necessary rule to keep driving clean.</p></li></ul><p>The result? 
The cars stopped loafing around and started <em>racing</em>. One would pull ahead, the other would chase, and both learned that the only way forward was&#8212;literally&#8212;forward.</p><div><hr></div><h2>Models in the Pit Stop</h2><p>I tested two core RL approaches:</p><p>1&#65039;&#8419; <strong>Deep Q-Networks (DQN)</strong></p><ul><li><p>Designed for discrete action spaces.</p></li><li><p>Pixel inputs &#8594; CNN &#8594; Q-value (action-value) estimates.</p></li><li><p>I even experimented with <strong>ResNet transfer learning</strong> and <strong>LSTM-ResNet</strong> hybrids.</p></li></ul><p>2&#65039;&#8419; <strong>Proximal Policy Optimization (PPO)</strong></p><ul><li><p>A policy-gradient method.</p></li><li><p>More stable learning curves.</p></li><li><p>Tried it in single-agent setups for comparison.</p></li></ul><div><hr></div><h2>Race Results &#127937;</h2><ul><li><p><strong>Single-agent DQN</strong>: After ~2.5M steps (15 hours of CPU training), the agent reached an average reward of ~800, right in line with published results.</p></li><li><p><strong>Single-agent PPO</strong>: Smoother early learning, but plateaued at ~500 reward.</p></li><li><p><strong>Multi-agent DQN</strong>: After ~52 hours of training, both cars learned reasonable policies (~400 reward each), but sometimes &#8220;fought&#8221; over track tiles instead of racing efficiently.</p></li></ul><div><hr></div><h2>Lessons From the Track</h2><ul><li><p><strong>Representation is everything</strong>: Frame stacking let agents perceive speed and direction of motion, which a single frame can&#8217;t convey.</p></li><li><p><strong>Algorithms trade off differently</strong>: In my runs, DQN was efficient but unstable, while PPO was steady but plateaued lower.</p></li><li><p><strong>Multi-agent is messy</strong>: Independent learners don&#8217;t naturally cooperate&#8212;you have to design incentives.</p></li></ul><div><hr></div><h2>The Roadblocks</h2><ul><li><p><strong>Q-value instability</strong> &#8594; tamed with experience replay and target networks.</p></li><li><p><strong>Long CPU training</strong> &#8594; I had to 
optimize every preprocessing step.</p></li><li><p><strong>Reward tuning</strong> &#8594; a balancing act between punishment and encouragement.</p></li></ul><div><hr></div><h2>Why This Matters Beyond Racing</h2><p>What I learned here isn&#8217;t just about cars in a simulator.</p><p>This applies to any system where decisions must be made in real time under uncertainty:</p><ul><li><p><strong>Autonomous vehicles</strong> avoiding crashes.</p></li><li><p><strong>Robots</strong> navigating dynamic warehouses.</p></li><li><p><strong>Logistics systems</strong> optimizing deliveries and routes.</p></li></ul><p>Reinforcement learning gives machines the ability to <strong>adapt</strong>, not just follow pre-coded instructions.</p><p>And racing, for me, was the perfect playground to test those limits.</p>]]></content:encoded></item><item><title><![CDATA[Legal AI for Bankruptcy Cases 🏛️🤖]]></title><description><![CDATA[My Journey Into Building RAG for Law]]></description><link>https://www.snehvora.me/p/legal-ai-for-bankruptcy-cases</link><guid isPermaLink="false">https://www.snehvora.me/p/legal-ai-for-bankruptcy-cases</guid><dc:creator><![CDATA[Sneh Vora]]></dc:creator><pubDate>Fri, 12 Sep 2025 04:48:34 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/1c1c7090-a3f2-4007-9f86-8edb92df0568_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;2bd7d6b9-c7f5-4cbf-a1dc-55b079aa7be9&quot;,&quot;duration&quot;:null}"></div><p><strong>GitHub Link</strong> : <a href="https://github.com/snehvora/Legal-AI-For-Bankruptcy-Cases">Legal-AI-For-Bankruptcy-Cases</a></p><h3>The Problem That Hooked Me</h3><p>Bankruptcy law isn&#8217;t just about numbers on a balance sheet&#8212;it&#8217;s a maze of case law, filings, motions, and obscure legal jargon.</p><p>When I first started exploring this domain, I 
noticed how <strong>time-consuming</strong> it was for legal professionals to manually sift through lengthy documents just to extract relevant precedents. Unlike text in many other domains, legal text is unforgiving: if you miss context, you risk making the wrong judgment.</p><p>That&#8217;s when the idea hit me:<br>&#128073; <em>What if I could build an AI that retrieves the right legal passages and generates answers like a seasoned paralegal&#8212;fast, reliable, and context-aware?</em></p><div><hr></div><h3>From Frustration to Framework</h3><p>At first, I naively thought: <em>&#8220;Throw GPT at it, and we&#8217;re done.&#8221;</em><br>But reality humbled me. A generic LLM struggled&#8212;it hallucinated, mixed up precedents, and ignored nuanced legal definitions. In law, &#8220;close enough&#8221; is the same as <em>wrong</em>.</p><p>So, I turned to <strong>Retrieval-Augmented Generation (RAG)</strong>.</p><p>Instead of relying on a model&#8217;s memory, I designed a system where the AI could:</p><ol><li><p><strong>Search</strong> legal documents (via Pinecone &#127794; for vector search)</p></li><li><p><strong>Rank</strong> passages by relevance (thanks to Cohere &#128269;)</p></li><li><p><strong>Generate</strong> nuanced responses (with Cohere, Ollama, and Gemini &#10024;)</p></li><li><p><strong>Orchestrate workflows</strong> (using LangGraph &#129513; for state management)</p></li></ol><div><hr></div><h3>What I Actually Built</h3><p>The result is what I call <strong>Legal AI for Bankruptcy Cases</strong>.</p><p>It&#8217;s not just another chatbot&#8212;it&#8217;s a multi-component system:</p><ul><li><p><strong>Streamlit UI</strong> for an intuitive interface.</p></li><li><p><strong>Poetry-managed dependencies</strong> so devs can spin it up cleanly.</p></li><li><p><strong>Pinecone + Cohere + Ollama + Gemini</strong> powering retrieval and generation.</p></li><li><p><strong>LangGraph</strong> to manage complex legal workflow states.</p></li><li><p><strong>XML 
handling</strong> for parsing structured legal files.</p></li></ul><div><hr></div><h3>Why This Matters</h3><p>Bankruptcy law affects everything from small businesses to global corporations. A system like this doesn&#8217;t replace lawyers; it <strong>amplifies their ability</strong> to find relevant information instantly. Think of it as a turbocharged research assistant that never sleeps.</p><div><hr></div><h3>Lessons Learned (the hard way)</h3><ul><li><p>Legal data is <strong>messy</strong>&#8212;parsing and indexing documents took longer than I expected.</p></li><li><p>RAG is powerful, but <strong>state management</strong> (via LangGraph) was what finally made the workflows consistent.</p></li><li><p>AI isn&#8217;t about replacing expertise&#8212;it&#8217;s about scaling it.</p></li></ul><div><hr></div><h3>What&#8217;s Next</h3><p>I see this project as a foundation. The legal AI ecosystem is still in its infancy, and the potential is massive. Bankruptcy is just one domain; imagine extending this to <strong>contracts, compliance, or litigation prep</strong>.</p><p>If you&#8217;re a legal professional, an ML engineer, or just curious about where AI meets law, you&#8217;ll want to keep an eye on this space.</p><div><hr></div><p>&#128279; Full repo here: <a href="https://github.com/snehvora/Legal-AI-For-Bankruptcy-Cases">Legal-AI-For-Bankruptcy-Cases</a></p><p>&#128172; Got thoughts? Hit reply or drop me a message. 
I&#8217;d love to hear how you see AI reshaping law.</p>]]></content:encoded></item><item><title><![CDATA[How I Built WPInsight Automator]]></title><description><![CDATA[Turning WordPress Management Into an AI-Driven Workflow]]></description><link>https://www.snehvora.me/p/how-i-built-wpinsight-automator</link><guid isPermaLink="false">https://www.snehvora.me/p/how-i-built-wpinsight-automator</guid><dc:creator><![CDATA[Sneh Vora]]></dc:creator><pubDate>Sun, 07 Sep 2025 17:24:40 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/da899064-7245-42dd-aa1d-ba5f6da59e7e_400x300.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>GitHub Link</strong> : <a href="https://github.com/snehvora/WPInsight-Automator">WPInsight Automator on GitHub</a></p><p>When I first started working on WordPress projects back in 2021, I quickly realized that running a website isn&#8217;t just about writing great content&#8212;it&#8217;s about keeping it fresh, optimized, and consistent.</p><p>But here&#8217;s the catch: most of the time wasn&#8217;t spent on creativity. Instead, it was wasted on repetitive admin tasks&#8212;uploading posts, cleaning up drafts, tweaking SEO settings, checking for plagiarism, or simply copy-pasting news articles from trusted sources. 
By the end of the day, it felt less like <em>content management</em> and more like <em>content babysitting</em>.</p><p>That frustration gave birth to <strong>WPInsight Automator</strong>.</p><div><hr></div><h2>The Problem That Sparked the Idea</h2><p>While helping with a WordPress project at <strong>CHARUSAT (Charotar University of Science and Technology)</strong>, I noticed a recurring pattern:</p><ul><li><p>Fetching trending news from sites like <em>NDTV</em> or <em>Hindustan Times</em>.</p></li><li><p>Paraphrasing and rewriting them to avoid duplication.</p></li><li><p>Uploading them manually into WordPress with proper formatting, categories, and SEO tags.</p></li><li><p>Running plagiarism checks to ensure originality.</p></li><li><p>Repeating this cycle for multiple posts every single day.</p></li></ul><p>It was tedious, time-consuming, and error-prone. And if you missed a day? The site instantly felt outdated.</p><p>I knew there had to be a better way.</p><div><hr></div><h2>The Vision</h2><p>I wanted to design a system that could:</p><ol><li><p><strong>Extract news automatically</strong> from trusted sources.</p></li><li><p><strong>Rephrase it intelligently</strong> using NLP so it wouldn&#8217;t be a direct copy.</p></li><li><p><strong>Upload and manage posts</strong> on WordPress with minimal human input.</p></li><li><p><strong>Run SEO checks</strong> so the content was not just uploaded, but also optimized.</p></li></ol><p>In short: an AI-powered assistant that could save <strong>80% of the manual work</strong>.</p><div><hr></div><h2>Enter WPInsight Automator &#128640;</h2><p>This wasn&#8217;t just a script&#8212;it was a complete workflow automation bot. 
Here&#8217;s what it could do:</p><p>&#10024; <strong>Key Features</strong></p><ul><li><p><strong>News Extraction</strong> &#8594; Automated scraping from <em>Hindustan Times</em> and <em>NDTV</em>.</p></li><li><p><strong>AI Paraphrasing</strong> &#8594; Leveraging <em>Transformers</em> and <em>PyTorch</em> for context-aware rewriting.</p></li><li><p><strong>WordPress Task Automation</strong> &#8594; Uploading, deleting, updating, and managing posts with Selenium.</p></li><li><p><strong>Plagiarism Check</strong> &#8594; Ensuring originality before publishing.</p></li><li><p><strong>SEO Integration</strong> &#8594; Built-in compatibility with the Rank Math SEO plugin.</p></li><li><p><strong>File &amp; Content Management</strong> &#8594; Uploading media, handling drafts, and cleaning up clutter.</p></li></ul><div><hr></div><h2>The Tech Behind It</h2><p>I combined traditional automation with modern NLP:</p><ul><li><p><strong>Selenium</strong> &#8594; For interacting with WordPress like a human admin.</p></li><li><p><strong>Transformers + PyTorch</strong> &#8594; For NLP-based paraphrasing.</p></li><li><p><strong>NumPy &amp; Math</strong> &#8594; For handling data and backend calculations.</p></li><li><p><strong>MySQL</strong> &#8594; To store extracted content and metadata.</p></li><li><p><strong>Tkinter</strong> &#8594; For a simple GUI to make the tool usable by non-technical users.</p></li><li><p><strong>smtplib</strong> &#8594; For sending automated email alerts when new posts were ready.</p></li></ul><p>It wasn&#8217;t fancy at first&#8212;just a bunch of scripts strung together. But once it clicked, the efficiency boost was undeniable.</p><div><hr></div><h2>Lessons Learned</h2><ol><li><p><strong>Automation &#8800; Magic</strong><br>Every scraper breaks at some point. 
I learned the importance of making my pipeline modular so I could fix or swap out components easily.</p></li><li><p><strong>AI Needs Guardrails</strong><br>NLP paraphrasing isn&#8217;t perfect. Without plagiarism checks, it occasionally produced results too close to the source. Building fallback mechanisms taught me the importance of balancing automation with quality control.</p></li><li><p><strong>UX Matters, Even for Bots</strong><br>At first, WPInsight Automator was command-line only. Adding a Tkinter-based GUI made it usable for non-developers, which was a turning point.</p></li></ol><div><hr></div><h2>Why This Still Matters</h2><p>Even though this project dates back to 2021, the core idea resonates today:</p><ul><li><p><strong>AI + Automation</strong> can unlock huge productivity gains.</p></li><li><p>Content workflows, especially in digital publishing, are still ripe for optimization.</p></li><li><p>Building small, targeted tools can sometimes make a bigger impact than large, bloated solutions.</p></li></ul><p>For me, WPInsight Automator wasn&#8217;t just about WordPress&#8212;it was my entry point into blending <strong>machine learning, automation, and real-world usability</strong>.</p><p>And the best part? That spark of solving an everyday pain point has fueled every project I&#8217;ve worked on since.</p><div><hr></div><p>&#128073; Curious to see the code? 
The full project is available here: <a href="https://github.com/snehvora/WPInsight-Automator">WPInsight Automator on GitHub</a></p><p>If you&#8217;ve ever felt trapped by repetitive digital tasks, maybe it&#8217;s time to build your own automation story.</p>]]></content:encoded></item><item><title><![CDATA[Smart Inventory Bot]]></title><description><![CDATA[From Warehouse Chaos to Conversational Insights]]></description><link>https://www.snehvora.me/p/from-warehouse-chaos-to-conversational</link><guid isPermaLink="false">https://www.snehvora.me/p/from-warehouse-chaos-to-conversational</guid><dc:creator><![CDATA[Sneh Vora]]></dc:creator><pubDate>Sun, 07 Sep 2025 04:00:26 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/d29f7aea-7469-4368-90eb-8994589b5585_420x300.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;6d13e3f1-729e-410c-b07d-fa7d72388d07&quot;,&quot;duration&quot;:null}"></div><p><strong>GitHub Link</strong> : <strong><a href="https://github.com/snehvora/Smart-Inventory-Bot">Smart Inventory Bot</a></strong></p><p>Why did I build the Smart Inventory Bot?</p><p>It all started when I got curious about my uncle&#8217;s warehouse business.</p><p>Walking through the aisles, I noticed the same challenge that plagues so many small-to-mid-sized operations: <strong>the data existed, but the insights didn&#8217;t.</strong></p><p>He had mountains of transaction records&#8212;sales, discounts, returns, seasonal demand shifts&#8212;but whenever I asked him a simple business question like <em>&#8220;What&#8217;s the average discount on electronics versus clothing last year?&#8221;</em> his answer was always the same:</p><blockquote><p><em>&#8220;I&#8217;ll need to check with the accountant.&#8221;</em></p></blockquote><p>That struck me. 
Why should everyday business owners need a technical degree&#8212;or an expensive analyst&#8212;just to get insights from their own data?</p><p>That was the seed for what became the <strong><a href="https://github.com/snehvora/Smart-Inventory-Bot">Smart Inventory Bot</a></strong>.</p><div><hr></div><h2>The Problem I Saw</h2><p>Databases are great at storing information, but terrible at talking to humans.<br>On the other hand, humans are great at asking questions, but not everyone speaks SQL.</p><p>So, the gap is clear:</p><ul><li><p>Business owners have <strong>questions in plain English.</strong></p></li><li><p>The answers are <strong>locked in relational databases.</strong></p></li></ul><p>My goal was to build a bridge.</p><div><hr></div><h2>The Vision</h2><p>Imagine asking a bot in natural language&#8212;<br><em>&#8220;Show me the average discount by product category for 2023&#8221;</em>&#8212;<br>and instantly hearing back:</p><p>&#128483;&#65039; <em>&#8220;Electronics: 15%, Clothing: 10%, Home Goods: 5%.&#8221;</em></p><p>No code. No dashboards. 
No &#8220;let me get back to you.&#8221;</p><p>That&#8217;s what I wanted to bring to life with Smart Inventory Bot.</p><div><hr></div><h2>Under the Hood</h2><p>The bot works like a translator between humans and data.</p><p>1&#65039;&#8419; <strong>Understanding the Question</strong><br>At its core is a <strong>StateGraph</strong>, which decides where each query should go:</p><ul><li><p><strong>SQL path</strong> &#8594; if the question is data-specific.</p></li><li><p><strong>LLM path</strong> &#8594; if the question is more conversational or abstract.</p></li></ul><p>2&#65039;&#8419; <strong>Getting the Answer</strong><br>Two engines power the response:</p><ul><li><p><strong>SQL Generator</strong>: Converts natural language into SQL queries on the fly.</p></li><li><p><strong>LLM Responder</strong>: Handles edge cases, explanations, or questions outside the database.</p></li></ul><p>3&#65039;&#8419; <strong>Resilient by Design</strong><br>No bot is perfect, but Smart Inventory Bot doesn&#8217;t fail silently.<br>It has an <strong>error-handling path</strong> that reroutes failed queries, ensuring the user always gets <em>some</em> response.</p><div><hr></div><h2>What Surprised Me</h2><p>The first time I asked it,<br><em>&#8220;What&#8217;s the average discount on electronics last year?&#8221;</em><br>it wrote the SQL, ran it, and gave me the result in plain English&#8230;</p><p>&#8230;I had that spark every engineer knows. The <em>&#8220;it actually works&#8221;</em> moment.</p><p>That&#8217;s when I realized this isn&#8217;t just a student project. 
This is a tool small businesses could actually use to make data-driven decisions without hiring a full analytics team.</p><div><hr></div><h2>Why It Matters</h2><p>Warehouse owners like my uncle shouldn&#8217;t need to spend hours poring over spreadsheets.<br>Data should talk back in the same language we use every day.</p><p>Smart Inventory Bot isn&#8217;t perfect yet&#8212;it&#8217;s a prototype&#8212;but it&#8217;s a glimpse of how pairing natural language with structured data can empower everyday decision-making.</p><p>And that&#8217;s why I built it.</p><div><hr></div>]]></content:encoded></item><item><title><![CDATA[Twitter Sentiment Analysis with TextBlob]]></title><description><![CDATA[International Journal of Innovative Science and Research Technology &#183; Dec 3, 2022]]></description><link>https://www.snehvora.me/p/twitter-sentiment-analysis-with-textblob</link><guid isPermaLink="false">https://www.snehvora.me/p/twitter-sentiment-analysis-with-textblob</guid><dc:creator><![CDATA[Sneh Vora]]></dc:creator><pubDate>Sat, 06 Sep 2025 21:35:43 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/45eda83d-f0c6-4610-9cb2-81530c9b3410_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>When I Taught Tweets to Speak: The Story Behind My Research</h3><p>It all started with a simple question that kept nagging at me: <em>What do people really feel when they post on Twitter?</em></p><p>In the whirlwind of digital conversations, Twitter has always fascinated me. Millions of people share their thoughts there every day&#8212;sometimes raw, sometimes witty, sometimes brutally honest. But beneath the chaos of hashtags, emojis, and abbreviations, I sensed a pattern. 
A hidden pulse.</p><p>And I wanted to capture it.</p><p>That curiosity became the seed of my research paper: <strong>&#8220;Twitter Sentiment Analysis using TextBlob&#8221;</strong> (<a href="https://ijisrt.com/assets/upload/files/IJISRT22NOV190_(1).pdf">IJISRT, 2022</a>).</p><div><hr></div><h4>Why Twitter?</h4><p>When I began, I could have chosen Instagram, Facebook, or even Reddit. But Twitter stood out because of its brevity. Each tweet forces the user to compress their feelings into 280 characters. It&#8217;s like a stream of distilled human emotion&#8212;short, sharp, and surprisingly revealing.</p><p>For an ML engineer, that&#8217;s both a gift and a challenge.</p><p>The gift? Massive volumes of text-based interactions, perfect for analysis.<br>The challenge? Tweets are messy. Really messy.</p><div><hr></div><h4>The Early Struggles</h4><p>I still remember my first dataset. It was a jungle.</p><ul><li><p>Links to half-broken websites.</p></li><li><p>Random emojis.</p></li><li><p>Retweet markers.</p></li><li><p>And let&#8217;s not even talk about the spelling errors.</p></li></ul><p>Running my first scripts felt like staring into static on an old TV screen. I had data, sure&#8212;but no clarity.</p><p>That&#8217;s when I realized: before I could even think about machine learning, I had to get serious about <strong>cleaning</strong>.</p><p>I spent hours designing preprocessing steps: stripping URLs, normalizing text, removing stop words, and handling special characters. It felt less like data science and more like archaeology&#8212;scraping away dirt to reveal the artifact hidden beneath.</p><div><hr></div><h4>Building the Framework</h4><p>Once the noise was cleared, I turned to the heart of the project: <strong>sentiment analysis</strong>.</p><p>I didn&#8217;t start with deep neural networks or transformer models. 
Instead, I wanted to prove that <strong>a simple, accessible tool could still uncover powerful insights</strong>.</p><p>Enter <strong>TextBlob</strong>.</p><p>With its Pythonic simplicity, TextBlob let me score each tweet&#8217;s polarity on a scale from -1 (most negative) to 1 (most positive) and classify it as <em>positive, negative,</em> or <em>neutral</em>. To some, it might seem too basic compared to today&#8217;s BERT- or GPT-powered systems&#8212;but that was the beauty of it. The framework was lean, efficient, and approachable.</p><p>And soon enough, the results started pouring in.</p><div><hr></div><h4>What the Data Whispered</h4><p>The first time I visualized the sentiment distribution, it felt like watching a living heartbeat of the crowd. Suddenly, the noise had shape.</p><ul><li><p>I could see <strong>sentiment trends</strong> shifting around events.</p></li><li><p>Brands rising and falling in public favor.</p></li><li><p>Collective moods reacting in real time to global happenings.</p></li></ul><p>This wasn&#8217;t just data&#8212;it was <strong>public opinion, quantified</strong>.</p><div><hr></div><h4>The Limitations I Faced</h4><p>Of course, I had my fair share of frustrations:</p><ul><li><p><strong>Language support</strong>: TextBlob&#8217;s built-in sentiment analyzer only handled English. 
Every non-English tweet was a lost voice.</p></li><li><p><strong>Shallow classification</strong>: Sarcasm and cultural nuance sometimes slipped through.</p></li><li><p><strong>Comparisons with advanced models</strong>: SVMs, LSTMs, and transformers promised higher accuracy, but I chose clarity and speed over complexity&#8212;for this paper at least.</p></li></ul><p>But every limitation also planted a seed for future work.</p><div><hr></div><h4>Lessons Learned</h4><p>Writing this paper wasn&#8217;t just about publishing&#8212;it was about learning.</p><p>I discovered that <strong>the hardest part of ML isn&#8217;t always the model&#8212;it&#8217;s the data.</strong><br>I learned how crucial it is to design with clarity, not just sophistication.<br>And most importantly, I realized that <strong>even simple approaches can have real impact</strong> when applied thoughtfully.</p><div><hr></div><h4>Where I&#8217;d Take It Next</h4><p>If I were to extend this research today, I&#8217;d explore:</p><ul><li><p><strong>Multilingual sentiment analysis</strong> to capture a truly global voice.</p></li><li><p><strong>Transformer-based models</strong> like BERT or RoBERTa for deeper contextual understanding.</p></li><li><p><strong>Real-time dashboards</strong> to let businesses visualize and act on sentiment as it unfolds.</p></li></ul><p>The journey started with TextBlob, but it certainly doesn&#8217;t end there.</p><div><hr></div><h4>Closing Thoughts</h4><p>Looking back, what began as a fascination with Twitter became a full-fledged research project that taught me more than I ever expected.</p><p>At its core, the paper was my attempt to decode the human voice&#8212;compressed into characters, hashtags, and emojis&#8212;and translate it into something organizations and individuals alike could understand.</p><p>And in that process, I realized something powerful: <strong>data doesn&#8217;t just tell us what happened. 
It tells us how we feel.</strong></p><p>That&#8217;s the story of my paper, and honestly, the story of why I fell in love with machine learning in the first place.</p>]]></content:encoded></item></channel></rss>