I built a very cheap App Review analysis pipeline that also finds Bugs in my app code

I got tired of reading app reviews manually every morning, so I built a command-line tool that pulls reviews from Google Play and the App Store, classifies them (bug, feature request, crash, praise), and sends me a daily summary. It costs about $0.50/month. In version 2, I added the ability to point it at my source code and get an analysis: which files are causing the bug, what to change, and how complex the fix is.

If you need to checkout code for your own setup : https://github.com/Mr-Ashish/AppPulse

The Problem

I have a few apps on Google Play. Every morning, I'd open the Play Console, scroll through new reviews, and try to figure out:

Is this a bug or a complaint?
Is this the same issue three other people mentioned yesterday?
Does this match any of the crashes I'm seeing in Firebase?

About 15 minutes of work. But it's unfocused work. Scanning text, making mental classifications, switching between tabs. On days with 20+ reviews, I'd either rush through them or skip the whole exercise.

I wanted a morning briefing:

"You got 12 new reviews. 3 are bugs (1 critical, photo upload crashes on large images). 2 are feature requests. 7 are praise. The critical bug matches a Firebase crash that affected 312 users this week."

So I built one.

Version 1: Classify and Digest

The Flow

Google Play ──┐
               → Pull new reviews→Classify review → Save to database
App Store ───┘         │             (using AI)              
                         │                                           
Firebase/Sentry ──────
---------------------------------------------------------------    
     ↓
     
Daily Digest                                                                             ┌──────────────────┐
│ 3 bugs (1 critical)
│ 2 feature requests
│ 7 praise
│ Top crash: OOM in
│  ImageUploadService
└──────────────────┘
      ↓         ↓
     Terminal   Slack

Classification

Each review goes through an LLM (a language model; I use OpenAI's gpt-4o-mini, their cheapest option). The LLM reads the review text and categorizes it:

Category	Meaning	Example review
bug	Something broken	"Upload button does nothing after I pick a photo"
crash	App closes or freezes	"App crashes every time I open settings"
feature_request	User wants something new	"Please add dark mode"
performance	Slow, laggy, battery drain	"App drains 20% battery in an hour"
praise	Positive feedback	"Best app I've used for this!"
complaint	General dissatisfaction	"Used to be good, now it's terrible"

Along with the category, the LLM extracts:

Severity (critical, major, or minor)
Keywords (the specific feature or area mentioned, e.g., "photo upload", "settings screen")
Functional area (which part of the app is affected)

Reviews go to the LLM in batches of 10, so one API call classifies 10 reviews at once. That's what keeps the cost low.

Cost

For an app with about 500 reviews per week:

Component	Monthly Cost
Google Play API	Free
App Store Connect API	Free
Firebase/Sentry API	Free (free tier)
OpenAI classification (~2,000 reviews/month)	~$0.50
Storage (SQLite, local file)	Free
Total	~$0.50/month

If you use Ollama (runs AI models on your own computer), the cost drops to $0.00/month.

The Tech Stack

Python , the whole tool is a Python command-line app
Click , a library for building command-line interfaces (handles apppulse run, apppulse reviews, etc.)
SQLAlchemy + SQLite , stores everything in a single local database file at ~/.apppulse/data.db
Rich , makes the terminal output clean with tables, colors, and panels
OpenAI / Anthropic / Ollama , LLM providers for classification (you pick one during setup)

No servers. No cloud infrastructure. No Docker. Run pip install and a 5-minute setup wizard.

A Typical Morning

$ apppulse run
# Pulling reviews for MyApp Android... 12 new reviews
# Classifying... done (0.8s)
# Pulling crash data from Firebase... 3 active crashes
# Correlating reviews with crashes... 1 match found

# ── Daily Digest ───────────────────────────────────────
#  12 new reviews: 3 bugs (1 critical), 2 feature requests, 7 praise
#
#  🔴 Critical: Photo upload crashes for >10MB images
#     → Matches Firebase crash: OutOfMemoryError (312 users affected)
#     → 3 reviews mention this issue
#
#  🟡 Major: Search doesn't update after applying filters
#     → 2 reviews
#
#  💡 Feature requests: Dark mode (2 requests)

Five seconds. I know what matters. No tab-switching, no scrolling through the Play Console.

Version 2: From Classification to Code Analysis

Version 1 told me what users were experiencing. It didn't tell me where in my code the problem lived or what to change.

For the critical bug above, "Photo upload crashes for >10MB images," I'd still need to:

Open the project
Search for upload-related code
Look at the Firebase stacktrace
Connect the dots
Figure out a fix

So I built a second stage: code analysis.

The Mechanism

                    apppulse analyze 42
                         │
                         ▼
        ┌────────────────────────────────┐
        │    Load the review from the    │
        │    database + crash data       │
        └───────────────┬────────────────┘
                        │
                        ▼
        ┌────────────────────────────────┐
        │    AI agent explores your      │
        │    source code with read-only  │
        │    tools (search, read files,  │
        │    find function definitions)  │
        └───────────────┬────────────────┘
                        │
                        ▼
        ┌────────────────────────────────┐
        │    Produces an Analysis Brief  │
        │    • Root cause                │
        │    • Affected files + lines    │
        │    • Proposed changes          │
        │    • Complexity estimate       │
        │    • Testing notes             │
        └────────────────────────────────┘

You point AppPulse at your app's source code folder during setup. Running apppulse analyze 42 (where 42 is a review ID) triggers the AI agent to:

Read the review and any matched crash data
Get a structural overview of your project (a map of every file and its key classes/functions)
Search the code for relevant keywords, error messages, class names, the feature area mentioned in the review
Read the specific files that look promising
Produce a structured analysis with file paths, line numbers, and proposed changes

The Output

┌──────────────────────────────────────────────────────────────┐
│  CODE ANALYSIS — Review #42                                   │
│  Photo upload crashes for >10MB images                        │
│  Type: bug │ Complexity: medium                               │
├──────────────────────────────────────────────────────────────┤
│                                                               │
│  ROOT CAUSE                                                   │
│  ImageUploadService.compress() loads the full image into      │
│  memory. Images >10MB exceed the memory limit on phones       │
│  with ≤4GB RAM, causing the app to crash.                     │
│                                                               │
│  AFFECTED FILES                                               │
│  ImageUploadService.java (lines 45-89) — compress() method    │
│  ImageUtils.java (lines 12-35) — no file size check           │
│                                                               │
│  PROPOSED CHANGES                                             │
│  1. Resize large images before loading them into memory       │
│  2. Add a file size check before compression starts           │
│  3. Use streaming compression instead of loading everything   │
│     into memory at once                                       │
│                                                               │
│  TESTING NOTES                                                │
│  Test with 1MB, 10MB, and 50MB images on a low-memory device  │
└──────────────────────────────────────────────────────────────┘

Instead of a category label, I get the file, the line, what to change, and how to test it.

Pick Your Analysis Engine

The analysis engine is pluggable. I built it with four options:

Backend	Description	Good for
Built-in	A lightweight AI agent that runs inside AppPulse, using PydanticAI (a Python framework for building AI agents with structured output)	Quick analyses, small codebases
Grok Build	xAI's coding assistant in headless mode	Fast, targeted exploration
Claude Code	Anthropic's coding assistant in headless mode	Deep multi-file investigations
OpenAI Codex	OpenAI's coding assistant	Sandboxed, autonomous mode

The external backends (Grok, Claude Code, Codex) use a two-stage approach:

The coding assistant explores the codebase and writes its findings as a markdown document
A cheap AI call ($0.001) extracts that document into the structured analysis format

Consistent output regardless of which AI did the exploration. Adding a new backend takes about 20 lines of code.

Switching is one line in the config file:

code_analysis:
  enabled: true
  backend: "grok"    # or "builtin", "claude_code", "codex"

Lessons

1. Classify before you analyze

Categorization (bug vs. feature request vs. praise) determines everything downstream. Code analysis needs good classification as input, or it wastes time investigating a feature request as if it were a bug.

2. Cheap AI models handle most of this

The classification pipeline runs on gpt-4o-mini, OpenAI's cheapest model. For categorizing app reviews, which are 1-3 sentences, it's accurate enough. Expensive models are overkill.

3. Give the AI a map of your codebase

The biggest analysis quality improvement came from generating a project map upfront, a compact overview of every file and its key components. Without the map, the AI spent most of its budget figuring out where things lived. With it, the AI goes to the relevant files on its first move. A GPS for the codebase.

4. Save raw output

Every analysis saves the AI's raw output as a markdown file. If the structured summary looks off, I can check what the AI found and pinpoint whether the problem was in the exploration or the extraction.

5. No infrastructure required

The whole tool runs as a single command-line app. The database is a SQLite file on disk. The cache is a text file. No server, no Docker container, no cloud deployment. I run apppulse run by hand or on a daily schedule.

Up Next

Batch analysis , analyze all critical bugs at once instead of one at a time
Per-app configuration , different analysis settings for different apps
Auto-analysis , run code analysis on critical bugs during the daily pipeline, so the morning digest includes root causes alongside classifications
Tracking outcomes , connect analyses to fixes, so the tool can learn which investigations led to successful resolutions

The classification half took a weekend. Code analysis took another. The pipeline costs less per month than a cup of coffee, and it replaced 15 minutes of unfocused tab-switching with a 5-second terminal readout.

I built a very cheap App Review analysis pipeline that also finds Bugs in my app code

The Problem

Version 1: Classify and Digest

The Flow

Classification

Cost

The Tech Stack

A Typical Morning

Version 2: From Classification to Code Analysis

The Mechanism

The Output

Pick Your Analysis Engine

Lessons

1. Classify before you analyze

2. Cheap AI models handle most of this

3. Give the AI a map of your codebase

4. Save raw output

5. No infrastructure required

Up Next

Comments

More from this blog

Your Team Can Build Dashboards — But Can You Share Them Safely?

From "You Have a Bug" to "Here's the Root Cause" — Adding AI Code Analysis to My App Review Pipeline

I Wanted to Keep Track of My App Reviews Without Expensive Tooling — So I Built My Own

I Wanted a Simple Morning Email Digest — Here's Why I Ignored LangChain, CrewAI, and Every AI Agent Framework

Command Palette

The Problem

Version 1: Classify and Digest

The Flow

Classification

Cost

The Tech Stack

A Typical Morning

Version 2: From Classification to Code Analysis

The Mechanism

The Output

Pick Your Analysis Engine

Lessons

1. Classify before you analyze

2. Cheap AI models handle most of this

3. Give the AI a map of your codebase

4. Save raw output

5. No infrastructure required

Up Next

Comments

More from this blog