
I Built a Very Cheap App Review Analysis Pipeline That Also Finds Bugs in My App Code

I got tired of reading app reviews manually every morning, so I built a command-line tool that pulls reviews from Google Play and the App Store, classifies them (bug, feature request, crash, praise), and sends me a daily summary. It costs about $0.50/month. In version 2, I added the ability to point it at my source code and get an analysis: which files are causing the bug, what to change, and how complex the fix is.

If you want to check out the code for your own setup: https://github.com/Mr-Ashish/AppPulse

The Problem

I have a few apps on Google Play. Every morning, I'd open the Play Console, scroll through new reviews, and try to figure out:

  • Is this a bug or a complaint?
  • Is this the same issue three other people mentioned yesterday?
  • Does this match any of the crashes I'm seeing in Firebase?

About 15 minutes of work. But it's unfocused work. Scanning text, making mental classifications, switching between tabs. On days with 20+ reviews, I'd either rush through them or skip the whole exercise.

I wanted a morning briefing:

"You got 12 new reviews. 3 are bugs (1 critical, photo upload crashes on large images). 2 are feature requests. 7 are praise. The critical bug matches a Firebase crash that affected 312 users this week."

So I built one.

Version 1: Classify and Digest

The Flow

Google Play ──┐
              ├──→ Pull new reviews → Classify reviews (AI) → Save to database
App Store ────┘          ↑                                           │
                         │                                           │
Firebase/Sentry ─────────┘                                           ↓
                                                          ┌──────────────────────┐
                                                          │ Daily Digest         │
                                                          │ 3 bugs (1 critical)  │
                                                          │ 2 feature requests   │
                                                          │ 7 praise             │
                                                          │ Top crash: OOM in    │
                                                          │   ImageUploadService │
                                                          └──────────────────────┘
                                                               ↓            ↓
                                                            Terminal      Slack

Classification

Each review goes through an LLM (large language model). I use OpenAI's gpt-4o-mini, their cheapest option. The LLM reads the review text and categorizes it:

| Category | Meaning | Example review |
|---|---|---|
| bug | Something broken | "Upload button does nothing after I pick a photo" |
| crash | App closes or freezes | "App crashes every time I open settings" |
| feature_request | User wants something new | "Please add dark mode" |
| performance | Slow, laggy, battery drain | "App drains 20% battery in an hour" |
| praise | Positive feedback | "Best app I've used for this!" |
| complaint | General dissatisfaction | "Used to be good, now it's terrible" |

Along with the category, the LLM extracts:

  • Severity (critical, major, or minor)
  • Keywords (the specific feature or area mentioned, e.g., "photo upload", "settings screen")
  • Functional area (which part of the app is affected)

Reviews go to the LLM in batches of 10, so one API call classifies 10 reviews at once. That's what keeps the cost low.
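A minimal sketch of what batching can look like, assuming a hypothetical `call_llm(prompt) -> str` function that wraps whichever provider you use. The prompt wording, the JSON reply shape, and every name below are illustrative, not AppPulse's actual code. Each review is tagged with its ID so answers can be matched back even if the model drops or reorders one, and replies with an unknown category or severity are discarded.

```python
"""Sketch: classify reviews in batches of 10 with one LLM call per batch.

Assumptions (not AppPulse's actual code): the prompt wording, the JSON
reply shape, and the `call_llm` callable are all hypothetical.
"""
import json
from dataclasses import dataclass
from typing import Callable, Iterator

CATEGORIES = {"bug", "crash", "feature_request", "performance", "praise", "complaint"}
SEVERITIES = {"critical", "major", "minor"}


@dataclass
class Classification:
    review_id: str
    category: str
    severity: str
    keywords: list[str]
    functional_area: str


def batches(items: list, size: int = 10) -> Iterator[list]:
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_prompt(batch: list[dict]) -> str:
    """List each review with its ID so the model can echo the ID back."""
    lines = [
        "Classify each app review. Reply with a JSON array of objects with keys",
        "id, category, severity, keywords, functional_area.",
        f"category is one of: {', '.join(sorted(CATEGORIES))}.",
        f"severity is one of: {', '.join(sorted(SEVERITIES))}.",
        "",
    ]
    lines += [f"[{r['id']}] {r['text']}" for r in batch]
    return "\n".join(lines)


def parse_response(raw: str, expected_ids: set[str]) -> list[Classification]:
    """Parse the model's JSON and drop entries with unknown IDs or labels."""
    results = []
    for item in json.loads(raw):
        if (item.get("id") in expected_ids
                and item.get("category") in CATEGORIES
                and item.get("severity") in SEVERITIES):
            results.append(Classification(
                review_id=item["id"],
                category=item["category"],
                severity=item["severity"],
                keywords=list(item.get("keywords", [])),
                functional_area=item.get("functional_area", ""),
            ))
    return results


def classify_all(reviews: list[dict], call_llm: Callable[[str], str]) -> list[Classification]:
    """One call_llm(prompt) round trip per batch of 10 reviews."""
    out = []
    for batch in batches(reviews, 10):
        raw = call_llm(build_prompt(batch))
        out.extend(parse_response(raw, {r["id"] for r in batch}))
    return out
```

Because the instructions are sent once per call instead of once per review, bigger batches spread that fixed prompt over more reviews. The trade-off is that a malformed JSON reply (which makes `json.loads` raise) affects the whole batch, so a real pipeline would likely retry it.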

Cost

For an app with about 500 reviews per week:

| Component | Monthly cost |
|---|---|
| Google Play API | Free |
| App Store Connect API | Free |
| Firebase/Sentry API | Free (free tier) |
| OpenAI classification (~2,000 reviews/month) | ~$0.50 |
| Storage (SQLite, local file) | Free |
| Total | ~$0.50/month |

If you use Ollama (which runs AI models locally on your own computer), the API cost drops to $0.00/month.
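To sanity-check a monthly figure like this for your own volume, the arithmetic is: number of calls (reviews divided by batch size, rounded up) times tokens per call times the per-token price. A back-of-the-envelope calculator follows. The token counts and prices are inputs you fill in from your own prompts and your provider's current price list; no particular values are assumed here, and a local model like Ollama corresponds to prices of zero.

```python
"""Back-of-the-envelope LLM cost estimate for batched classification.

All token counts and per-million-token prices are placeholders to replace
with your own measurements and your provider's current price list.
"""
import math


def monthly_llm_cost(
    reviews_per_month: int,
    batch_size: int,
    input_tokens_per_call: int,
    output_tokens_per_call: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
) -> float:
    # A partial final batch still costs a full call.
    calls = math.ceil(reviews_per_month / batch_size)
    input_cost = calls * input_tokens_per_call * usd_per_million_input / 1_000_000
    output_cost = calls * output_tokens_per_call * usd_per_million_output / 1_000_000
    return input_cost + output_cost


def weekly_to_monthly(per_week: int) -> int:
    # 52 weeks / 12 months ≈ 4.33 weeks per month
    return round(per_week * 52 / 12)
```

For example, `weekly_to_monthly(500)` returns 2167, in line with the ~2,000 reviews/month in the table.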

The Tech Stack

  • Python: the whole tool is a Python command-line app
  • Click: a library for building command-line interfaces (handles apppulse run, apppulse reviews, etc.)
  • SQLAlchemy + SQLite: stores everything in a single local database file at ~/.apppulse/data.db
  • Rich: makes the terminal output clean with tables, colors, and panels
  • OpenAI / Anthropic / Ollama: LLM providers for classification (you pick one during setup)

No servers. No cloud infrastructure. No Docker. Just a pip install and a 5-minute setup wizard.
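Here is a rough sketch of the storage side. It uses Python's built-in `sqlite3` module rather than SQLAlchemy so it runs without dependencies, and the table layout is a hypothetical example. `INSERT OR IGNORE` plus a primary key on (store, review ID) is one way to report only the new reviews on each run.

```python
"""Sketch: store reviews in a local SQLite file and count only new ones.

Uses the standard-library sqlite3 module instead of SQLAlchemy to stay
dependency-free; the table layout is a hypothetical example.
"""
import sqlite3
from pathlib import Path

SCHEMA = """
CREATE TABLE IF NOT EXISTS reviews (
    store      TEXT NOT NULL,   -- e.g. 'google_play' or 'app_store'
    review_id  TEXT NOT NULL,   -- the store's own ID for the review
    text       TEXT NOT NULL,
    category   TEXT,            -- filled in after classification
    severity   TEXT,
    PRIMARY KEY (store, review_id)
)
"""


def open_db(path: str) -> sqlite3.Connection:
    """Open (and create if needed) the database file and the reviews table."""
    if path != ":memory:":
        db_path = Path(path).expanduser()
        db_path.parent.mkdir(parents=True, exist_ok=True)
        path = str(db_path)
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_new_reviews(conn: sqlite3.Connection, store: str, reviews: list[dict]) -> int:
    """Insert reviews, skipping ones already stored; return how many were new."""
    before = conn.total_changes
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO reviews (store, review_id, text) VALUES (?, ?, ?)",
            [(store, r["id"], r["text"]) for r in reviews],
        )
    return conn.total_changes - before
```

Pointing `open_db` at `~/.apppulse/data.db` would put the file where the article says AppPulse keeps its data. Keying on the store as well as the ID avoids collisions between Google Play and App Store IDs.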

A Typical Morning

$ apppulse run
# Pulling reviews for MyApp Android... 12 new reviews
# Classifying... done (0.8s)
# Pulling crash data from Firebase... 3 active crashes
# Correlating reviews with crashes... 1 match found

# ── Daily Digest ───────────────────────────────────────
#  12 new reviews: 3 bugs (1 critical), 2 feature requests, 7 praise
#
#  🔴 Critical: Photo upload crashes for >10MB images
#     → Matches Firebase crash: OutOfMemoryError (312 users affected)
#     → 3 reviews mention this issue
#
#  🟡 Major: Search doesn't update after applying filters
#     → 2 reviews
#
#  💡 Feature requests: Dark mode (2 requests)

Five seconds. I know what matters. No tab-switching, no scrolling through the Play Console.
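The summary line at the top of the digest is plain counting once every review has a category and severity. A minimal sketch, assuming classified reviews arrive as dicts with `category` and `severity` keys (a hypothetical shape); the label order is chosen here so that the example above comes out the same.

```python
"""Sketch: turn classified reviews into the one-line digest summary."""
from collections import Counter

# (category key, singular label, plural label) in display order (a hypothetical choice).
LABELS = [
    ("bug", "bug", "bugs"),
    ("crash", "crash", "crashes"),
    ("feature_request", "feature request", "feature requests"),
    ("performance", "performance issue", "performance issues"),
    ("complaint", "complaint", "complaints"),
    ("praise", "praise", "praise"),
]


def summary_line(classified: list[dict]) -> str:
    counts = Counter(r["category"] for r in classified)
    critical_bugs = sum(
        1 for r in classified if r["category"] == "bug" and r["severity"] == "critical"
    )
    parts = []
    for key, singular, plural in LABELS:
        n = counts.get(key, 0)
        if n == 0:
            continue  # categories with no reviews are left out
        part = f"{n} {singular if n == 1 else plural}"
        if key == "bug" and critical_bugs:
            part += f" ({critical_bugs} critical)"
        parts.append(part)
    total = len(classified)
    noun = "review" if total == 1 else "reviews"
    return f"{total} new {noun}: " + ", ".join(parts)
```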

Version 2: From Classification to Code Analysis

Version 1 told me what users were experiencing. It didn't tell me where in my code the problem lived or what to change.

For the critical bug above, "Photo upload crashes for >10MB images," I'd still need to:

  1. Open the project
  2. Search for upload-related code
  3. Look at the Firebase stacktrace
  4. Connect the dots
  5. Figure out a fix

So I built a second stage: code analysis.

The Mechanism

                    apppulse analyze 42
                         │
                         ▼
        ┌────────────────────────────────┐
        │    Load the review from the    │
        │    database + crash data       │
        └───────────────┬────────────────┘
                        │
                        ▼
        ┌────────────────────────────────┐
        │    AI agent explores your      │
        │    source code with read-only  │
        │    tools (search, read files,  │
        │    find function definitions)  │
        └───────────────┬────────────────┘
                        │
                        ▼
        ┌────────────────────────────────┐
        │    Produces an Analysis Brief  │
        │    • Root cause                │
        │    • Affected files + lines    │
        │    • Proposed changes          │
        │    • Complexity estimate       │
        │    • Testing notes             │
        └────────────────────────────────┘

You point AppPulse at your app's source code folder during setup. Running apppulse analyze 42 (where 42 is a review ID) triggers the AI agent to:

  1. Read the review and any matched crash data
  2. Get a structural overview of your project (a map of every file and its key classes/functions)
  3. Search the code for relevant keywords: error messages, class names, and the feature area mentioned in the review
  4. Read the specific files that look promising
  5. Produce a structured analysis with file paths, line numbers, and proposed changes
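The key constraint is that the agent can look but not touch. Below is a minimal sketch of two such tools (search and read), written as a hypothetical `ReadOnlyRepo` class rather than AppPulse's real tool set: both only read files, and any path that resolves outside the project root is rejected. Line numbers are included in the output because the analysis needs to cite them.

```python
"""Sketch: read-only tools an analysis agent could call.

Hypothetical helpers, not AppPulse's actual tool set. Every path is
resolved against the project root and rejected if it escapes it.
"""
from pathlib import Path
from typing import Optional


class ReadOnlyRepo:
    def __init__(self, root: str, extensions=(".java", ".kt", ".py", ".swift")):
        self.root = Path(root).resolve()
        self.extensions = extensions

    def _safe(self, rel_path: str) -> Path:
        """Resolve rel_path under the root; refuse anything that escapes it."""
        path = (self.root / rel_path).resolve()
        if path != self.root and self.root not in path.parents:
            raise PermissionError(f"{rel_path} is outside the project")
        return path

    def search(self, needle: str, max_hits: int = 50) -> list[str]:
        """Case-insensitive substring search; returns 'path:line: text' hits."""
        hits = []
        for path in sorted(self.root.rglob("*")):
            if not path.is_file() or path.suffix not in self.extensions:
                continue
            lines = path.read_text(errors="replace").splitlines()
            for lineno, line in enumerate(lines, start=1):
                if needle.lower() in line.lower():
                    rel = path.relative_to(self.root).as_posix()
                    hits.append(f"{rel}:{lineno}: {line.strip()}")
                    if len(hits) >= max_hits:
                        return hits
        return hits

    def read_file(self, rel_path: str, start: int = 1, end: Optional[int] = None) -> str:
        """Return lines start..end (1-based, inclusive), each prefixed with its number."""
        lines = self._safe(rel_path).read_text(errors="replace").splitlines()
        end = len(lines) if end is None else min(end, len(lines))
        return "\n".join(f"{n}: {lines[n - 1]}" for n in range(max(start, 1), end + 1))
```

An agent framework would expose `search` and `read_file` as tools; a definition finder could be added the same way.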

The Output

┌──────────────────────────────────────────────────────────────┐
│  CODE ANALYSIS — Review #42                                   │
│  Photo upload crashes for >10MB images                        │
│  Type: bug │ Complexity: medium                               │
├──────────────────────────────────────────────────────────────┤
│                                                               │
│  ROOT CAUSE                                                   │
│  ImageUploadService.compress() loads the full image into      │
│  memory. Images >10MB exceed the memory limit on phones       │
│  with ≤4GB RAM, causing the app to crash.                     │
│                                                               │
│  AFFECTED FILES                                               │
│  ImageUploadService.java (lines 45-89) — compress() method    │
│  ImageUtils.java (lines 12-35) — no file size check           │
│                                                               │
│  PROPOSED CHANGES                                             │
│  1. Resize large images before loading them into memory       │
│  2. Add a file size check before compression starts           │
│  3. Use streaming compression instead of loading everything   │
│     into memory at once                                       │
│                                                               │
│  TESTING NOTES                                                │
│  Test with 1MB, 10MB, and 50MB images on a low-memory device  │
└──────────────────────────────────────────────────────────────┘

Instead of a category label, I get the file, the line, what to change, and how to test it.

Pick Your Analysis Engine

The analysis engine is pluggable. I built it with four options:

| Backend | Description | Good for |
|---|---|---|
| Built-in | A lightweight AI agent that runs inside AppPulse, using PydanticAI (a Python framework for building AI agents with structured output) | Quick analyses, small codebases |
| Grok Build | xAI's coding assistant in headless mode | Fast, targeted exploration |
| Claude Code | Anthropic's coding assistant in headless mode | Deep multi-file investigations |
| OpenAI Codex | OpenAI's coding assistant | Sandboxed, autonomous mode |

The external backends (Grok, Claude Code, Codex) use a two-stage approach:

  1. The coding assistant explores the codebase and writes its findings as a markdown document
  2. A cheap AI call ($0.001) extracts that document into the structured analysis format

This keeps the output consistent regardless of which AI did the exploration. Adding a new backend takes about 20 lines of code.
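The two-stage split can be sketched as a small interface: every backend only has to turn a review into markdown, and one shared extractor turns markdown into the structured brief. `ExplorationBackend`, `AnalysisBrief`, and `analyze` are hypothetical names, and the extractor is passed in as a plain function standing in for the cheap AI call.

```python
"""Sketch: two-stage analysis with a swappable exploration backend.

Stage 1: any backend turns (review, repo path) into free-form markdown.
Stage 2: one extractor turns that markdown into the same structured brief.
The names here are hypothetical, not AppPulse's internal API.
"""
from dataclasses import dataclass, field
from typing import Callable, Protocol


@dataclass
class AnalysisBrief:
    root_cause: str
    affected_files: list[str] = field(default_factory=list)
    proposed_changes: list[str] = field(default_factory=list)
    complexity: str = "unknown"  # e.g. low / medium / high
    testing_notes: str = ""


class ExplorationBackend(Protocol):
    name: str

    def explore(self, review_text: str, repo_path: str) -> str:
        """Investigate the code and return findings as markdown."""
        ...


def analyze(
    backend: ExplorationBackend,
    extract: Callable[[str], AnalysisBrief],
    review_text: str,
    repo_path: str,
) -> tuple[str, AnalysisBrief]:
    markdown = backend.explore(review_text, repo_path)  # stage 1: backend-specific
    brief = extract(markdown)                           # stage 2: shared by all backends
    return markdown, brief
```

Under this design a new backend only implements `explore`; the extraction step and the output format stay the same. Returning the raw markdown alongside the brief also means it can be saved for debugging.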

Switching is one line in the config file:

code_analysis:
  enabled: true
  backend: "grok"    # or "builtin", "claude_code", "codex"
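One way the backend string could map to code is a small registry. This sketch assumes the YAML has already been loaded into a dict (for example with PyYAML's `yaml.safe_load`); the registry, the class names, and the `builtin` default are hypothetical. An unknown name fails loudly with the list of valid options instead of silently falling back.

```python
"""Sketch: pick the analysis backend named in the config.

Assumes the YAML has already been parsed into a dict. The registry,
class names, and defaults are hypothetical.
"""
BACKENDS: dict[str, type] = {}


def register(name: str):
    """Class decorator that records a backend under its config name."""
    def wrap(cls: type) -> type:
        BACKENDS[name] = cls
        return cls
    return wrap


@register("builtin")
class BuiltinBackend: ...


@register("grok")
class GrokBackend: ...


@register("claude_code")
class ClaudeCodeBackend: ...


@register("codex")
class CodexBackend: ...


def backend_from_config(config: dict):
    section = config.get("code_analysis", {})
    if not section.get("enabled", False):
        return None  # code analysis switched off
    name = section.get("backend", "builtin")
    try:
        return BACKENDS[name]()
    except KeyError:
        known = ", ".join(sorted(BACKENDS))
        raise ValueError(f"unknown backend {name!r}; expected one of: {known}") from None
```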

Lessons

1. Classify before you analyze

Categorization (bug vs. feature request vs. praise) determines everything downstream. Code analysis needs good classification as input, or it wastes time investigating a feature request as if it were a bug.
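A sketch of that gate, assuming hypothetical dict records and treating `bug`, `crash`, and `performance` as the categories worth a code investigation (an assumption here, not documented behavior of the tool). Feature requests and praise never reach the analysis step, however severe they are marked.

```python
"""Sketch: only send classified bugs to the slower, pricier code analysis."""
ANALYZABLE = {"bug", "crash", "performance"}  # assumption: which categories merit code analysis
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2}


def analysis_queue(classified: list[dict], min_severity: str = "major") -> list[dict]:
    """Keep analyzable categories at or above min_severity, most severe first."""
    cutoff = SEVERITY_RANK[min_severity]
    picked = [
        r for r in classified
        if r["category"] in ANALYZABLE and SEVERITY_RANK.get(r["severity"], 99) <= cutoff
    ]
    return sorted(picked, key=lambda r: SEVERITY_RANK[r["severity"]])
```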

2. Cheap AI models handle most of this

The classification pipeline runs on gpt-4o-mini, OpenAI's cheapest model. For categorizing app reviews, which are 1-3 sentences, it's accurate enough. Expensive models are overkill.

3. Give the AI a map of your codebase

The biggest analysis quality improvement came from generating a project map upfront, a compact overview of every file and its key components. Without the map, the AI spent most of its budget figuring out where things lived. With it, the AI goes to the relevant files on its first move. A GPS for the codebase.
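A project map can be as simple as one line per file listing its main declarations. This sketch uses a regex heuristic rather than a real parser and is not AppPulse's implementation: it catches `class`/`interface`/`enum`/`object`/`struct` declarations and functions declared with `def`, `fun`, or `func`, but it misses Java methods, which have no keyword. A proper parser would be more accurate.

```python
"""Sketch: build a compact project map (file -> key declarations).

A regex heuristic, not a parser: it picks up class-like declarations and
functions declared with def/fun/func. It does not catch Java methods,
which have no keyword.
"""
import re
from pathlib import Path

# Optional modifiers (public, data, private, @Annotation ...), then a keyword and a name.
DECL = re.compile(
    r"^\s*(?:[\w@]+\s+)*(class|interface|enum|object|struct|def|fun|func)\s+([A-Za-z_]\w*)"
)


def project_map(root: str, extensions=(".java", ".kt", ".py", ".swift")) -> str:
    root_path = Path(root)
    lines = []
    for path in sorted(root_path.rglob("*")):
        if not path.is_file() or path.suffix not in extensions:
            continue
        names = []
        for line in path.read_text(errors="replace").splitlines():
            m = DECL.match(line)
            if m:
                names.append(f"{m.group(1)} {m.group(2)}")
        rel = path.relative_to(root_path).as_posix()
        lines.append(f"{rel}: {', '.join(names) if names else '(no declarations found)'}")
    return "\n".join(lines)
```

The resulting text is small enough to put in the agent's first prompt, so its first tool call can go straight to a likely file.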

4. Save raw output

Every analysis saves the AI's raw output as a markdown file. If the structured summary looks off, I can check what the AI found and pinpoint whether the problem was in the exploration or the extraction.
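A sketch of saving both artifacts side by side, with a hypothetical naming scheme (`<UTC timestamp>-review-<id>.md` and `.json`): the raw markdown shows what the AI found, and the JSON shows what the extraction made of it, so the two can be compared when a summary looks wrong.

```python
"""Sketch: keep the raw exploration markdown next to the structured result."""
import json
from datetime import datetime, timezone
from pathlib import Path


def save_analysis(out_dir: str, review_id: int, raw_markdown: str, brief: dict) -> Path:
    """Write <stamp>-review-<id>.md (raw) and .json (structured); return the .md path."""
    folder = Path(out_dir).expanduser()
    folder.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    base = folder / f"{stamp}-review-{review_id}"
    md_path = base.with_suffix(".md")
    md_path.write_text(raw_markdown, encoding="utf-8")
    base.with_suffix(".json").write_text(json.dumps(brief, indent=2), encoding="utf-8")
    return md_path
```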

5. No infrastructure required

The whole tool runs as a single command-line app. The database is a SQLite file on disk. The cache is a text file. No server, no Docker container, no cloud deployment. I run apppulse run by hand or on a daily schedule.


Up Next

  • Batch analysis: analyze all critical bugs at once instead of one at a time
  • Per-app configuration: different analysis settings for different apps
  • Auto-analysis: run code analysis on critical bugs during the daily pipeline, so the morning digest includes root causes alongside classifications
  • Tracking outcomes: connect analyses to fixes, so the tool can learn which investigations led to successful resolutions

The classification half took a weekend. Code analysis took another. The pipeline costs less per month than a cup of coffee, and it replaced 15 minutes of unfocused tab-switching with a 5-second terminal readout.