Show HN: TheAuditor – Offline security scanner for AI-generated code

28 comments

·September 8, 2025

I'm an infrastructure architect who started using AI assistants to write code 3 months ago. After building several systems with Claude, I noticed a pattern: the code always had security issues I could spot from my ops background, but I couldn't fix them myself since I can't actually write code.

Why I built this: I needed a way to verify AI-generated code was production-safe. Existing tools either required cloud uploads (privacy concern) or produced output too large for AI context windows. TheAuditor solves both problems - it runs completely offline and chunks findings into 65KB segments that fit in Claude/GPT-4 context limits.

What I discovered: Testing on real projects, TheAuditor consistently finds 50-200+ vulnerabilities in AI-generated code. The patterns are remarkably consistent: - SQL queries using f-strings instead of parameterization - Hardcoded secrets (JWT_SECRET = "secret" appears in nearly every project) - Missing authentication on critical endpoints - Rate limiting using in-memory storage that resets on restart

Technical approach: TheAuditor runs 14 analysis phases in parallel, including taint analysis (tracking data from user input to dangerous sinks), pattern matching against 100+ security rules, and orchestrating industry tools (ESLint, Ruff, MyPy, Bandit). Everything outputs to structured JSON optimized for LLM consumption.

Interesting obstacle: When scanning files with vulnerabilities, antivirus software often quarantines our reports because they contain "malicious" SQL injection patterns - even though we're just documenting them. Had to implement pattern defanging to reduce false positives.

Current usage: Run aud full in any Python/JS/TS project. It generates a complete security audit in .pf/readthis/. The AI can then read these reports and fix its own vulnerabilities. I've seen projects go from 185 critical issues to zero in 3-4 iterations.

The tool is particularly useful if you're using AI assistants for production code but worry about security. It provides the "ground truth" that AI needs to self-correct.

Would appreciate feedback on: - Additional vulnerability patterns common in AI-generated code - Better ways to handle the antivirus false-positive issue - Integration ideas for different AI coding workflows

Thanks for taking a look! /TheAuditorTool

Visit

antonly

> TheAuditor solves ALL of this. It's not a "nice to have" - it's the missing piece that makes AI development actually trustworthy.

> I've built the tool that makes AI assistants production-ready. This isn't competing with SonarQube/SemGrep. This is creating an entirely new category: AI Development Verification Tools.

Wow, that's a lot of talk for a tool that does regex searches and some AST matching, supporting only python and js (these things are not mentioned in the main project README as far as I can tell?).

The actual implementation details are buried in an (LLM written?) document: https://github.com/TheAuditorTool/Auditor/blob/main/ARCHITEC...

My favourite part is the "Pipeline System", which outlines a "14-phase analysis pipeline", but does not number these stages.

It reads a bit like the author is hiding what the tool actually does, which is sad, because there might be some really neat ideas in there, but they are really hard to make out.

antonly

This is actually a really nice example of how security tools can fall flat:

There is this check [here](https://github.com/TheAuditorTool/Auditor/blob/2a3565ad38ece...), labelled "Time-of-check-time-of-use (TOCTOU) race condition pattern".

It reads:

This matches any line that contains `if` followed by `has` followed by `then` followed by `add`, for example. This is woefully insufficient for actually detecting TOCTOU, and even worse, will flag many many things as false positives.

Now the real problem is, that the author states that this will solve all your problems (literally), providing a completely false sense of security...

TheAuditorTool

After reviewing my own code. Thanks for digging into the code! You're reviewing the regex fallback patterns that only trigger when AST parsing fails. The primary detection uses Tree-sitter for structural analysis and taint flow tracking.

That TOCTOU pattern IS terrible - it's meant as a last-resort 'something might be wrong here' flag when we can't parse the AST. The real detection happens in theauditor/taint_analyzer/ which tracks actual data flow from filesystem checks to file operations.

But you're right - even fallback patterns shouldn't be this noisy. I'll tighten it to only flag actual filesystem operations: - os.path.exists → open() - fs.exists → fs.writeFile() - File.exists() → new FileWriter()

  If you actually run the tool with aud full, it uses the proper AST analysis first. These regex patterns are
  the third fallback when Tree-sitter isn't available.

  Thanks for the specific feedback - this is exactly why I open-sourced it!

drsopp

How come AST parsing fails? Does that imply syntax errors in the code?

null

[deleted]

TheAuditorTool

You're absolutely right about that TOCTOU pattern - it's terrible! That regex would flag every if cache.has(key) then cache.add(key, value) as a race condition. Thank you for the specific example.

This perfectly illustrates why I need community input. I'm not a developer - I literally can't code. I built this entire tool using Claude over 250 hours because I needed something to audit the code that Claude was writing for me. It's turtles all the way down!

The "14 phases" you mentioned are in theauditor/pipelines.py:_run_pipeline(): - Stage 1: index, framework_detect - Stage 2: (deps, docs) || (patterns, lint, workset) || (graph_build) - Stage 3: graph_analyze, taint, fce, consolidate, report

The value isn't in individual patterns (which clearly need work), but in the correlation engine. Example: when you refactor Product.price to ProductVariant.price, it tracks that change across your entire stack - finding frontend components, API calls, and database queries still using the old structure. SemGrep can't do this because it analyzes files in isolation.

You're 100% correct that I oversold it with "solves ALL your problems" - that's my non-developer enthusiasm talking. What it actually does: provides a ground truth about inconsistencies in your codebase that AI assistants can then fix. It's not a security silver bullet, it's a consistency checker.

The bad patterns like that TOCTOU check need fixing or removing. Would you be interested in helping improve them? Someone with your eye for detail would make this tool actually useful instead of security theater.

pityJuke

Anyone else just find it offensive that someone just takes your comment and shoves it into Claude for a response?

iamsaitam

"This perfectly illustrates why I need community input. I'm not a developer - I literally can't code. I built this entire tool using Claude over 250 hours because I needed something to audit the code that Claude was writing for me. It's turtles all the way down!" - should be in bold on a huge banner

TheAuditorTool

Why does it matter? Just because you know how to code doesnt mean you know how to build systems, architecture or infrastructure? I do, professional background in it.

enjoytheview

A security project vibe coded by someone who admittedly does not have a security or even software engineering background, what could go wrong!

TheAuditorTool

You're absolutely right to be skeptical! You do ignore that vibe coding isnt going away...

That's exactly why I built TheAuditor - because I DON'T trust the code I had AI write. When you can't verify code yourself, you need something that reports ground truth.

The beautiful irony: I used AI to build a tool that finds vulnerabilities in AI-generated code. It already found 204 SQL injections in one user's production betting site - all from following AI suggestions.

If someone with no coding ability can use AI + TheAuditor to build TheAuditor itself (and have it actually work), that validates the entire premise: AI can write code, but you NEED automated verification.

What could go wrong? Without tools like this, everything. That's the point.

grim_io

Using an established analysis tool like sonarcube is probably the way to go.

There is no difference between human made and AI made bad code, so I don't think we need specialized tools for that.

TheAuditorTool

"Using SonarQube is probably the way to go" "We don't need specialized tools"

Pick one.

SonarQube IS a specialized tool. It just specializes in different things than TheAuditor.

SonarQube: "This file has issues" heAuditor: "Your frontend and backend disagree about the data model"

Both have their place.

grim_io

Are you trying to "solve" unit and integration tests?

TheAuditorTool

No? At least read couple lines in the readme before joining the discussion please.

lewdwig

I have noticed that LLMs are actually pretty decent at redteaming code, so I’ve made it a habit of getting them to do that for code they generate periodically. A good loop is (a) generate code, (b) add test coverage for the code (to 70-80%) (c) redteam the code for possible performance/security concerns, (d) add regression tests for the issues uncovered and then fix the code.

TheAuditorTool

The glaring thing most people seem to miss that llm generated code is like TOS and unless you work in a more enterprise team setting? You are not going to catch 90% of the issues...

If this was used before releasing the tea spill fiasco, only to name one? It would never have been a fiasco. Just saying..

quibono

> Don't create a venv before installing TheAuditor

That's a strange ask in the Python ecosystem - what's the reason for this?

Also, what's the benefit of ESLint/Ruff/MyPy being utilised by this audit tool? I'm not sure I understand the benefit of having an LLM in between you and Ruff, for example.

ffsm8

It's a vibe coded project by a person that freely says they cannot code. What did you expect?

It's breathtaking how much of an enabler it already is, but curating a good dependency tree and staying within scope of the outlined work to do are not things LLMs are good at, currently.

TheAuditorTool

@quibono: Great questions! The "don't create venv" warning is because TheAuditor creates its own sandboxed environment (.auditor_venv/) for analyzing YOUR project. If you install TheAuditor inside your project's venv, you get nested virtualenvs which breaks the sandbox isolation. TheAuditor should be installed globally (or in ~/tools/), then it creates isolated environments for each project it analyzes.

The ESLint/Ruff/MyPy integration isn't about putting an LLM between you and linters. It's about aggregation and correlation. Example: - Ruff says "unused import" - MyPy says "type mismatch" - TheAuditor correlates: "You removed the import but forgot to update 3 type hints that depended on it"

The LLM reads the aggregated report to understand the full picture across all tools, not just individual warnings.

@ffsm8: You're absolutely right - I can't code and the dependency tree is probably a mess! That's exactly WHY I built this. When you're using AI to write code and can't verify if it's correct, you need something that reports the ground truth.

The irony isn't lost on me: I used Claude to build a tool that audits code written by Claude. It's enablement all the way down! But that's also the proof it works - if someone who can't code can use AI + TheAuditor to build TheAuditor itself, the development loop is validated.

The architectural decisions might be weird, but they're born from necessity, not incompetence. Happy to explain any specific weirdness!