
Don't Write a Line of AI Code Until Your Tests Are Green

By Andrew Blase

The Hard Truth About AI-Generated Code

AI will write code that looks exactly right and is completely wrong.

That's not a knock on the tools. It's the nature of how they work. Large language models are trained to produce plausible output — syntactically clean, idiomatically correct TypeScript that does something close to what you asked. But "close" in production isn't close. It's a bug.

Without tests, you have no way to know which you got.

This is the article I wish existed when I started using Claude and Copilot seriously for production work. The short version: set up your testing infrastructure before you write a single line of AI-generated feature code. Not after the first sprint. Not once you've shipped something. Before.

This is Article 1 in a CI/CD series. The second article covers Husky and GitHub Actions — typecheck, lint, and test checks on every PR. The third covers Snyk, SonarQube Cloud, and Kilo Code Review. But you need to read this one first. Everything else in the series assumes you have tests worth running. If your test suite is empty, the CI pipeline is theater.


Why AI Makes Testing Non-Negotiable

Here's the pattern that bites developers:

  1. You describe a feature to the AI
  2. It generates 80 lines of service code
  3. You read it. It looks right.
  4. You merge it.

The problem: you just made a production decision based on "it looks right." That is exactly what AI is optimized to produce — output that looks right. The model has no idea if your business logic is correct. It doesn't know your edge cases. It doesn't know what happens when a user sends an empty string where you expected a UUID.

Tests are the only reliable signal. A test doesn't care if the code looks clean. It runs the code against real assertions and tells you pass or fail. That's the only ground truth you have when you're generating code at speed.

The second problem is architectural. AI tends to produce tightly coupled code. It'll reach for the database directly from a controller, or wire a third-party API call into the middle of a service method. That code is hard to test, and hard to test usually means hard to change safely. TDD pressure — writing the test before the implementation — fights this. It forces you to think about interfaces first, which produces loosely coupled code almost automatically.
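Here's a minimal sketch of what "interfaces first" can look like, assuming a hypothetical order-placement feature. The service depends on a small OrderRepository interface passed in through the constructor rather than calling a database client directly, so a test can swap in an in-memory fake. All names here (Order, OrderRepository, OrderService, InMemoryOrderRepository) are illustrative, not from any library.

```typescript
// Hypothetical domain type, for illustration only.
interface Order {
  id: string;
  customerId: string;
  totalCents: number;
}

// The contract the service depends on, instead of a concrete database client.
interface OrderRepository {
  save(order: Order): void;
  findById(id: string): Order | undefined;
}

class OrderService {
  private readonly repo: OrderRepository;

  // The dependency is injected, so a test can pass in a fake.
  constructor(repo: OrderRepository) {
    this.repo = repo;
  }

  placeOrder(id: string, customerId: string, totalCents: number): Order {
    // The business rule runs without any database or network involved.
    if (totalCents <= 0) {
      throw new Error('Order total must be positive');
    }
    const order: Order = { id, customerId, totalCents };
    this.repo.save(order);
    return order;
  }
}

// In-memory fake that satisfies the same contract.
class InMemoryOrderRepository implements OrderRepository {
  private readonly orders = new Map<string, Order>();

  save(order: Order): void {
    this.orders.set(order.id, order);
  }

  findById(id: string): Order | undefined {
    return this.orders.get(id);
  }
}

const repo = new InMemoryOrderRepository();
const service = new OrderService(repo);
service.placeOrder('o-1', 'c-1', 2500);
console.log(repo.findById('o-1')?.totalCents); // 2500
```

In NestJS the real repository would be registered as a provider; since TypeScript interfaces are erased at runtime, you'd inject it through a string or symbol token rather than the interface itself.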


80% Coverage Is a Hard Gate, Not a Suggestion

I run 80% coverage as a non-negotiable CI gate. Below it, the build fails. The PR doesn't merge.

Why 80%? It's not a magic number — but it represents a meaningful floor. At 80% coverage, you've tested the major branches in your business logic, your primary API contracts, and your utility functions. You still have room for glue code, generated files, and bootstrapping logic that's genuinely hard to unit test. Going to 100% is often counterproductive: you end up writing tests for constructor injection and module imports.

Below 80%, you have too many untested code paths to safely ship AI-generated code. You're essentially trusting the model on all of them.

Set the threshold in jest.config.ts and let CI enforce it. This is the configuration I use across every TypeScript project:

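// jest.config.ts
// Jest needs ts-node installed to read a TypeScript config file.
import type { Config } from 'jest';

const config: Config = {
  preset: 'ts-jest', // compile TypeScript sources and specs with ts-jest
  testEnvironment: 'node',
  coverageThreshold: {
    global: {
      branches: 80,
      functions: 80,
      lines: 80,
      statements: 80,
    },
  },
  collectCoverageFrom: [
    'src/**/*.{ts,tsx}',
    '!src/**/*.spec.ts',
    '!src/**/*.test.ts',
    '!src/main.ts',
  ],
};

export default config;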

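This covers branches, functions, lines, and statements. Thresholds are checked whenever coverage is collected (jest --coverage): if any of the four drops below 80%, Jest exits non-zero. The coverage collection explicitly excludes test files and main.ts, because you don't want bootstrapping code inflating or deflating your real coverage signal. Source files matched by collectCoverageFrom count even if no test imports them, so untested files show up as 0% instead of quietly disappearing from the report.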
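To make the gate concrete, here's a simplified sketch of the comparison a global threshold performs: each of the four percentages is checked against its minimum, and any single shortfall fails the run. This illustrates the idea only; it is not Jest's implementation, and the CoverageSummary shape and findThresholdFailures function are hypothetical. (Jest's real option also accepts per-path and glob thresholds, which this ignores.)

```typescript
type Metric = 'branches' | 'functions' | 'lines' | 'statements';

// Covered vs. total counts per metric, aggregated across all matched source files.
type CoverageSummary = Record<Metric, { covered: number; total: number }>;

const thresholds: Record<Metric, number> = {
  branches: 80,
  functions: 80,
  lines: 80,
  statements: 80,
};

// Returns one message per metric whose percentage falls below its minimum.
function findThresholdFailures(
  summary: CoverageSummary,
  min: Record<Metric, number>,
): string[] {
  const failures: string[] = [];
  for (const metric of Object.keys(min) as Metric[]) {
    const { covered, total } = summary[metric];
    // This sketch treats a metric with nothing to measure as fully covered.
    const pct = total === 0 ? 100 : (covered * 100) / total;
    if (pct < min[metric]) {
      failures.push(`${metric}: ${pct.toFixed(1)}% < ${min[metric]}%`);
    }
  }
  return failures;
}

const summary: CoverageSummary = {
  branches: { covered: 15, total: 20 }, // 75%
  functions: { covered: 9, total: 10 }, // 90%
  lines: { covered: 85, total: 100 }, // 85%
  statements: { covered: 88, total: 100 }, // 88%
};

// A non-empty list corresponds to Jest exiting non-zero.
const failures = findThresholdFailures(summary, thresholds);
console.log(failures); // [ 'branches: 75.0% < 80%' ]
```

Three of the four metrics clear 80% here, but the branch shortfall alone is enough to fail the build, which is exactly the behavior you want when AI code skips an error path.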

Add two scripts to your package.json:

"test": "jest",
"test:coverage": "jest --coverage"

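test runs fast in local development. test:coverage is what CI runs. That distinction matters for inner-loop speed: coverage instrumentation slows runs down, and the thresholds are only enforced when coverage is collected.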


Setting Up the Testing Stack

Backend: NestJS

NestJS has first-class testing support via @nestjs/testing. The Test.createTestingModule() API lets you spin up isolated module contexts for unit tests and wire in real dependencies for integration tests.

Install the essentials (ts-node is what lets Jest read a jest.config.ts file):

npm install --save-dev jest @types/jest ts-jest ts-node @nestjs/testing supertest @types/supertest

Use supertest for integration tests against your controllers. It sends real HTTP requests to your NestJS app in-process, so you don't need to start a server yourself:

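// users.controller.spec.ts
import { INestApplication } from '@nestjs/common';
import { Test, TestingModule } from '@nestjs/testing';
// If your tsconfig enables esModuleInterop, use `import request from 'supertest';` instead.
import * as request from 'supertest';
import { AppModule } from '../app.module';

describe('UsersController (e2e)', () => {
  let app: INestApplication;

  beforeAll(async () => {
    // Compile the full application module, as main.ts would.
    const moduleRef: TestingModule = await Test.createTestingModule({
      imports: [AppModule],
    }).compile();

    app = moduleRef.createNestApplication();
    await app.init();
  });

  afterAll(async () => {
    await app.close();
  });

  it('GET /users returns 200', () => {
    // supertest binds the app's HTTP server to an ephemeral port for the request.
    return request(app.getHttpServer()).get('/users').expect(200);
  });
});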

Frontend: Next.js

For Next.js, you have two solid options:

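Jest + React Testing Library is the established choice. Jest runs the tests in a jsdom environment (the jest-environment-jsdom package, which has shipped separately since Jest 28), and React Testing Library gives you utilities for rendering components and asserting on what the user sees, not implementation details.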

Vitest is worth knowing about. It's faster in watch mode, understands Vite's module system natively, and exposes a largely Jest-compatible API. If you're starting a fresh project, Vitest is increasingly the default. If you're in an existing Jest setup, the migration overhead usually isn't worth it yet.

For this series, I'm using Jest across backend and frontend. One test runner, one coverage report, one CI step.
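If you apply the same thresholds on the Next.js side, a frontend jest.config.ts might look like the sketch below. It uses the next/jest helper from the Next.js testing docs, which applies Next's own SWC transform instead of ts-jest, and it assumes components live under src/ and that jest-environment-jsdom is installed. Treat the paths and the v8 coverage provider as starting assumptions to adjust.

```typescript
// jest.config.ts (Next.js app)
import type { Config } from 'jest';
import nextJest from 'next/jest.js';

// Loads next.config.js and .env files from the app root into the test environment.
const createJestConfig = nextJest({ dir: './' });

const config: Config = {
  testEnvironment: 'jsdom', // requires the jest-environment-jsdom package
  coverageProvider: 'v8',
  coverageThreshold: {
    global: {
      branches: 80,
      functions: 80,
      lines: 80,
      statements: 80,
    },
  },
  collectCoverageFrom: [
    'src/**/*.{ts,tsx}',
    '!src/**/*.test.{ts,tsx}',
    '!src/**/*.d.ts',
  ],
};

// next/jest loads the Next.js config asynchronously, so export the wrapped config.
export default createJestConfig(config);
```

The thresholds match the backend config, so both halves of the monorepo enforce the same 80% floor.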


Wiring Coverage Into the CI Pipeline

In the next article in this series, I walk through the GitHub Actions reusable workflow at fullStackDataSolutions/github-actions. That workflow calls test:coverage on every PR.

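If coverage drops below the threshold in jest.config.ts, Jest exits with a non-zero code. GitHub Actions marks the step as failed, and once that check is marked as required in branch protection, the PR is blocked. No extra configuration is needed on the Jest side: the gate already lives in jest.config.ts.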

What this means in practice: you cannot merge code that drops coverage below 80%. Not accidentally, not under deadline pressure, not because a reviewer didn't catch it. The machine catches it every time.

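In Article 3 of this series, I cover SonarQube Cloud setup in detail. Once configured, it reads the LCOV report Jest generates (coverage/lcov.info by default) and surfaces coverage metrics directly in the PR decoration. Point it at the report in sonar-project.properties:

sonar.javascript.lcov.reportPaths=coverage/lcov.info

The sonar.javascript key covers TypeScript files too; the older sonar.typescript.lcov.reportPaths key is deprecated.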

This changes coverage from a CI pass/fail to something visible. You see the number on every PR. It's a forcing function — the number is right there, and it trends. If you're at 84% and a PR drops it to 81%, you see that before merging. Over time, you build coverage debt awareness into the team's default behavior.
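The 84% to 81% example is worth spelling out, because the hard gate alone would not stop that PR: 81% still clears 80%. A minimal sketch of the two signals side by side, assuming you have the base branch's and the PR's coverage as percentages. The reviewCoverage function is hypothetical and only shows the arithmetic; it is not how SonarQube Cloud computes anything.

```typescript
interface CoverageReview {
  delta: number; // percentage points gained (+) or lost (-)
  passesGate: boolean; // still at or above the hard threshold
}

// Compares base-branch coverage with the PR's coverage against a fixed gate.
function reviewCoverage(basePct: number, prPct: number, gatePct = 80): CoverageReview {
  return {
    // Round to one decimal place to avoid floating-point noise in the display.
    delta: Math.round((prPct - basePct) * 10) / 10,
    passesGate: prPct >= gatePct,
  };
}

const review = reviewCoverage(84, 81);
console.log(review); // { delta: -3, passesGate: true }
```

The gate says "pass"; the delta says "you just lost three points." Seeing both on every PR is what turns coverage into a trend the team notices.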


The AI-TDD Loop (Do This, Not That)

The wrong way to use AI with tests:

  1. Describe a feature
  2. AI generates the implementation
  3. Ask AI to write tests for it
  4. Merge

Those tests verify what the code does. Not what it should do. If the AI got the logic wrong, the tests will confirm the wrong logic. You've created a false safety net.
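A tiny illustration of that trap, using a hypothetical discount function with the kind of plausible-looking bug this article is about: the requirement is "members get 10% off," but the code returns the discount amount instead of the discounted price. A check written by reading the code passes; a check written from the requirement fails. These are plain boolean checks rather than Jest expect calls so the sketch runs on its own.

```typescript
// Looks reasonable at a glance, but returns the discount, not the discounted total.
function applyMemberDiscount(totalCents: number): number {
  return Math.round(totalCents * 0.1);
}

// Written by reading the implementation: it records what the code does.
function checkDerivedFromCode(): boolean {
  return applyMemberDiscount(20000) === 2000;
}

// Written from the requirement: members pay 90% of the total.
function checkDerivedFromSpec(): boolean {
  return applyMemberDiscount(20000) === 18000;
}

console.log(checkDerivedFromCode()); // true: the bug is now "verified"
console.log(checkDerivedFromSpec()); // false: the bug is caught
```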

The right loop:

1. Write a failing test that describes the behavior you want.

Be specific. "Given a user with role ADMIN, when they call DELETE /posts/:id, they should receive 200 and the post should be removed from the database." That's a test. "The delete endpoint works" is not.

2. Show the test to the AI: "Make this test pass."

Now the AI has a contract. It's not generating what it thinks you want — it's generating what has to be true to make an assertion pass.

3. Run the test.

If it passes, the implementation is verified against the behavior you specified. The AI output is correct for that specification: not because it looks right, but because it made the test green.

4. If it fails, the test surfaces exactly what's wrong.

Error messages from a failing test are the best debugging context you can give an AI. "Here's the test. Here's the error. Fix it." That loop is tight and productive.

5. Check coverage. Merge when it's green and above 80%.

This isn't just a mechanical process — it rewires how you think about AI coding. You stop asking "did the AI write good code?" and start asking "does this code satisfy a specification I wrote?" The first question has no reliable answer. The second does.
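Here's a sketch of the ADMIN delete example from step 1, modeled at the service level: the HTTP response is a status field and the database is an in-memory Map. In a real project the spec would be Jest it(...) blocks with expect(...), and the HTTP layer would go through supertest; here the spec is a plain function so the sketch runs anywhere. PostsService, Actor, and the non-admin and missing-post cases are hypothetical additions beyond the one sentence in step 1.

```typescript
type Role = 'ADMIN' | 'MEMBER';

interface Actor {
  id: string;
  role: Role;
}

interface DeleteResult {
  status: 200 | 403 | 404;
}

// Implementation: the part you'd ask the AI to write so the spec passes.
class PostsService {
  private readonly posts: Map<string, { authorId: string }>;

  constructor(posts: Map<string, { authorId: string }>) {
    this.posts = posts;
  }

  deletePost(actor: Actor, postId: string): DeleteResult {
    if (actor.role !== 'ADMIN') {
      return { status: 403 };
    }
    if (!this.posts.has(postId)) {
      return { status: 404 };
    }
    this.posts.delete(postId);
    return { status: 200 };
  }

  exists(postId: string): boolean {
    return this.posts.has(postId);
  }
}

// Spec: written first, before the implementation existed.
function adminDeleteSpec(): void {
  const service = new PostsService(new Map([['p-1', { authorId: 'u-2' }]]));
  const admin: Actor = { id: 'u-1', role: 'ADMIN' };
  const member: Actor = { id: 'u-3', role: 'MEMBER' };

  // Assumed negative case: a non-admin is rejected and the post survives.
  if (service.deletePost(member, 'p-1').status !== 403) throw new Error('member should get 403');
  if (!service.exists('p-1')) throw new Error('post should survive a rejected delete');

  // The behavior from step 1: an admin gets 200 and the post is removed.
  if (service.deletePost(admin, 'p-1').status !== 200) throw new Error('admin should get 200');
  if (service.exists('p-1')) throw new Error('post should be removed');

  // Assumed edge case: deleting a post that no longer exists.
  if (service.deletePost(admin, 'p-1').status !== 404) throw new Error('missing post should get 404');
}

adminDeleteSpec();
console.log('admin delete spec passed');
```

If the AI's implementation returns 200 for members or forgets the delete, the spec fails with a message that names the broken sentence, which is exactly the context you hand back in step 4.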


What to Test First

When you're starting from zero in an AI-assisted project, prioritize in this order:

1. Business logic and service layer. This is where AI errors are most consequential and most common. Your OrderService, your SubscriptionService, your pricing calculations — test these first. They're also the easiest to test in isolation because a well-written service is just functions that take inputs and return outputs.

2. API endpoints and controllers. Integration tests with supertest catch the wiring: routing, authentication guards, request validation, response shapes. These don't need high complexity — just confirm the contracts are right.

3. Utility functions. Easy to write, easy to run, good for building coverage momentum early in a project. Date formatters, string transformers, validators — these are fast wins.

4. Edge cases and error paths. AI almost always skips these in the first pass. Empty inputs. Rate limit handling. Invalid tokens. Null returns from external dependencies. Write tests for these explicitly — then ask AI to make them pass. You'll find real bugs.
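To make items 1 and 4 concrete, here's a sketch of a hypothetical pricing function written as plain inputs-to-outputs logic, with edge cases from the list handled explicitly: an empty cart, an empty or unknown discount code, an invalid quantity, and a null return from the discount lookup, which stands in for an external dependency. The names (CartItem, DiscountLookup, calculateTotalCents) are illustrative; each branch is a case you'd write a failing test for before asking the AI to implement it.

```typescript
interface CartItem {
  sku: string;
  unitPriceCents: number;
  quantity: number;
}

// Stand-in for an external dependency (e.g. a promotions API) that may return null.
type DiscountLookup = (code: string) => number | null;

// Pure business logic: inputs in, total out.
function calculateTotalCents(
  items: CartItem[],
  discountCode: string | undefined,
  lookupDiscountPct: DiscountLookup,
): number {
  // Invalid input: reject non-integer or non-positive quantities up front.
  for (const item of items) {
    if (!Number.isInteger(item.quantity) || item.quantity <= 0) {
      throw new Error(`Invalid quantity for ${item.sku}: ${item.quantity}`);
    }
  }

  // An empty cart naturally sums to 0.
  const subtotal = items.reduce((sum, item) => sum + item.unitPriceCents * item.quantity, 0);

  // Missing or empty code: skip the lookup entirely.
  if (!discountCode) {
    return subtotal;
  }

  // Unknown code: the dependency returns null, so charge the full price.
  const pct = lookupDiscountPct(discountCode);
  if (pct === null) {
    return subtotal;
  }

  return Math.round(subtotal * (1 - pct / 100));
}

const lookup: DiscountLookup = (code) => (code === 'SAVE10' ? 10 : null);
const cart: CartItem[] = [
  { sku: 'book', unitPriceCents: 1500, quantity: 2 },
  { sku: 'pen', unitPriceCents: 250, quantity: 4 },
];
console.log(calculateTotalCents(cart, 'SAVE10', lookup)); // 3600
```

The happy path is one line at the bottom; everything above it is the kind of branch AI tends to skip on the first pass, and each one is a branch the 80% gate will count.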


Start Here Before You Write Feature Code

If you're starting a new TypeScript monorepo project:

  1. Install Jest, ts-jest, and the relevant testing utilities (@nestjs/testing and supertest for the backend; React Testing Library for the frontend, or Vitest if you choose it over Jest)
  2. Write jest.config.ts with 80% thresholds and correct collectCoverageFrom patterns
  3. Add test and test:coverage scripts
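  4. Confirm test:coverage fails before you have real tests (it should: with no test files Jest exits with code 1, and after a first trivial test, untested files matched by collectCoverageFrom count as 0% and keep coverage below 80%)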
  5. Write one test for one thing. Make it pass.
  6. Now you have a working test loop

Do this before you write any feature code with AI. The infrastructure should be in place before the code that needs verifying.


FAQs

What does "testing AI-generated code" actually mean in practice?

It means writing tests before you prompt the AI for an implementation. You define the expected behavior as a failing test, then ask the AI to write code that makes the test pass. The test is your specification; the AI generates the implementation; the test result is your verification. This is fundamentally different from asking AI to generate both the code and the tests — when AI writes its own tests, it typically verifies what the code does, not what it should do.

How do I set a Jest coverage threshold for TypeScript projects?

Add a coverageThreshold block to your jest.config.ts. Set branches, functions, lines, and statements to your target — I use 80% across all four. Use collectCoverageFrom to point Jest at your actual source files and exclude test files, generated code, and bootstrapping files like main.ts. Run jest --coverage to check; the build will fail if any metric drops below threshold.

What's the difference between Jest and Vitest for Next.js projects?

Both work well for testing React components in Next.js. Jest is more established with a wider ecosystem and integrates cleanly with ts-jest. Vitest is newer, faster in watch mode, and natively understands ES modules and Vite's build pipeline. If you're starting a fresh project using Vite under the hood, Vitest is increasingly the natural choice. For existing Jest projects, the migration effort rarely pays off in the short term. Either way, the test-first principles and coverage thresholds in this article apply equally to both.


Testing isn't what you add after the code is done. It's the foundation that makes the code safe to ship — especially when AI is writing it at speed. Set up the gate first. Then build.