PR Risk Score Explained: How AI Decides If Your PR Is Safe to Merge

Every pull request that Agenticify reviews gets a risk score (a number from 0 to 100), along with a risk level and an approval recommendation. The score tells your team at a glance how much attention the PR needs.

This isn't a vague "looks good" or "needs work." It's a structured assessment based on concrete factors. Let's break down how it works.

The Four Risk Levels

🟢 Low

Score: 0–30

Minimal risk. Documentation changes, config updates, simple refactors. Safe for junior review or auto-merge.

🟡 Medium

Score: 31–55

Moderate risk. New features with tests, standard CRUD operations, UI changes. Normal review process.

🟠 High

Score: 56–75

Significant risk. Database schema changes, external API integrations, complex business logic. Senior review recommended.

🔴 Critical

Score: 76–100

Severe risk. Security vulnerabilities, auth/payment changes, data loss potential. Approval not recommended without thorough review.
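The four bands are contiguous, so turning a score into a level is a simple threshold lookup. The Python sketch below illustrates that mapping using the boundaries listed above. `risk_level` is a hypothetical helper for illustration, not part of Agenticify's API.

```python
# Hypothetical helper (not an Agenticify API): maps a 0-100 risk score
# to the four published risk bands.
def risk_level(score: int) -> str:
    """Return the risk level for a score using the band boundaries above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score <= 30:
        return "Low"
    if score <= 55:
        return "Medium"
    if score <= 75:
        return "High"
    return "Critical"


print(risk_level(72))  # High  (the PR #142 example)
print(risk_level(12))  # Low   (the PR #156 example)
```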

What Factors Determine the Score?

The risk score isn't a single metric — it's a weighted combination of multiple factors that the AI evaluates during review:

Factor	What It Measures	Weight
Security Impact	SQL injection, XSS, auth bypass, hardcoded secrets, insecure deserialization	Very High
Data Sensitivity	Touches payments, PII, database schemas, encryption, or access control	High
Change Scope	Number of files changed, lines added/removed, cross-module impact	Medium
Complexity	Nested conditionals, concurrency, state mutations, error handling depth	Medium
Test Coverage	Are new code paths covered by tests? Are existing tests modified?	Medium
Dependency Changes	New packages, version bumps, transitive dependency risks	Low to Medium
Convention Compliance	Branch naming, commit messages, coding standards adherence	Low

Security impact is the heaviest factor. A PR that touches 2 files but introduces a SQL injection will score higher than a PR that touches 50 files but only changes CSS.
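The article does not publish the exact formula or numeric weights. As a rough illustration only, the sketch below assumes each factor is scored from 0 to 100 and combines them with a weighted average; the weights (5, 4, 2, 2, 2, 1.5, 1) are invented values that merely preserve the ordering in the table, and the factor names are hypothetical.

```python
# Hypothetical sketch: combine per-factor scores (each 0-100) into one 0-100 score.
# The numeric weights are assumptions that only mirror the table's relative ordering.
WEIGHTS = {
    "security": 5.0,      # Very High
    "data": 4.0,          # High
    "scope": 2.0,         # Medium
    "complexity": 2.0,    # Medium
    "tests": 2.0,         # Medium
    "dependencies": 1.5,  # Low to Medium
    "conventions": 1.0,   # Low
}


def combined_score(factors: dict[str, float]) -> int:
    """Weighted average of factor scores; a missing factor counts as 0 (no risk)."""
    total_weight = sum(WEIGHTS.values())
    weighted = sum(w * factors.get(name, 0.0) for name, w in WEIGHTS.items())
    return round(weighted / total_weight)


# Small PR with a SQL injection: tiny scope, severe security finding.
sql_pr = combined_score({"security": 95, "data": 60, "scope": 10, "tests": 50})
# Large CSS-only PR: wide scope, no security or data impact.
css_pr = combined_score({"scope": 80, "complexity": 5})
print(sql_pr, css_pr)  # 48 10
```

The small security-heavy PR outscores the 50-file CSS PR, as described above. Note, though, that a plain weighted average dilutes a single severe finding (48 here, well below the 72 in the example that follows), so a real scorer would likely need extra rules on top of the weighting; this sketch only shows the weighting idea.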

Real-World Examples

Example 1: High Risk (Score: 72)

PR #142 — Add user delete endpoint

⚠️ Score: 72/100 — NEEDS ATTENTION

// What the AI found:
🔴 SQL injection in user lookup (L28)
🔴 No authorization check — any user can delete any other user
🟡 Missing rate limiting on delete endpoint
🔵 Consider soft-delete instead of hard-delete

This PR changes only one file and adds 30 lines, but it introduces a SQL injection and has no authorization check, two critical security issues. The scope is small, but the security impact drives the score up.

Example 2: Low Risk (Score: 12)

PR #156 — Update API documentation

✅ Score: 12/100 — SAFE TO MERGE

✓ No security concerns
✓ Documentation-only changes
✓ No logic modifications
🔵 Minor: Consider adding API versioning to docs

Only Markdown files changed and no code logic was touched. The AI recognizes this as a documentation PR and scores it accordingly.

The Approval Recommendation

Beyond the numeric score, every PR gets one of three recommendations:

✅ SAFE TO MERGE — Low risk, no critical issues. Can proceed with standard review.
⚠️ NEEDS ATTENTION — Medium/high risk. Specific issues need to be addressed. Senior review recommended.
⛔ APPROVAL NOT RECOMMENDED — Critical issues found. Security vulnerabilities, data loss potential, or fundamental design problems. Should not be merged without significant changes.
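The article does not spell out the exact rule that produces the recommendation. One reading consistent with both examples above (PR #142 scores 72 with red-flagged findings and gets NEEDS ATTENTION; PR #156 scores 12 and gets SAFE TO MERGE) is sketched below. The function and its `has_critical_findings` flag are hypothetical assumptions, not Agenticify's implementation.

```python
# Hypothetical sketch: derive the advisory recommendation from the score.
# Assumptions: the recommendation follows the risk band, and any critical
# finding keeps an otherwise low-scoring PR out of the "safe" bucket.
def recommendation(score: int, has_critical_findings: bool) -> str:
    if score >= 76:  # Critical band
        return "APPROVAL NOT RECOMMENDED"
    if score <= 30 and not has_critical_findings:  # Low band, nothing critical
        return "SAFE TO MERGE"
    # Medium or High band, or a low score that still has a critical finding
    return "NEEDS ATTENTION"


print(recommendation(72, has_critical_findings=True))   # NEEDS ATTENTION
print(recommendation(12, has_critical_findings=False))  # SAFE TO MERGE
```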

The recommendation is advisory, not a blocker. The AI doesn't prevent merging; it surfaces information so your team can make informed decisions. Think of it as a senior developer's gut feeling, quantified and made consistent.

How Teams Use Risk Scores in Practice

Tiered Review Process

A common approach is to set tiered review requirements based on risk:

Low (0-30): AI review + one team member approval. Can be merged quickly.
Medium (31-55): AI review + standard peer review. Normal flow.
High (56-75): AI review + senior developer review. Extra scrutiny on flagged issues.
Critical (76-100): AI review + tech lead review + address all critical issues. No shortcuts.
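If you automate this routing yourself (for example in a bot or CI job), the tiers above reduce to a small policy table keyed on the score. In the Python sketch below, the `ReviewPolicy` type and the reviewer role names (`team-member`, `peer`, `senior-developer`, `tech-lead`) are hypothetical placeholders for your own GitHub teams; the AI review is assumed to run on every PR, so it is not listed.

```python
# Hypothetical sketch: translate a risk score into the tiered review requirements.
# Role names are placeholders; the AI review is assumed to happen for every tier.
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewPolicy:
    level: str
    required_reviewers: tuple[str, ...]
    must_resolve_critical_issues: bool


def review_policy(score: int) -> ReviewPolicy:
    if score <= 30:
        return ReviewPolicy("Low", ("team-member",), False)
    if score <= 55:
        return ReviewPolicy("Medium", ("peer",), False)
    if score <= 75:
        return ReviewPolicy("High", ("senior-developer",), False)
    return ReviewPolicy("Critical", ("tech-lead",), True)


policy = review_policy(72)
print(policy.level, policy.required_reviewers)  # High ('senior-developer',)
```

Keeping the thresholds in one function means a change to the tiers only has to be made in one place.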

💡 Pro tip: Use GitHub branch protection rules to require additional reviewers when Agenticify's Check Run reports high/critical risk. This way the process is automated — you don't rely on people remembering to check the score.

Sprint Planning

Track average risk scores per sprint. If your team's average is creeping up, it might indicate:

PRs are getting too large (break them down)
Security awareness training is needed
Architecture decisions need revisiting
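As a minimal sketch of that tracking, assuming you can export each PR's score grouped by sprint, the Python below averages scores per sprint and flags when the most recent sprint averages are strictly increasing. The sprint names and scores are made-up illustration data, and the three-sprint window is an arbitrary assumption.

```python
# Hypothetical sketch: average PR risk per sprint and flag a sustained rise.
from statistics import fmean


def sprint_averages(scores_by_sprint: dict[str, list[int]]) -> dict[str, float]:
    """Average risk per sprint (rounded to 0.1), in insertion order; empty sprints skipped."""
    return {
        sprint: round(fmean(scores), 1)
        for sprint, scores in scores_by_sprint.items()
        if scores
    }


def is_creeping_up(averages: list[float], window: int = 3) -> bool:
    """True if the last `window` sprint averages are strictly increasing."""
    recent = averages[-window:]
    return len(recent) == window and all(a < b for a, b in zip(recent, recent[1:]))


# Illustrative data only, listed in chronological order.
history = {
    "sprint-21": [12, 40, 35],
    "sprint-22": [30, 45, 38],
    "sprint-23": [50, 42, 61],
}
averages = sprint_averages(history)
print(averages)
print(is_creeping_up(list(averages.values())))
```

The same grouping works per developer instead of per sprint if you want to follow individual trends.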

Developer Growth

Tracked over time, risk scores can also reflect developer growth. A junior developer whose average PR risk score drops from 45 to 25 over three months is likely showing better security awareness and code quality, and nobody had to track it by hand.

Customizing Risk Assessment

Every team has different risk tolerances. A fintech startup and a personal blog have very different security requirements.

Agenticify lets you customize risk assessment through custom AI prompts:

Increase sensitivity for payment code: "Flag any change to payment-related files as high risk regardless of scope"
Decrease sensitivity for generated code: "Treat files in /generated/ as low-risk documentation changes"
Add domain rules: "Any PR that modifies database migrations should be flagged as high risk"
Team standards: "PRs without tests for new endpoints should receive a higher risk score"

These prompts are configured per-repository, so your frontend app and backend API can have different risk profiles.

Risk Score ≠ Code Quality

One important distinction: a high risk score doesn't mean the code is bad. It means the PR needs more careful review.

A perfectly written payment integration will still score high — because payment code is inherently risky. A terribly written README change will score low — because the blast radius is minimal.

Risk scoring answers "how much damage could this cause if something is wrong?" — not "is this code good?"

The code quality feedback comes from the AI review comments (issues, suggestions, security findings). The risk score tells you how seriously to take those comments.

The Bottom Line

PR risk scoring turns code review from a binary "approved/not approved" into a spectrum. It helps teams:

Prioritize — Review critical PRs first, batch low-risk ones
Route — Send high-risk PRs to senior reviewers automatically
Track — Monitor risk trends across sprints and developers
Ship confidently — Merge low-risk PRs faster without guilt

It's not about slowing down. It's about knowing when to slow down and when it's safe to move fast.

See risk scoring on your PRs

Install the GitHub App. Open a PR. See the risk score instantly. No configuration needed.

Start free with GitHub