
Continuous Integration Pipeline for 100+ Git Repositories

Mar 7, 2026
10 min read

One hundred repositories. Four programming languages. Five different build systems. Getting them to integrate cleanly, continuously, without a dedicated integration sprint or a three-month feedback delay - that's the problem this article is about.




Benefit 1 - No Integration Sprints

CI ensures that the latest "main" is integrated into downstream "main" branches. It eliminates the "integration sprint" problem: the sprint spent stabilizing and adjusting to changes from upstream dependencies. Manual integration and issue identification on a large-scale project may not fit into a sprint. With CI, we distribute this effort using a "fail fast" model: you fix issues at the point in time and in the place where they occur.

Benefit 2 - Faster Feedback Loop

Consider a setup where Team A releases a library and Team B uses it. I'm in Team A, refactoring and making a performance improvement. Without CI, I wait until Team B integrates my version. That could be one month or three months, depending on their release cycle. And if my performance improvement contains a breaking change, I won't know about it for three months. By that time I'm already working on a different feature, and I've forgotten the implementation details. The frustration from the disruption will be significant, and my motivation to go back and fix it won't be anywhere near what it was while I was actively developing it.


WITHOUT CI
────────────────────────────────────────────────────────────▶ time
↑ merge          ↑ Team B integrates        ↑ defect reported
                 (month 3)                  (month 3+)
                                            "what did I even write here?"

WITH CI
────────────────────────────────────────────────────────────▶ time
↑ merge  ↑ defect reported
         (next day)
         "I remember this, easy fix"

This is exactly the situation CI removes. Integrating a new feature the day after it's merged to main reduces feedback time from three months to one day. While you're still polishing the implementation or documentation, new defects from downstream users aren't a source of frustration; they're the opposite: valuable feedback before promoting a feature as production-ready.

Benefit 3 - Testing for Backward Compatibility

When you have a large dependency graph across repositories and build units, backward compatibility practices become crucial. You don't want to disrupt downstream teams without reason, without notification, or when the benefits don't outweigh the cost of adjusting to breaking changes.

If breaking changes are unintentional, knowing about them immediately allows you to revert easily. Wait three months and it may be too late: multiple features and fixes have been stacked on top of that breaking change. Now a revert is risky. Now it's costly. And you'll avoid doing it.

But the problem isn't only technical. At that point, you or your manager has to admit you weren't preserving backward compatibility, that you don't have a sufficient testing harness catching breaking changes. The conversation degrades into debates about the importance of the change, or about what counts as public versus private API. Same issue, different timing, and timing changes everything.


Benefit 4 - Dependency Graph is Living Documentation

In complex, large systems, it is important to know your dependencies and your dependents. For example, I'm pushing a new API to main. Who will be impacted? You don't need to keep the sacred knowledge of how to find your dependents in a static code search tool. You will know not only your direct dependents but also the leaves, which is important for the testing and acceptance phase of feature development.

Now that we understand the benefits and have made the decision to implement CI, it's important to understand how to implement it and what tradeoffs are acceptable. When something is this important and this deliberate, we call it an architectural decision. An Architectural Decision Record (ADR) is required, to investigate all options (context), document the decision, and communicate how it will impact the organization (consequences). Let's start with context.

ADR - Context

The context already contains the details above: the number of repositories, the importance of CI, and the problems we currently face. I was implementing CI for 100 repositories with different programming languages (Scala, Java, JS, TypeScript), task runners (Maven, Gradle, Lerna, Turbo, Nx), and dependency managers (Yarn, npm, pnpm, Maven, sbt). Build, test, lint, and other tasks are implemented differently in each repository. Each repository contains one or more build units. A build unit's output may be an application Docker image, an artifact in an artifact repository, or uploaded documentation.

ADR - Decision

The decision was to build a generic pipeline, an integration engine that calculates the correct build order for all build units.

  1. Order is derived from metadata defined in VCS. Each build unit declares its dependencies.

  2. Each build unit defines how to update its dependencies via a script file that follows a shared convention.

  3. Each build unit defines how and what to build to reach a successful pipeline state.

  4. If a build fails with new dependencies, the build unit owner receives a ticket to investigate and fix the issue.

The result: a decentralized control system with a generic engine that calculates order and calls the required scripts with the required parameters.
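To make the order calculation concrete, here is a minimal sketch of how such an engine might derive build order from declared metadata, using Python's stdlib `graphlib`. The unit names and the dict-shaped metadata are illustrative assumptions, not the actual convention from the pipeline.

```python
from graphlib import TopologicalSorter

# Hypothetical metadata, as each build unit might declare it in VCS:
# unit name -> list of upstream build units it depends on.
metadata = {
    "core-lib": [],
    "auth-service": ["core-lib"],
    "billing-service": ["core-lib"],
    "web-frontend": ["auth-service", "billing-service"],
}

def build_order(metadata):
    """Derive a valid build order: every unit is built after its dependencies."""
    # TopologicalSorter takes a node -> predecessors mapping and yields
    # nodes so that all predecessors come before their dependents.
    return list(TopologicalSorter(metadata).static_order())

print(build_order(metadata))  # core-lib first, web-frontend last
```

The engine would then walk this order, calling each unit's convention-following update and build scripts in turn.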

ADR - Consequences

The technical consequences follow naturally from the context and decision.

1. All CI benefits described above.

2. Scheduled triggering over event-driven cascades.

We trigger the pipeline on a schedule rather than on every merge to main. Triggering on merge would create a cascade of integrations and overload the underlying infrastructure. A schedule gives us control over the integration rhythm without sacrificing the core benefit of short feedback loops.

3. Graph validation catches circular dependencies early.

Circular dependencies are validated at the graph calculation phase. If one is detected, the pipeline fails immediately. That said, circular dependencies can legitimately exist at the build unit level: a common case is a downstream unit being used as a testing dependency for integration testing. These are handled explicitly: such dependencies must be updated manually by the maintainer rather than automated by the engine. This is a deliberate tradeoff that keeps the engine simple while still supporting real-world testing patterns.
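A sketch of what fail-fast validation at the graph calculation phase might look like: `graphlib` raises `CycleError` as soon as a cycle prevents a topological order. The two-unit cycle below is a hypothetical example of the integration-testing case.

```python
from graphlib import TopologicalSorter, CycleError

def validate_graph(metadata):
    """Return the offending cycle (a list of unit names), or None if the
    dependency graph is a valid DAG."""
    try:
        list(TopologicalSorter(metadata).static_order())
        return None
    except CycleError as err:
        return err.args[1]  # CycleError carries the cycle as its second arg

# Hypothetical cycle: "lib" pulls in "app" for integration testing,
# while "app" depends on "lib" as a regular library.
cyclic = {"lib": ["app"], "app": ["lib"]}
print(validate_graph(cyclic))  # pipeline would fail immediately here
```

In the real pipeline, a detected cycle would stop the run before any build script is called; the manually-maintained testing dependencies simply stay out of the automated graph.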

4. The metadata convention evolves with the engine.

The engine that interprets metadata is versioned. As the system evolves, the metadata format can evolve alongside it. In practice, we've introduced only additive changes so far, new properties with sensible defaults, so existing build units continue to work without modification. No breaking changes to the convention have been needed yet.
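Additive evolution with sensible defaults can be sketched as a small parsing step. The field names (`schema_version`, `dependencies`, `owner`) are hypothetical stand-ins, not the pipeline's actual metadata convention.

```python
# Defaults for every property the current engine version understands.
# Adding a new property here (with a default) is an additive change:
# build units written before it existed keep working unchanged.
DEFAULTS = {"schema_version": 1, "dependencies": [], "owner": None}

def parse_unit_metadata(raw: dict) -> dict:
    """Merge a unit's declared metadata over the engine's defaults."""
    return {**DEFAULTS, **raw}

old_unit = {"dependencies": ["core-lib"]}  # written before "owner" existed
print(parse_unit_metadata(old_unit)["owner"])  # falls back to the default
```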

5. Repository owners become co-maintainers of the pipeline.

This consequence is not a technical one, it is organizational and sociological. Teams maintain the pipeline and feel responsible for its overall execution. They can tweak their scripts, add dependencies, and onboard or offboard a build unit without formal approvals or coordination with a central CI team.

This was welcomed. Teams supported the idea of being in control of their internal configuration, because it allowed them to evolve their development tooling and pipeline setup independently, without waiting on a central team or going through a bureaucratic change process. Autonomy here isn't just a nice-to-have, it's what makes the system sustainable at scale.

6. Tickets, not enforcement, as the social contract.

When a pipeline fails, a defect ticket is created and assigned to the build unit owner. This puts some pressure on the team, but leaves them the option of deciding when to address it. There is no hard block, no automated revert, no escalation chain. The ticket is the contract. It makes the failure visible and accountable without becoming a blocker that disrupts the team's own sprint commitments. In practice, this balance has worked: teams treat the ticket as a real obligation, not noise, because they're already invested in the pipeline as co-maintainers.

7. Highlights dead components.

If a build unit stops being depended on, the graph shows it. Teams discover they're maintaining code nobody uses. Visualizing the graph is very important in continuous integration: boxes with no arrows are the first candidates for review. Either a graph edge is missing or the component is stale.
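Finding those boxes is a one-liner over the same metadata. A sketch, with illustrative unit names; note that an application with no dependents is expected, while a library with none is suspect, so the result is a review list, not an automatic verdict.

```python
# Hypothetical metadata: unit name -> list of units it depends on.
metadata = {
    "core-lib": [],
    "auth-service": ["core-lib"],
    "legacy-report": [],  # nothing depends on this one
}

def units_without_dependents(metadata):
    """Units that appear as a key but never in anyone's dependency list."""
    depended_on = {dep for deps in metadata.values() for dep in deps}
    return sorted(set(metadata) - depended_on)

print(units_without_dependents(metadata))  # → ['auth-service', 'legacy-report']
```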


The most important takeaway from this whole design is the last one. The technical decisions (the engine, the graph, the scripts, the scheduling) are solvable problems. The organizational design is where CI either succeeds or quietly rots. Giving teams ownership, keeping the contract lightweight, and making failure visible without making it punitive: that's what makes a CI pipeline for 100+ repositories actually run.
