Open source NPM packages seem to be developed in lots of small GitHub repos. Shouldn’t I do that?
Sure, if you’re building isolated components, and it’s not too important how they fit together. But business software doesn’t seem to work that way. It’s more like this:
Most people start out by building a single web application, not a bunch of libraries. After your application ships, it keeps growing in size. Then one day you need to share some code with a different project, and you realize you’ve got a big rat’s nest. Time to refactor!
…but working with 10 Git repos turns out to be a big pain! There are just so many headaches:
Tunnel vision: If a colleague mostly does their work in repos #5 and #6, they seem to completely ignore pull requests from the other 8 repos. New repos spring into existence every day without you even knowing about it.
Cascading publishing: Propagating a fix from lib3 down to your application project requires updating/building/publishing many Git repos in the right order: lib3 –> lib2 –> lib1 –> application. When lib3 has frequent churn, this becomes really tedious. How will people even remember the right order to publish? The internet has lots of bodies to throw at this problem, but you have limited people, and they’re very busy.
Downstream victims: When Bob publishes a change to lib3, it can take a while before all downstream projects get upgraded to use it. If there’s a regression, it might be a week before Alice tries to run “npm update” in lib1 and discovers the problem. By then, maybe Bob has left for his backpacking trip across Europe. Why should Alice shoulder the burden of fixing someone else’s regression? Seems like every time she upgrades, something is broken!
Linking madness: The workaround is to use npm link to symlink your application directly to lib3 for testing. But NPM creates symlinks via a global folder, which causes trouble if you need to work with multiple branches of lib3 on the same laptop. And with 10+ libraries, it’s hard to remember what is symlinked to what.
The “one repo per package” model makes sense for isolated projects that are maintained by uncoordinated strangers. (Also, most of those libraries get updated fairly infrequently, which makes the problem easier.) Whereas in our example, everyone works at the same company, and the “libraries” act more like components of an integrated architecture. Code gets churned a lot, and a change in one place can easily break another part of the system. Building multiple projects together lets you run all the unit tests for every change, which moves responsibility for fixes where it belongs: To the person who originally introduced the change.
The emergent principle becomes “one Git repo per team”, or even better “as few Git repos as possible to get the job done”.