
PM for Git at Microsoft here. We explored splitting it up. It's a 300GB repository and it's been a monorepo for the last 20 years. Splitting it up logically would take a lot of time (to put it mildly) and development would stall while we did it. And once we did, we would have 100 3GB repositories? 3000 100MB repositories? Neither of these is particularly appealing, and getting changes checked in atomically across multiple repositories is insanely challenging. There's no doubt that we would need to build tooling to make this work for us. (We did actually explore this direction, but ultimately decided that it would be too much work for too poor an experience.)

Instead, we decided to - as you put it - change Git to fit our needs.



Does Git on MS Windows use a dentry-like cache in user land to speed up filename lookups? Does it use a stat()-equivalent cache? Is there a reason that such caches weren't put into ntoskrnl.exe? Would you be able to give us a brief list (say top 5) of changes that sped up Git on Windows? And which ones had the most effect on super-sized monorepos? Thanks!
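
For readers unfamiliar with the idea, here is a minimal sketch of what a user-land stat() cache could look like. It is illustrative only and is not taken from Git or Git for Windows; the class name `StatCache` and its methods are hypothetical. It memoizes `os.stat()` results by normalized path, caches "file not found" as a negative entry, and needs explicit invalidation, which is the hard part in a real tool.

```python
import os


class StatCache:
    """Memoize os.stat() results by path (hypothetical helper, for illustration)."""

    def __init__(self):
        self._entries = {}  # normalized absolute path -> os.stat_result or None
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(path):
        # normcase folds case on Windows and is a no-op on POSIX systems.
        return os.path.normcase(os.path.abspath(path))

    def stat(self, path):
        key = self._key(path)
        if key in self._entries:
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        try:
            result = os.stat(key)
        except FileNotFoundError:
            result = None  # negative entries are cached too
        self._entries[key] = result
        return result

    def invalidate(self, path=None):
        # Drop one entry, or everything when no path is given.
        if path is None:
            self._entries.clear()
        else:
            self._entries.pop(self._key(path), None)
```

The trade-off is staleness: until `invalidate()` is called, the cache keeps returning whatever it saw first, even if the file has since been created, changed, or deleted.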


I have seen poor performance of git on Linux in very large repos, so I'm not super convinced that the dentry cache magically makes things better.

In particular, if my repo is big enough, I often don't have the entire tree in memory (because I'm doing other useful things with memory and caches got evicted). core.untrackedCache makes things a little better, but it's still not great.
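
For anyone who wants to try core.untrackedCache, here is a rough sketch of turning it on from a script. The function names are hypothetical, and it assumes `git` is on the PATH. The first command only tests whether the filesystem's directory mtimes behave well enough for the untracked cache to be usable; the second sets the config value for that repository.

```python
import subprocess


def untracked_cache_commands(repo_path):
    """Return the git invocations used to enable the untracked cache (illustrative)."""
    return [
        # Check that the working directory's filesystem supports the feature.
        ["git", "-C", repo_path, "update-index", "--test-untracked-cache"],
        # Enable it for this repository only.
        ["git", "-C", repo_path, "config", "core.untrackedCache", "true"],
    ]


def enable_untracked_cache(repo_path):
    # check=True raises CalledProcessError if the test or the config step fails.
    for cmd in untracked_cache_commands(repo_path):
        subprocess.run(cmd, check=True)
```

Calling `enable_untracked_cache("/path/to/repo")` runs both steps in order and stops at the first failure.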


What were you using before this?


We were using a mix of tools throughout Microsoft. Several teams were using Source Depot, an internally developed centralized version control system built to handle large teams like Windows and Office. It was the precursor to Team Foundation Version Control, the centralized version control system available in TFS and VSTS. TFVC is also capable of scaling to large projects, and many organizations within the company were (and are) using it.

GVFS is part of our effort within the company to standardize on a single set of best-of-breed tools, and use the same tools inside Microsoft that we deliver to customers. So we're adopting one engineering system throughout the company and we're moving everybody to Visual Studio Team Services.


It's my understanding that Source Depot was not internally developed but was rather a fork of Perforce, to which MS purchased a source license some years back.


True. Source Depot originated as Perforce and is a fork; it had (at one time) a pretty large team of developers working on it and it has been heavily, heavily modified in the many years since.


Changing Git to fit your needs probably just makes this monorepo function on borrowed time. Fixing the symptoms with band-aids will work for now, but at some point, something will probably have to be done about this 300GB god-repo. Any insight into what/when/if something is going to be done about that?


I'm not the right person to ask about Windows organization, I'm afraid. But I will say that the nice thing about moving to Git is that its lightweight branching makes refactoring much easier.



