Microsoft Office migration from Source Depot to Git
265 comments
· June 12, 2025 · 2d8a875f-39a2-4
daemin
Some companies have developed their own technology like VFS for use with Perforce, so you can check out the entire suite of applications but only pull the files when you try to access them in a specific way. This is a lot more important in game development where massive source binary assets are stored alongside text files.
It uses the same technology that's built into Windows that the remote drive programs (probably) use.
Personally I kind of still want some sort of server based VCS which can store your entire company's set of source without needing to keep the entire history locally when you check out something. But unfortunately git is still good enough to use on an ad-hoc basis between machines for me that I don't feel the need to set up a central server and CI/CD pipeline yet.
Also being able to stash, stage hunks, and interactively rebase commits are features that I like and work well with the way I work.
sixothree
Doesn’t SVN let you check out and commit any folder or file at any depth of a project you choose? Maybe not the checkouts and commit, but that log history for a single subtree is something I miss from the SVN tooling.
int_19h
You can indeed. The problem with this strategy is that now you need to maintain the list of directories that need to be checked out to build each project. And unless this is automated somehow, the documentation will gradually diverge from reality.
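One way the "automated somehow" part could look, as a rough sketch: derive the checkout list from a dependency map instead of documenting it by hand. The project names, directories, and the `checkout_dirs` helper are all made up for illustration; a real build system would supply this data.

```python
# Hypothetical example data: which directories each project owns and which
# other projects it depends on. None of these names come from a real repo.
PROJECT_DIRS = {
    "app": ["src/app"],
    "ui": ["src/ui", "assets/icons"],
    "core": ["src/core"],
}
PROJECT_DEPS = {
    "app": ["ui", "core"],
    "ui": ["core"],
    "core": [],
}


def checkout_dirs(project):
    """Return the sorted directories needed to build `project`."""
    seen = set()
    stack = [project]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(PROJECT_DEPS[name])  # walk transitive dependencies
    dirs = set()
    for name in seen:
        dirs.update(PROJECT_DIRS[name])
    return sorted(dirs)


print(checkout_dirs("ui"))
```

Because the list is computed from the same data the build uses, it cannot drift the way a wiki page can.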
gilbertbw
Can you not achieve the log history on a subtree with `git log my/subfolder/`? Tools like TortoiseGit let you right click on a folder and view the log of changes to it.
noitpmeder
My firm still uses perforce and I can't say anyone likes it at this point. You can almost see the light leave the eyes of new hires when you tell them we don't use git like the rest of the world.
2d8a875f-39a2-4
Yeah it's an issue for new devs for sure. TFA even makes the point, "A lot of people felt refreshed by having better transferable skills to the industry. Our onboarding times were slashed by half".
tom_
Interesting to hear it was so much of a problem in terms of onboarding time. Maybe Source Depot was particularly weird, and/or MS were using it in a way that made things particularly complicated? Perforce has never felt especially difficult to use to me, and programmers never seem to have any difficulty with it. Artists and designers seem to pick it up quite quickly too. (By and large, in contrast to programmers, they are less in the habit of putting up with the git style of shit.)
Degorath
Can't say anything about perforce as I've never used it, but I'd give my left nut to get Google's Piper instead of git at work :)
Arainach
Piper's syntax is Perforce syntax.
I moved to Google from Microsoft and back when employee orientation involved going to Mountain View and going into labs to learn the basics, it was amusing to see fresh college hires confused at not-git while I sat down and said "It's Source Depot, I know this!"[1]
StephenAmar
I concur. I miss citc & fig.
kccqzy
I cannot believe that new hires would be upset by the choice of version control software. They joined a new company after so many hoops and it's on them for having an open mind towards processes and tools in the new company.
Marsymars
I feel like I’ve got an open mind towards processes and tools; the problem with a company using anything other than Git at this point is that unless they have a good explanation for it, it’s not going to be an indicator that the company compared the relative merits of VCS systems and chose something other than Git - it’s going to be an indicator that the company doesn’t have the bandwidth or political will to modernize legacy processes.
filoleg
> I cannot believe that new hires would be upset by the choice of version control software.
I can, if the version control software is just not up to standards.
I absolutely didn’t mind using mercurial/hg, even though I literally hadn’t touched it until that point and knew nothing about it, because it is actually pretty good. I like it more than git now.
Git is a decent option that most people would be familiar with, cannot be upset about it either.
On the other hand, Source Depot sucked badly, it felt like I had to fight against it the entire time. I wasn’t upset because it was unfamiliar to me. In fact, the more familiar I got with it, the more I disliked it.
DanielHB
I almost cried with happiness when we moved to git from SVN at my first job, after being there for 6 months.
They might not be upset in the first few weeks, but after a month or so they will be familiar with the pain.
inglor
The problem is that you come to a prestigious place like Microsoft and end up using horrible outdated software.
Credit where credit is due: during my time at Excel we did improve things a lot (migration from Script# to TypeScript, migration from SourceDepot to git, shorter dev loop and better tooling etc) and a large chunk of development time was spent on developer tooling/happiness.
But it does suck to have to go to one of the old places and use sourcedepot and `osubmit` the "make a change" tool and then go over 16 popups in the "happy path" to submit your patch for review (also done in a weird windows gui review tool)
Git was quite the improvement :D
jayd16
A craftsman appreciates good tools.
int_19h
Perforce is sufficiently idiosyncratic that it's kinda annoying even when you remember the likes of SVN. Coming to it from Git is a whole world of pain.
axus
If companies don't cater to the whims of the youth, they'd have to hire... old people
mattl
I worked with someone who was surprised the company didn’t use Bitbucket and Discord. They were unhappy about both.
socalgal2
VFS does not replace Perforce. Most AAA game companies still use Perforce. In particular, they need locks on assets so two people don't edit them at the same time and end up with an unmergeable change and wasted time as one artist has to throw their work away.
azhenley
I spent nearly a week of my Microsoft internship in 2016 adding support for Source Depot to the automated code reviewer that I was building (https://austinhenley.com/blog/featurestheywanted.html) despite having no idea what Source Depot was!
Quite a few devs were still using it even then. I wonder if everything has been migrated to git yet.
sciencesama
Naah still a lot of stuff works on sd !! Those sd commands and setting up sd gives me chills !!
PretzelPirate
I miss CodeFlow everyday. It was such a great tool to use.
hacker_homie
Most of the day to day is in git, now.
0points
Having used vss in the 90s myself, it surprised me it wasn't even mentioned.
VSS (Visual SourceSafe) being Microsoft's own source versioning protocol, unlike Source Depot which was licensed from Perforce.
chiph
VSS was picked up via the acquisition of One Tree Software in Raleigh. Their product was SourceSafe, and the "Visual" part was added when it was bundled with their other developer tools (Visual C, Visual Basic, etc). Prior to that Microsoft sold a version control product called "Microsoft Delta" which was expensive and awful and wasn't supported on NT.
One of the people who joined Microsoft via the acquisition was Brian Harry, who led the development of Team Foundation Version Control (part of Team Foundation Server - TFS) which used SQL Server for its storage. A huge improvement in manageability and reliability over VSS. I think Brian is retired now - his blog at Microsoft is no longer being updated.
From my time using VSS, I seem to recall a big source of corruption was its use of network file locking over SMB. If there were a network glitch (common in the day) you'd have to repair your repository. We set up an overnight batch job to run the repair so we could be productive in the mornings.
0points
Oh, TIL! Thanks for adding that to the story.
Indeed my experience of VSS was also not amazing, and I certainly got corrupted files too.
EvanAnderson
> ...I seem to recall a big source of corruption was its use of network file locking over SMB...
Shared database files (of any kind) over SMB... shudder Those were such bad days.
larrywright
I used VSS in the 90s as well, it was a nightmare when working in a team. As I recall, Microsoft themselves did not use VSS internally, at least not for the majority of things.
hpratt4
That’s correct. Before SD, Microsoft orgs (at least Office and Windows; I assume others too) used an internal tool called SLM (“slime”); Raymond Chen has blogged about it, in passing: https://devblogs.microsoft.com/oldnewthing/20180122-00/?p=97...
tamlin
Yes, I used VSS as a solo developer in the 90s. It was a revelation at the time. I met other VCS systems at grad school (RCS, CVS).
I started a job at MSFT in 2004 and I recall someone explaining that VSS was unsafe and prone to corruption. No idea if that was true, or just lore, but it wasn't an option for work anyway.
sumtechguy
The integration with sourcesafe and all of the tools was pretty cool back then. Nothing else really had that level of integration at the time. However, VSS was seriously flakey. It would corrupt randomly for no real reason. Daily backups were always being restored in my workplace. Then they picked PVCS. At least it didn't corrupt itself.
I think VSS was fine if you used it on a local machine. If you put it on a network drive things would just flake out. It also got progressively worse as newer versions came out. Nice GUI, very straightforward to teach someone how to use it (checkout file, change, check in like a book), random corruptions about sums up VSS. That checkin/out model seems simpler for people to grasp. The virtual/branch systems most of the other ones use is kind of a mental block for many until they grok it.
wvenable
I was mandated to use VSS in a university course in the late 90s -- one course, one project -- and we still managed to corrupt it.
marcosdumay
> No idea if that was true
It's an absurd understatement. The only people that seriously used VSS and didn't see any corruption were the people that didn't look at their code history.
smithkl42
I used VSS for a few years back in the late 90's and early 2000's. It was better than nothing - barely - but it was very slow, very network intensive (think MS Access rather than SQL), it had very poor merge primitives (when you checked out a file, nobody else could change it), and yes, it was exceedingly prone to corruption. A couple times we just had to throw away history and start over.
electroly
SourceSafe had a great visual merge tool. You could enable multiple checkouts. VSS had tons of real issues but not enabling multiple checkouts was a pain that companies inflicted on themselves. I still miss SourceSafe's merge tool sometimes.
mmastrac
We used to call it Visual Source Unsafe because it was corrupting repos all the time.
skipkey
As I recall, one problem was you got silent corruption if you ran out of disk space during certain operations, and there were things that took significantly more disk space while in flight than when finished, so you wouldn’t even know.
When I was at Microsoft, Source Depot was the nicer of the two version control systems I had to use. The other, Source Library Manager, was much worse.
meepmorp
iirc, we called it visual source shred
kinda nice to know it wasn't just our experience
RyJones
I was on the team that migrated Microsoft from XNS to TCP/IP - it was way less involved, but similar lessons learned.
Migrating from MSMAIL -> Exchange, though - that was rough
aaronbrethorst
Is that what inspired the "Exchange: The Most Feared and Loathed Team in Microsoft" license plate frames? I'm probably getting a bit of the wording wrong. It's been nearly 20 years since I saw one.
RyJones
Probably. A lot of people really loved MSMAIL; not so much Exchange.
I have more long, boring stories about projects there, but that’s for another day
canucker2016
And sometimes they loved MSMAIL for the weirdest reasons...
MSMAIL was designed for Win3.x. Apps didn't have multiple threads. The MSMAIL client app that everyone used would create the email to be sent and store the email file on the system.
An invisible app, the Mail Pump, would check for email to be sent and received during idle time (N.B. Other apps could create/send emails via APIs, so you couldn't have the email processing logic in only the MSMAIL client app).
So the user could hit the Send button and the email would be moved to the Outbox to be sent. The mail pump wouldn't get a chance to process the outgoing email for a few seconds, so during that small window, if the user decided that they had been too quick to reply, they could retract that outgoing email. Career-limiting move averted.
Exchange used a client-server architecture for email. Email client would save the email in the outbox and the server would notice the email almost instantly and send it on its way before the user blinked in most cases.
A few users complained that Exchange, in essence, was too fast. They couldn't retract a misguided email reply, even if they had reflexes as quick as the Flash.
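The timing difference described above can be shown with a toy model. This is only an illustration of "delivery happens later at idle time" versus "delivery happens right after send"; the `Outbox` class and its methods are invented here and have nothing to do with the real MSMAIL or Exchange code.

```python
from collections import deque


class Outbox:
    """Toy outbox: messages wait here until something delivers them."""

    def __init__(self):
        self.pending = deque()
        self.sent = []

    def send(self, message):
        self.pending.append(message)

    def retract(self, message):
        """Pull a message back; only works while it is still pending."""
        if message in self.pending:
            self.pending.remove(message)
            return True
        return False

    def pump(self):
        """Deliver everything pending (the 'mail pump' getting its turn)."""
        while self.pending:
            self.sent.append(self.pending.popleft())


# MSMAIL-like: the pump runs later, so there is a window to retract.
slow = Outbox()
slow.send("angry reply")
retracted_slow = slow.retract("angry reply")  # pump has not run yet
slow.pump()

# Exchange-like: delivery follows send immediately, so retract comes too late.
fast = Outbox()
fast.send("angry reply")
fast.pump()
retracted_fast = fast.retract("angry reply")  # already delivered

print(retracted_slow, retracted_fast)
```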
anonymars
Ha, maybe my old memory is rusty, but I feel like I recognize this name and you had an old blog with some quotable Raymond Chen -- one bit I remember was something like
"How do you write code so that it compiles differently from the IDE vs the command line?" to which the answer was "If you do this your colleagues will burn you in effigy when they try to debug against the local build and it works fine"
carlual
> Authenticity mattered more than production value.
Thanks for sharing this authentic story! As an ex-MSFT in a relatively small product line that only started switching to Git from SourceDepot in 2015, right before I left, I can truly empathize with how incredible a job you guys have done!
dshacker
Yeah, it was a whole journey. I can't believe it happened. Thanks for your comment.
carlual
Thank you! Btw, it reminds me of the book "Showstopper" about the journey of releasing Windows NT; highly recommended!
tux1968
Thanks for the recommendation! I was just about to reread "Soul Of A New Machine", but will try Showstopper instead, since it sounds like the same genre.
hacker_homie
I spent a lot of time coaching people out of source depot, it was touch and go there for a while. It was worth it though, thank you for your effort.
danielodievich
I want to thank the dev leads who trained this green-behind-the-ears engineer on the mysteries of Source Depot. Once I understood it, it was quite illuminating. I am glad we only had a dependency on WinCE and IE, and so the clone only took 20 minutes instead of days. I don't remember your names but I remember your willingness to step up and help and onboard a new person so they could start being productive. I pay this attitude forward with new hires here in my team no matter where I go.
BobbyTables2
I’d like to know when Microsoft internally migrated away from Visual SourceSafe…
They should have recalled it to avoid continued public use…
RandallBrown
I doubt most teams ever used it.
I spent a couple years at Microsoft and our team used Source Depot because a lot of people thought that our products were special and even Microsoft's own source control (TFS at the time) wasn't good enough.
I had used TFS at a previous job and didn't like it much, but I really missed it after having to use Source Depot.
jbergens
I was surprised that TFS was not mentioned in the story (at least not as far as I have read).
It should have existed around the same time and other parts of MS were using it. I think it was released around 2005 but MS probably had it internally earlier.
canucker2016
SLM (aka slime, shared file-system source code control system) was used in most of MS, aka systems & apps.
NT created (well not NT itself, IIRC, there was an MS-internal developer tools group in charge)/moved to source depot since a shared file-system doesn't scale well to thousands of users. Especially if some file gets locked and you DoS the whole division.
Source depot became the SCCS of choice (outside of Dev Division).
Then git took over, and MS had to scale git to NT-size scale, and upstream many of the changes to git mainline.
Raymond Chen has a blog that mentions much of this - https://devblogs.microsoft.com/oldnewthing/20180122-00/?p=97...
int_19h
TFS was used heavily by DevDiv, but as far as I know they never got perf to the point where Windows folk were satisfied with it on their monorepo.
It wasn't too bad for a centralized source control system tbh. Felt a lot like SVN reimagined through the prism of Microsoft's infamous NIH syndrome. I'm honestly not sure why anyone would use it over SVN unless you wanted their deep integration with Visual Studio.
RyJones
USGEO used it in the late 90s, as well as RAID
pianoben
I don't know that they ever used it internally, certainly not for anything major. If they had, they probably wouldn't have sold it as it was...
Can't explain TFS though, that was still garbage internally and externally.
mattgrice
Around 2000? The only project I ever knew that used it was .NET and that was on SD by around then.
dshacker
I didn't even know Microsoft SourceSafe existed.
codeulike
We used it. We knew no better. It was different then, you might not hear about alternatives unless you went looking for them. Source Safe was integrated with Visual Studio so was an obvious choice for small teams.
Get this; if you wanted to change a file you had to check it out. It was then locked and no-one else could change it. Files were literally read only on your machine unless you checked them out. The 'one at a time please' approach to Source Control (the other approach being 'let's figure out how to merge this later')
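For anyone who never used that model, here is a minimal sketch of lock-on-checkout. `LockingRepo` is a made-up toy class, not SourceSafe's (or any real tool's) API; it only shows the rule that one user holds a file at a time.

```python
class LockingRepo:
    """Toy model of lock-on-checkout version control."""

    def __init__(self, files):
        self.files = dict(files)  # path -> content
        self.locks = {}           # path -> user holding the checkout

    def check_out(self, path, user):
        holder = self.locks.get(path)
        if holder is not None and holder != user:
            raise PermissionError(f"{path} is checked out by {holder}")
        self.locks[path] = user

    def check_in(self, path, user, content):
        if self.locks.get(path) != user:
            raise PermissionError(f"{user} has not checked out {path}")
        self.files[path] = content
        del self.locks[path]      # release the lock for everyone else


repo = LockingRepo({"main.c": "v1"})
repo.check_out("main.c", "alice")
try:
    repo.check_out("main.c", "bob")
    bob_blocked = False
except PermissionError:
    bob_blocked = True            # bob waits until alice checks in
repo.check_in("main.c", "alice", "v2")
repo.check_out("main.c", "bob")   # works now that the lock is released
```

No merging is ever needed in this model, which is the appeal; the cost is that a forgotten checkout blocks everyone else.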
rswail
Which is exactly how CVS (and its predecessors RCS and SCCS) worked.
They were file based revision control, not repository based.
SVN added folders like trunk/branches/tags that overlaid the file based versioning by basically creating copies of the files under each folder.
Which is why branch creation/merging was such a complicated process, because if any of the files didn't merge, you had a half merged branch source and a half merged branch destination that you had to roll back.
Disposal8433
The file lock was a fun feature when a developer forgot to unlock it and went on holidays. Don't forget the black hole feature that made files randomly disappear for no reason. It may have been the worst piece of software I have ever used.
pjc50
The lock approach is still used in IC design for some of the Cadence/Synopsys data files which are unmergeable binaries. Not precisely sure of the details but I've heard it from other parts of the org.
namdnay
I remember a big commercial SCM at the time that had this as an option, when you wanted to make sure you wouldn’t need to merge. Can’t remember what it was called, you could “sync to file system” a bit like dropbox and it required teams of full time admins to build releases and cut branches and stuff . Think it was bought by IBM?
masklinn
Lucky you. Definitely one of the worst tools I’ve had the displeasure of working with. Made worse by people building on top of it for some insane reason.
TowerTall
I remember when we migrated from Visual Source Safe to TFS at my place of work. I was in charge of the migration and we hit errors and opened a ticket with Microsoft Premier Support. The ticket ended up being assigned to one of the creators of Source Safe who replied "What you are seeing is not possible". He did manage to solve it in the end after a lot of head scratching.
mickeyp
Agreed. It had a funny habit of corrupting its own data store also. That's absolutely what you want in a source control system.
It sucked; but honestly, not using anything is even worse than SourceSafe.
moron4hire
It was at least a little better than CVS, but with SVN available at the same time, never understood the mentality of the offices that I worked at using Source Safe instead of SVN.
qingcharles
It was pretty janky. We used it in the gamedev world in the 90s once the migration to Visual C started.
b0a04gl
Funny how most folks remember the git migration as a tech win, but honestly the real unlock was devs finally having control over their own flow. No more waiting on sync windows, no more asking leads for branch access. Suddenly everyone could move fast without stepping on each other. That shift did more for morale than any productivity dashboard ever could. Git didn’t just fix tooling, it fixed trust in the dev loop.
ksynwa
Not doubting it but I don't understand how a shallow clone of OneNote would be 200GB.
paulddraper
Must have videos or binaries.
bariumbitmap
> In the early 2000s, Microsoft faced a dilemma. Windows was growing enormously complex, with millions of lines of code that needed versioning. Git? Didn’t exist. SVN? Barely crawling out of CVS’s shadow.
I wonder if Microsoft ever considered using BitKeeper, a commercial product that began development in 1998 and had its public release in 2000. Maybe centralized systems like Perforce were the norm and a DVCS like BitKeeper was considered strange or unproven?
wslh
There was SourceSafe (VSS) around that time and TFVC afterwards.
AdamN
I feel like we're well into the long tail now. Are there other SCM systems or is it the end of history for source control and git is the one and done solution?
masklinn
Mercurial still has some life to it (excluding Meta’s fork of it), jj is slowly gaining, fossil exists.
And afaik P4 still does good business, because DVCS in general and git in particular remain pretty poor at dealing with large binary assets so it’s really not great for e.g. large gamedev. Unity actually purchased PlasticSCM a few years back, and has it as part of their cloud offering.
Google uses its own VCS called Piper which they developed when they outgrew P4.
zem
google also has a mercurial interface to piper
dgellow
Perforce is used in game dev, animation, etc. git is pretty poor at dealing with lots of really large assets
nyarlathotep_
I've heard this about game dev before. My (probably only somewhat correct) understanding is it's more than just source code--are they checking in assets/textures etc? Is perforce more appropriate for this than, say, git lfs?
int_19h
I'm not sure about the current state of affairs, but I've been told that git-lfs performance was still not on par with Perforce on those kinds of repos a few years ago. Microsoft was investing a lot of effort in making it work for their large repos though so maybe it's different now.
But yeah, it's basically all about having binaries in source control. It's not just game dev, either - hardware folk also like this for their artifacts.
masklinn
Assets, textures, design documents, tools, binary dependencies, etc…
And yes, p4 just rolls with it, git lfs is a creaky hack.
malkia
And often binaries: .exe, .dll, even .pdb files.
qiine
why is this still the case ?
rwmj
I've been checking in large (10s to 100s MBs) tarballs into one git repo that I use for managing a website archive for a few years, and it can be made to work but it's very painful.
I think there are three main issues:
1. Since it's a distributed VCS, everyone must have a whole copy of the entire repo. But that means anyone cloning the repo or pulling significant commits is going to end up downloading vast amounts of binaries. If you can directly copy the .git dir to the other machine first instead of using git's normal cloning mechanism then it's not as bad, but you're still fundamentally copying everything:
$ du -sh .git
55G .git
2. git doesn't "know" that something is a binary (although it seems to in some circumstances), so some common operations try to search them or operate on them in other ways as if they were text. (I just ran `git log -S` on that repo and git ran out of memory and crashed, on a machine with 64GB of RAM).
3. The cure for this (git lfs) is worse than the disease. LFS is so bad/strange that I stopped using it and went back to putting the tarballs in git.
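On the "it seems to in some circumstances" part of point 2: as far as I understand it, git's diff code guesses whether content is binary by looking for a NUL byte near the start of the data, unless a `.gitattributes` entry says otherwise. The sketch below shows that kind of heuristic. It is not git's actual code, and the 8000-byte window is quoted from memory, so treat it as an assumption.

```python
SNIFF_BYTES = 8000  # assumed sniff window; from memory, not verified here


def looks_binary(data: bytes) -> bool:
    """Guess 'binary' if a NUL byte appears in the first SNIFF_BYTES bytes."""
    return b"\0" in data[:SNIFF_BYTES]


print(looks_binary(b"plain text\n"))
print(looks_binary(b"\x1f\x8b\x08\x00compressed-looking header"))
```

A guess like this only helps the operations that consult it, which would be consistent with other operations still treating large tarballs as text.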
linkpuff
There are some other solutions (like jujutsu, which while using git as storage medium, has some differences in the handling of commits). But I do believe we reached a critical point where git is the one stop shop for all the source control needs despite its flaws/complexity.
foooorsyth
git by itself is often unsuitable for XL codebases. Facebook, Google, and many other companies / projects had to augment git to make it suitable or go with a custom solution.
AOSP with 50M LoC uses a manifest-based, depth=1 tool called repo to glue together a repository of repositories. If you’re thinking “why not just use git submodules?”, it’s because git submodules has a rough UX and would require so much wrangling that a custom tool is more favorable.
Meta uses a custom VCS. They recently released sapling: https://sapling-scm.com/docs/introduction/
In general, the philosophy of distributed VCS being better than centralized is actually quite questionable. I want to know what my coworkers are up to and what they’re working on to avoid merge conflicts. DVCS without constant out-of-VCS synchronization causes more merge hell. Git’s default packfile settings are nightmarish — most checkouts should be depth==1, and they should be dynamic only when that file is accessed locally. Deeper integrations of VCS with build systems and file systems can make things even better. I think there’s still tons of room for innovation in the VCS space. The domain naturally opposes change because people don’t want to break their core workflows.
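The "manifest-based, depth=1" idea mentioned above can be sketched roughly like this. The manifest is a simplified, made-up example only loosely modelled on an XML manifest, and the `example.com` URL, project names, and `shallow_clone_commands` helper are placeholders. The sketch just builds `git clone --depth 1` command lines without running them; the real repo tool does far more.

```python
import xml.etree.ElementTree as ET

# Made-up manifest; example.com and the project names are placeholders.
MANIFEST = """
<manifest>
  <remote name="origin" fetch="https://example.com/src" />
  <default remote="origin" revision="main" />
  <project name="platform/build" path="build" />
  <project name="platform/libs/net" path="libs/net" revision="stable" />
</manifest>
"""


def shallow_clone_commands(manifest_xml):
    """Turn each <project> entry into a `git clone --depth 1` argument list."""
    root = ET.fromstring(manifest_xml)
    base_url = root.find("remote").get("fetch")
    default_rev = root.find("default").get("revision")
    commands = []
    for project in root.findall("project"):
        # A project may pin its own revision, else fall back to the default.
        revision = project.get("revision", default_rev)
        commands.append([
            "git", "clone", "--depth", "1", "--branch", revision,
            f"{base_url}/{project.get('name')}", project.get("path"),
        ])
    return commands


for cmd in shallow_clone_commands(MANIFEST):
    print(" ".join(cmd))
```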
WorldMaker
It's interesting to point out that almost all of Microsoft's "augmentations" to git have been open source and many of them have made it into git upstream already and come "ready to configure" in git today ("cone mode" sparse checkouts, a lot of steady improvements to sparse checkouts, git commit-graph, subtle and not-so-subtle packfile improvements, reflog improvements, more). A lot of it is opt-in stuff because of backwards compatibility or extra overhead that small/medium-sized repos won't need, but so much of it is there to be used by anyone, not just the big corporations.
I think it is neat that at least one company with mega-repos is trying to lift all boats, not just their own.
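For a feel of what cone-style sparse checkout selects, here is a small model of the rule as I understand it: everything under the chosen directories is included, plus the files sitting directly in each parent directory on the way down (and in the repo root). This is a paraphrase of the documented behaviour, not git's implementation, and the paths and the `in_cone` helper are invented for the example.

```python
from pathlib import PurePosixPath


def in_cone(path, cone_dirs):
    """Return True if `path` would be present given the chosen directories."""
    parent = PurePosixPath(path).parent
    if parent == PurePosixPath("."):
        return True  # files in the repository root are always kept
    for cone in map(PurePosixPath, cone_dirs):
        inside = parent == cone or cone in parent.parents
        beside_ancestor = parent in cone.parents
        if inside or beside_ancestor:
            return True
    return False


for p in ["README.md", "src/Makefile", "src/app/ui/main.c", "src/other/x.c"]:
    print(p, in_cone(p, ["src/app"]))
```

Restricting the selection to whole directories is what makes the matching cheap compared with arbitrary ignore-style patterns.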
kccqzy
Meta and Google both have been using mercurial and they have also been contributing back to upstream mercurial.
msgodel
git submodules have a bad ux but it's certainly not worse than Android's custom tooling. I understand why they did it but in retrospect that seems like an obvious mistake to me.
israrkhan
We did migrate from Perforce to Git for fairly large repositories, and I can relate to some of the issues. Luckily we did not have to invent VFS, although git-lfs was useful for large files.
Always nice to read a new retelling of this old story.
TFA throws some shade at how "a single get of the office repo took some hours" then elides the fact that such an operation was practically impossible to do on git at all without creating a new file system (VFS). Perforce let users check out just the parts of a repo that they needed, so I assume most SD users did that instead of getting every app in the Office suite every time. VFS basically closes that gap on git ("VFS for Git only downloads objects as they are needed").
Perforce/SD were great for the time and for the centralised VCS use case, but the world has moved on I guess.
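The on-demand behaviour quoted above ("only downloads objects as they are needed") boils down to something like the toy model below: the tree lists every path, but content is only fetched the first time a file is read. `LazyWorkingTree` and the file names are invented for illustration and say nothing about how VFS for Git is actually built.

```python
class LazyWorkingTree:
    """Toy model: every path is listed, but content is fetched on first read."""

    def __init__(self, remote):
        self._remote = remote  # path -> bytes; stands in for the server
        self._local = {}       # content downloaded so far
        self.downloads = 0

    def listdir(self):
        # Listing is cheap: names are known without downloading any content.
        return sorted(self._remote)

    def read(self, path):
        if path not in self._local:
            self._local[path] = self._remote[path]  # the "download"
            self.downloads += 1
        return self._local[path]


tree = LazyWorkingTree({
    "word/main.cpp": b"...",
    "excel/calc.cpp": b"...",
    "onenote/sync.cpp": b"...",
})
tree.listdir()               # sees all three files, downloads nothing
tree.read("excel/calc.cpp")  # first access fetches one file
tree.read("excel/calc.cpp")  # second access is served locally
print(tree.downloads)
```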