
Model Once, Represent Everywhere: UDA (Unified Data Architecture) at Netflix

jawns

For all the benefits, there is a large problem with this approach that often goes unacknowledged. It is fundamentally a business problem, rather than a technical problem, but it has impact on development speed, so it's secondarily a technical problem.

The business contract with a consolidated data definition is that everyone in the business, no matter which domain, can rely on it. But think about the red tape that introduces. Whenever you need to define or update a data definition, now you don't have to think just about your own use case, but about all of the potential use cases throughout the organization, and you likely need to get sign-off from a wide variety of stakeholders, because any change, however small, is by definition an org-wide change.

It's the data form of the classic big-org problem, "Why does it take two months to change the color of a button?"

Granted, in most cases, having data definitions duplicated, with the potential for drift, is going to be the more insidious problem. But sometimes you just want to get a small, isolated change out the door without having to go through several levels of cross-domain approval committees.

jfengel

I tried, for some time, to develop a product designed to solve this. It would have made it easier to specialize models locally while complying with the corporate one. (Basically, beefing up the data definition language to something like prolog, and putting real thought into making the corporate model reality-based rather than just what suits your current requirements.)
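A minimal sketch of what "specializing locally while complying with the corporate model" could look like. The commenter describes something closer to Prolog; this is plain Python, and the field names and the conformance rule (a local model must keep every corporate field with a compatible type, and may add its own) are assumptions made for illustration.

```python
# Hypothetical models: field name -> expected Python type.
CORPORATE_CUSTOMER = {"customer_id": str, "legal_name": str}

LOCAL_BILLING_CUSTOMER = {
    "customer_id": str,
    "legal_name": str,
    "vat_number": str,  # local specialization, unknown to the corporate model
}


def conformance_errors(corporate: dict, local: dict) -> list:
    """Return the ways in which `local` fails to specialize `corporate`."""
    errors = []
    for name, expected in corporate.items():
        if name not in local:
            errors.append(f"missing field: {name}")
        elif not issubclass(local[name], expected):
            errors.append(f"incompatible type for {name}")
    return errors
```

Extra local fields are allowed by design; only dropping or retyping a corporate field counts as a violation.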

Unfortunately it came about at exactly the same time as NoSQL and Big Data, which are basically the opposite. They let you be really loose with your model, and if some data gets lost or misunderstood, hey, no biggie. It's easier to patch it later than to develop a strong model to start with.

But am I bitter about it? No, why do you ask? Twitch, twitch.

bertails

UDA embraces the duplication of models: it's a fact of life in the enterprise. That is why "domains" are first-class citizens. We believe that good discovery capabilities will increase reusability of the domain models. Our next article will dive more into the extensibility capabilities of the metamodel Upper.

bertails

> It is fundamentally a business problem, rather than a technical problem, but it has impact on development speed, so it's secondarily a technical problem.

Yes, it is "fundamentally a business problem," but we believe it can be solved with technology. We think we have a more systematic way to adopt and deploy model-first knowledge graphs in the enterprise.

> But think about the red tape that introduces.

We are very intentional about UDA not becoming more red tape. UDA lives alongside all the other systems. There will never be a mandate for everything to be in UDA.

But we sure want to make it easy for those teams who want their business models to exist everywhere, to be connected to the business, and to be easy to discover, extend, and link to.

(I'm one of UDA's architects.)

datadrivenangel

How can it be universal if everything isn't in UDA?

bravesoul2

Spitballing. Another way to look at the problem: what would you do if you had billions of pieces of unstructured data (except for maybe being somewhat XML-like), you didn't control any of it, and making sense of it was (ignoring rounding errors) your only business concern? That company is Google, of course.

Maybe let the business units be loose but make the sense making central. Any individual unit can eventually tidy things up (SEO!) but everything will work regardless. The UX effect might be you can't find something decent to watch but that is an entirely different problem solved by not using Netflix and going to the theatre!

citizenpaul

IME it often comes down to "big men" issues where someone important wants the data in a certain way that is not logical or consistent, so they won't let the "tech people" simply take the data and present it in a way that is logically consistent and follows best practices. They want to sit in meetings and create their own mental model monstrosity and force the devs to make it. Once that happens one time there is zero chance of the company ever having a consistent data model at any point in the future ever.

Not really a problem that can be overcome in probably 99% of companies. Lots of consultancy money to be made for the sake of ego and inflexibility though.

cush

> It is fundamentally a business problem, rather than a technical problem, but it has impact on development speed, so it's secondarily a technical problem.

It doesn't read from the article that they are denying that it's a business problem. The models they're defining seem to span all roles, engineering being only one.

wjnc

Data drift is real! I’ve recently restored sanity in a medium-sized enterprise where there were three concurrent financial data flows, with people not understanding each other, projects to find out ground truth, and triple the workload in maintaining the dataflows. I’ve quipped to the team that endless summer is near: what if we only worked on business-relevant development? I would dream that the bigcorp we are part of would do the same. They are more of a "tack on another Excel-based solution" kind of firm.

datadrivenangel

Data drift is real, and the yoke of governance chafes enough that new people insist on redoing your work in excel until the problem gets bad enough that a new data governance push is needed.

tomrod

Corollary to Hyrum's Law then. Perhaps we call it the "Orange is the New Model" Law.

mkoubaa

Love it

Spooky23

The alternative is the same barriers, except with a parallel phone-a-friend governance model when you have to share data between verticals or programs.

It’s a classic pattern in public sector applications, where it’s partially deliberate.

Multicomp

It's been so long since the Semantic web and RDF and OWL and SKOS. I'm so glad they stuck with W3C and didn't reinvent those wheels. Will this UDA approach catch on? I don't know, but I hope so. It seems like it is trying to move the frontier of the difficulties of applying Domain Driven Design and semantic concepts to an enterprise company of significant scale.

If we can get compound interest across development teams by giving them a common toolset and skillset that covers different applications but the same data semantics, maybe not every data contract will have to be reduced to DTOs that can be POSTed or otherwise forced to be a least common denominator just so it can fit past a network or other IPC barrier.

For that, I'm grateful Netflix is working on this and publicizing the interesting work.

cpard

Reminds me of the work done at Uber with Dragon

https://www.uber.com/blog/dragon-schema-integration-at-uber-...

Unfortunately it never got open sourced but Joshua left for LinkedIn and started working on the LambdaGraph project and the Hydra language that are open sourced.

You can find more information on this fascinating work here:

https://github.com/CategoricalData/hydra

I think these approaches, including all the semantic web stuff from 10+ years ago, suffered from the added overhead of agreeing and formalising semantics and then of course maintaining them.

I wonder if LLMs can help with that part today.

frankdejonge

A bit unfortunate they used the term domain model here. Domain models here are purely data-centric, whereas domain modeling focuses mainly on behavior, not underlying data structures. The data that is used in domain models is used to facilitate the behavior, but the behavior is the core focus.

From a modeling perspective, there is certainly inherent complexity in representing data from domain models in different ways. One can argue though that this is a feature and not a bug. Not all use cases need the same level of nuance and complexity. And representational models are usually optimized for particular read scenarios; this seems to argue against that, favoring uniformity over contextual handling of information. It will most likely scale better in places where the level of understanding needed from the domain model is quite uniform, though I have most often seen use cases become complicated when they do not simplify concepts that are very complex and nuanced in their core domain model.

majormajor

I'm curious if anyone has seen business improvements along the lines of "this let us discover something that led to 5%+ or >$5M improvements" (percent or absolute depending on how big the company is) from these kinds of efforts?

I've been in a couple of the "we need to unify the data tables to serve everyone" exercises before deciding to focus on other parts of the software stack and a lot of it just seemed like "the video game people model it differently because they're doing different analysis, and if you unify the base layer to support everybody's type of analysis, it's not going to change that there's still a bunch of independent, not-talking-to-each-other analysis going on." (This is specifically different from the much LARGER sort of problem which is more a copypasta one - Finance's accounting doesn't agree with Legal's accounting and nobody knows who's right, which is one dataset needed in multiple places, vs multiple datasets needed in different places.)

I think this mostly sidesteps that - they aren't forcing everyone to migrate to the same things, AFAICT - and is just about making it easy to access more broadly. Is that right?

And confusion-reducing definition things - "everyone uses the same official definitions for business concepts" - I'm all for. Seen a lot of that pain for sure.

RobinL

> "the video game people model it differently because they're doing different analysis, and if you unify the base layer to support everybody's type of analysis, it's not going to change that there's still a bunch of independent, not-talking-to-each-other analysis going on"

This resonates. Moreover, it's very easy for architects to assume that because different areas of the business use data about the 'same' thing, the thing must be the same.

But often the analysis requires a slightly different thing. Like: we want a master list of prisons. But is a prison a building, a collection of prisoners (such that the male prison and the female prison on the same site are different prisons), or the institution with that name managed under a particular contract?
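To make the prison example concrete, here is a small sketch of the three readings as separate record types. All names and the sample data are hypothetical; the point is only that "how many prisons are there?" has a different answer under each definition.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Site:
    """'Prison' as a building."""
    site_id: str


@dataclass(frozen=True)
class Population:
    """'Prison' as a collection of prisoners held at a site."""
    population_id: str
    site_id: str
    category: str  # e.g. "male" or "female"


@dataclass(frozen=True)
class Contract:
    """'Prison' as the institution managed under one contract."""
    contract_id: str
    population_ids: Tuple[str, ...]


# One site housing two populations, both run under a single contract.
sites = [Site("S1")]
populations = [
    Population("P1", "S1", "male"),
    Population("P2", "S1", "female"),
]
contracts = [Contract("C1", ("P1", "P2"))]


def count_prisons() -> dict:
    """Count 'prisons' under each of the three definitions."""
    return {
        "by_building": len(sites),
        "by_population": len(populations),
        "by_contract": len(contracts),
    }
```

A single "master list of prisons" has to pick one of these keys, or keep all three and record how they relate.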

bertylicious

How does this relate to domain-driven design? It seems to be at odds with it, because in DDD it's kind of expected that the same concept will be represented in a different way by each system? But to be honest, I didn't read the whole blog post because of the UML vibes.

bertails

> How does this relate to domain-driven design?

The "Domain" in `upper:DomainModel` is the same D as in DDD (Domain-Driven Design) and in DGS (Domain Graph Service).

> in DDD it's kind of expected that the same concept will be represented in a different way by each system

In UDA, those concepts would explicitly co-exist in different domains. "Being the same" becomes a subjective thing.

regularfry

It doesn't. It's a blessing that they avoided the term "ubiquitous language" because that's almost exactly the dual of this concept, although people who have only ever heard the words and not dug any deeper won't know what the difference is.

bravesoul2

Why would Netflix engineering host on Medium? Very odd. And you just lose readers to the popups but you don't benefit from their discovery much either.

tough

Marketing department can own it

and SEO

bravesoul2

Makes sense for say a 30 person consultancy, but Netflix?!

Part of marketing is knowing your audience. And plenty of marketing people exist with deep tech experience.

yyhhsj0521

So they don’t have to maintain it themselves

echelon

> you don't benefit from their discovery

Sure you do.

And the types of engineers writing on Medium are the ones they want to recruit, too.

twodave

I wonder how they deal with versioning or breaking changes to the model. One advantage of keeping things more segregated is that when you decide to change a model you can do it in much smaller pieces.

I guess in their world they’d add a new model for whatever they want to change and then phase out use of the old one before removing it.

bertails

> I wonder how they deal with versioning or breaking changes to the model.

Versioning is permission to break things.

Although it is not implemented in UDA yet, the plan is to embrace the same model as Federated GraphQL, which has proved to work very well for us (think 500+ federated GraphQL schemas). In a nutshell, UDA will actively manage deprecation cycles, as we have the ability to track the consumers of the projected models.
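A rough sketch of consumer-tracked deprecation, assuming nothing about how UDA or Federated GraphQL actually implement it. `ModelRegistry`, its methods, and the field and consumer names are all hypothetical; the idea shown is only that a field becomes removable once it is deprecated and no tracked consumer still reads it.

```python
from dataclasses import dataclass, field


@dataclass
class FieldEntry:
    deprecated: bool = False
    consumers: set = field(default_factory=set)


class ModelRegistry:
    """Hypothetical registry; not UDA's or GraphQL Federation's API."""

    def __init__(self) -> None:
        self._fields: dict = {}

    def add_field(self, name: str) -> None:
        self._fields[name] = FieldEntry()

    def record_consumer(self, name: str, consumer: str) -> None:
        self._fields[name].consumers.add(consumer)

    def drop_consumer(self, name: str, consumer: str) -> None:
        self._fields[name].consumers.discard(consumer)

    def deprecate(self, name: str) -> list:
        """Mark a field deprecated and return the consumers to notify."""
        entry = self._fields[name]
        entry.deprecated = True
        return sorted(entry.consumers)

    def can_remove(self, name: str) -> bool:
        """A field is removable once deprecated and no consumer remains."""
        entry = self._fields[name]
        return entry.deprecated and not entry.consumers
```

In use, `deprecate` gives the list of teams to chase, and `can_remove` stays false until each of them has migrated and been dropped from the entry.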

twodave

That is a lot of subgraphs. Am I understanding correctly then that under UDA developers fulfill the UDA spec in whatever language they’re using, and then there’s some kind of middleware that will handle serving GraphQL queries? How are mutations represented? And how are other GraphQL-specific idioms expressed (like input parameters, nodes/edges/connections/etc.)? Is it just a subset of GraphQL that is supported?

I manage a much smaller federation where I work, and we have a lot of the same ideals I think in terms of having some centralized types that the rest of the business recognizes across the board. Right now we accomplish that within a set of “core” subgraphs that define these types, while our more product-focused ones implement their own sets of types, queries and mutations and can extend the core ones as it makes sense to.

zellyn

This is so meta it’s hard to ground it enough to understand any of it. I’d love to see a tiny worked example of, say, a billing department and an app that represent “Customer” in different ways, and how to model each, and most importantly, what practical benefits result. Ideally, the ideas of what a “customer” is would be different enough to differ in cardinality: e.g. in Billing, a customer is a business, and in the app, a customer is something that started out 1:1 with business but that broke down over time with co-ownership, assistants performing work for owners, etc.
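One way to picture the requested example, as a sketch only: it does not use UDA or its metamodel, and every type, field, and identifier below is made up. Each domain keeps its own "Customer", and the relationship between them is stored as explicit links, so the cardinality can drift away from 1:1 without either model changing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BillingCustomer:
    """Billing domain: a customer is a business that gets invoiced."""
    business_id: str
    legal_name: str


@dataclass(frozen=True)
class AppCustomer:
    """App domain: a customer is a person who logs in."""
    user_id: str
    display_name: str


# Explicit cross-domain links: (user_id, business_id, role).
links = [
    ("u1", "b1", "owner"),
    ("u2", "b1", "co-owner"),
    ("u3", "b1", "assistant"),
    ("u3", "b2", "assistant"),
]


def businesses_for(user_id: str) -> list:
    """Billing customers that an app customer acts for."""
    return sorted({b for u, b, _ in links if u == user_id})


def users_for(business_id: str) -> list:
    """App customers attached to one billing customer."""
    return sorted({u for u, b, _ in links if b == business_id})
```

Here `u3` is an assistant working for two businesses and `b1` has three people attached, which is exactly the many-to-many case the comment describes.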

heisenbit

I really believe a common vocabulary makes sense. But it is hard, very hard, as you spread across organizations (some to be bought and integrated), business processes, and time. As soon as it comes to generating stuff, things become hard. One may be able to generate interfaces between two systems, but which enterprise has only two layers? Yes, if all knowledge is captured in the central catalog we may be able to do it, but who builds this perfect database and maintains it?

Attempts to do this that survived either restricted themselves to being very abstract or limited their scope to specific use cases.

chiph

The problem I've seen is that you define your corporate entities, but then you have these systems in other divisions which need to extend it. Whether their division's special attributes get promoted to the corporate entity for everyone to use brings in politics and optimism. And making an update to a corporate-scoped entity then means you need solid change management.

IMO they can be very valuable in terms of reduced friction and costs, if you do it right and have enough rigor/discipline in the organization. Netflix might.

smarx007

> Attempts to do this that survived either restricted themselves to being very abstract or limited their scope to specific use cases.

Wikidata? 1.65 billion graph nodes and counting under a common vocabulary.
