[HN Gopher] Data Mesh Architecture
       ___________________________________________________________________
        
       Data Mesh Architecture
        
       Author : aiobe
       Score  : 71 points
       Date   : 2022-03-18 12:15 UTC (1 day ago)
        
 (HTM) web link (www.datamesh-architecture.com)
 (TXT) w3m dump (www.datamesh-architecture.com)
        
       | politelemon wrote:
       | Is there an underlying assumption here that all of the datasets'
       | domains are perfectly in sync with each other in the context of
       | domain metadata?
       | 
       | As an example, Team1 might define the manufacturer of a
       | Sprocket as the company that assembled it, whereas Team2 might
       | define the manufacturer as the company that built the Sprocket's
       | engine. Since the purpose of a data mesh is to enable other teams
       | to perform cross-domain data analytics, there needs to be
       | reconciliation regarding these definitions, or it'll become a
       | datamess. Where does that get resolved?
        
         | gxt wrote:
         | The chief data officer, in close collaboration with the chief
         | data engineering officer, must draw up automated normalization
         | guidelines, backed by implementations used across all data
         | streams, to ensure that any skew in the data model is limited
         | to non-production environments and that all data entities are
         | materialized consistently across the whole data model.
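         | 
         | For example, such a code-backed guideline might boil down to a
         | shared normalization step applied to every inbound stream (a
         | minimal sketch; the function and field names are illustrative
         | assumptions, not from the comment):
         | 
         |   from datetime import datetime, timezone
         | 
         |   def normalize_record(record: dict) -> dict:
         |       # Shared guideline, applied before anything is
         |       # materialized, so entities stay consistent.
         |       out = dict(record)
         |       # Canonical, lowercase join keys.
         |       mid = out.get("manufacturer_id")
         |       if mid is not None:
         |           out["manufacturer_id"] = str(mid).strip().lower()
         |       # Timestamps stored as UTC ISO-8601.
         |       ts = out.get("updated_at")
         |       if isinstance(ts, datetime):
         |           out["updated_at"] = (
         |               ts.astimezone(timezone.utc).isoformat())
         |       return out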
        
           | i_like_waiting wrote:
           | What type of company are you working for? Usually there is
           | not even a CIO; I haven't even heard of a company with both
           | a CDO and a CDEO (or even a CDEO at all).
           | 
           | I thought a big portion of the need that data mesh fills is
           | for organizations that are missing resources in their core
           | BI team.
        
             | tremoloqui wrote:
             | A data mesh approach probably wouldn't work in the sort of
             | organization you describe.
             | 
             | IMO, to make it work you need a consistent taxonomy, or a
             | way of translating from a particular domain into some sort
             | of interchange format.
             | 
             | If you have that, then a set of centralized tools can pull
             | from the separate domains using a core set of protocols to
             | produce reports, etc.
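             | 
             | A minimal sketch of that kind of translation layer (the
             | interchange fields and mappings are illustrative
             | assumptions, not something specified here):
             | 
             |   # Map each domain's local schema onto one interchange
             |   # format that central tooling can consume.
             |   TEAM1_TO_INTERCHANGE = {
             |       "assembler": "manufacturer",
             |       "sprocket_id": "product_id",
             |   }
             | 
             |   def to_interchange(record: dict, mapping: dict) -> dict:
             |       # Keep only fields the interchange format defines.
             |       return {dst: record[src]
             |               for src, dst in mapping.items()
             |               if src in record}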
        
             | gxt wrote:
             | There's no magic. You need a core team that pivots from
             | writing code at O(n) cost enterprise-wide to more or less
             | amortized O(1), where n is the number of data streams:
             | writing code once per stream vs. once for a standardized
             | stream format that gets reused. With only a data mesh I
             | don't think it's going to work, but with standardized
             | tools that let your teams write transformations and code
             | as data, every team effectively gets access to a
             | self-service data warehouse, restricted to pre-approved
             | happy paths that can be automatically monitored for the
             | most part. That's where you gain efficiency and can let
             | your BI teams focus on BI rather than boilerplate code,
             | infrastructure, conformity, etc.
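             | 
             | A minimal sketch of "transformations as data" in that
             | spirit (the spec format and runner are illustrative
             | assumptions, not a specific tool):
             | 
             |   # A team submits a declarative spec; one shared runner
             |   # executes it, so platform code is written only once.
             |   SPEC = {
             |       "source": "s3://raw/orders/",
             |       "steps": [
             |           {"op": "rename", "src": "assembler",
             |            "dst": "manufacturer"},
             |           {"op": "lowercase", "column": "manufacturer"},
             |       ],
             |       "sink": "s3://products/orders/",
             |   }
             | 
             |   def run(spec, rows):
             |       for step in spec["steps"]:
             |           if step["op"] == "rename":
             |               for r in rows:
             |                   r[step["dst"]] = r.pop(step["src"], None)
             |           elif step["op"] == "lowercase":
             |               c = step["column"]
             |               for r in rows:
             |                   if r.get(c) is not None:
             |                       r[c] = str(r[c]).lower()
             |       return rows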
        
               | i_like_waiting wrote:
               | Yes, it's a similar path to the one I am taking (while
               | leading BI in my org). Getting the first taste of
               | self-service on the analysis side is super easy thanks
               | to tools like Metabase.
               | 
               | Bringing data in is a completely different story,
               | especially in non-tech organizations. The gap between
               | how a power user from a specific department and somebody
               | from my team brings in and transforms data is still too
               | big, and the conventions are hard to enforce (following
               | naming conventions, keeping the same data formats for
               | the same columns, lowercasing certain columns so joins
               | are done correctly...). They usually have their
               | "playground schemas" that they use, but that is very far
               | from saying they "own" data quality there.
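               | 
               | As a minimal sketch of the kind of convention that is
               | hard to get power users to apply by hand (column names
               | are illustrative; assumes pandas):
               | 
               |   import pandas as pd
               | 
               |   def clean_keys(df, key):
               |       # Enforce the join-key convention once, so joins
               |       # actually match across playground schemas.
               |       df = df.copy()
               |       df[key] = (df[key].astype(str)
               |                         .str.strip().str.lower())
               |       return df
               | 
               |   # orders_raw / customers_raw: frames loaded elsewhere
               |   orders = clean_keys(orders_raw, "customer_id")
               |   customers = clean_keys(customers_raw, "customer_id")
               |   joined = orders.merge(customers, on="customer_id")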
        
       | LaserToy wrote:
       | It looks like a weird attempt to build a consulting business
       | around a simple idea.
       | 
       | Treat data assets like microservices and pipelines like the
       | network. Period.
       | 
       | Prescribing everything else rubs me the wrong way.
       | 
       | So, data mesh is: an architecture in which the data in the
       | company is organized into loosely coupled data assets.
        
       | robertlagrant wrote:
       | It really feels like data mesh is a fairly half-baked concept
       | born out of short-term consulting gigs and a desire to become a
       | technical thought leader.
        
         | i_like_waiting wrote:
         | Reminds me a lot of the first OLAP cubes: something that
         | consultants praise online as much as possible, just so that
         | 3-4 years later they are contracted by the company to fix the
         | mess it created.
        
           | edmundsauto wrote:
           | What are the downsides of OLAP cubes, and how were they
           | fixed? Curious to level up my understanding.
        
             | i_like_waiting wrote:
             | I guess they had their place at some point in time, but I
             | still vividly remember my old manager talking about
             | building an OLAP cube in 2018.
             | https://www.holistics.io/blog/the-rise-and-fall-of-the-
             | olap-...
        
       | i_like_waiting wrote:
       | So if I understand this correctly, a data mesh is just a data
       | mart that doesn't bring data into a database as a table, but
       | uses S3 storage instead (I assume because that's cheaper in the
       | cloud?)
        
         | skrrr wrote:
         | That, plus a central data platform team that provides infra,
         | quality monitors, data lineage and catalogue capabilities,
         | plus a central team that provides guidelines on SLAs, metadata
         | standards, etc. Sounds good in theory; I am eager to see how
         | it fails in practice.
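         | 
         | A minimal sketch of the per-product metadata such central
         | teams might standardize (the fields are illustrative
         | assumptions, not from the comment):
         | 
         |   DATA_PRODUCT = {
         |       "name": "checkout.orders",
         |       "owner": "team-checkout",
         |       "location": "s3://mesh/checkout/orders/",
         |       "sla": {"freshness_hours": 24},
         |       "schema": {"order_id": "string",
         |                  "ordered_at": "timestamp"},
         |   }
         | 
         |   def meets_sla(observed_freshness_hours,
         |                 product=DATA_PRODUCT):
         |       # A central quality monitor could run this on a
         |       # schedule and publish the result to the catalogue.
         |       limit = product["sla"]["freshness_hours"]
         |       return observed_freshness_hours <= limit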
        
       | mountainriver wrote:
       | This seems like mostly common sense. Infrastructure teams should
       | always be building tools that the org consumes (and ideally the
       | general public, too).
       | 
       | In a lot of orgs this goes sideways and the infrastructure teams
       | end up owning everything and never have time to do anything else.
       | Usually this happens due to upper management putting on the
       | squeeze.
       | 
       | In order for teams to actually own their infrastructure and data
       | we need better tooling to help them. This is coming along
       | nowadays but isn't fully there.
        
       | sdze wrote:
       | If you need so many "slides" to persuade your clients of
       | something, I think you've already lost.
        
         | rad_gruchalski wrote:
         | Considering how many big companies are going about
         | implementing this right now, I don't agree. The C-suite likes
         | slides.
        
           | MikeDelta wrote:
           | Indeed, the Future State Architecture documents from the
           | central architects that I have seen were all PowerPoint
           | presentations with at least 100 slides.
        
       ___________________________________________________________________
       (page generated 2022-03-19 23:00 UTC)