[HN Gopher] Launch HN: Datree (YC W20) - Best practices and secu...
       ___________________________________________________________________
        
       Launch HN: Datree (YC W20) - Best practices and security policies
       on each commit
        
       We are Shimon and Eyar, co-founders of Datree
       (https://www.datree.io). We've built software to help engineering
       teams automate the adoption of development best practices, coding
       standards, and security policies.  When I (Shimon) was the manager
       of a 400-developer company's infrastructure engineering team, we
       had an issue where a developer committed AWS secret keys into a
       public GitHub repo. We were very, very lucky that the bad actors
       who quickly got ahold of the keys "only" spun up compute instances
       to mine bitcoin.  Mistakes happen and they happen to the best of
       us. No developer wants to make mistakes, especially ones impacting
       production. Those mistakes can be not only costly to the business,
       but emotionally painful for the developer.  After finding out about
       the issue, I had to search for any other leaked secret in our
       repositories to make sure we were no longer exposed. The next thing
       that I had to do was to take steps to help folks avoid making this
       mistake again.  It's easy to create a policy that says "do not
       commit secrets to GitHub" (which was what I did) but in reality,
       this is much harder to implement. I would do things like sending a
       mass email to all of Engineering and having code reviewers check
       for it manually during code reviews. Problem is, these approaches
       don't work consistently--if at all.  The bigger the engineering
       team--and the faster it ships software--the bigger this problem
       becomes. Also, developers today operate more independently and have
       broader responsibilities; they are responsible for not just writing
       code, but also testing, and deployment to production. You might
       expect that developers would follow best practices, standards, and
       policies, but of course, in practice, these things fall through the
       cracks. That's why we built Datree.  What we built is a rules
       engine, which is essentially a server-side git-hook platform. We
       connect it to the organization's source control, scan the layout of
       the repository, parse all structured files like YAML / JSON / XML /
       Dockerfile, and build a catalog with the organization's metadata--
       such as packages used, container images, and all the properties in
       the structured files.  The engine performs an automatic check each
       time code is committed to GitHub. This happens before the code can
       be merged to master. It runs just like your CI tests. It checks if
       the rules you've set are followed--and tells the developer when
       they aren't and how to fix it. Unlike your CI configuration,
       though, Datree runs at the org level, so you can apply any rule to
       all of your repositories in just one click.  You may be asking "is this
       another static code analysis tool?" We see Datree as completing or
       complementing those tools, not competing with them. We're seeing
       our customers create a rule with Datree to verify that a static
       code analysis step is integrated and executed as part of
       their CI flow, instead of going over each CI config file in their
       repositories and updating it manually.  Rules could be anything:
       development best practices, lessons learned from post-mortems,
       security policies, or compliance standards. For example, a very
       popular rule is to prevent secrets from being merged into the
       master branch. Leaking secrets to source control is a common and
       potentially costly mistake (see
       https://news.ycombinator.com/item?id=19825202).  Often people ask
       us, "what rules should we adopt?" Because of this, we started
       curating industry best practices and turning them into rules they
       can simply enable when they use our product. Datree now comes with
       dozens of rule packs for all kinds of popular
       technologies (like Docker and serverless), languages and
       frameworks, tools (like GitHub and Travis CI), and even use cases
       (like SOC 2 compliance). Of course, you are free to create your own
       custom rules.  To date, Datree has run 100,000+ checks for
       Engineering teams large and small, including Microsoft,
       Globalgiving, Cybereason, and Gigster (YC S15, 400+ engineers).
       We're sure many HN members will have encountered similar problems
       and/or have expertise in this area. We'd love to hear from you: How
       do you ensure the adoption of development best practices for your
       team? What works and doesn't? Thank you!
        
       Author : shimont
       Score  : 72 points
       Date   : 2020-03-10 15:26 UTC (7 hours ago)
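
The catalog described in the post (parse structured files, record their properties, query them across the org) can be sketched roughly as follows. This is not Datree's implementation, which the post does not show. The names `flatten`, `build_catalog`, and `find_property` are hypothetical, the sample repositories are made up, and only JSON is handled because it is in the Python standard library.

```python
import json
from typing import Any, Dict


def flatten(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts/lists into dotted-path -> leaf value pairs."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = ((str(i), v) for i, v in enumerate(value))
    else:
        return {prefix: value}
    flat: Dict[str, Any] = {}
    for key, child in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(child, path))
    return flat


def build_catalog(files: Dict[str, str]) -> Dict[str, Dict[str, Any]]:
    """Map each JSON file path to its flattened properties."""
    catalog = {}
    for path, text in files.items():
        if path.endswith(".json"):
            catalog[path] = flatten(json.loads(text))
    return catalog


def find_property(catalog: Dict[str, Dict[str, Any]], key: str) -> Dict[str, Any]:
    """Return {file path: value} for every file that defines `key`."""
    return {path: props[key] for path, props in catalog.items() if key in props}


# Hypothetical file contents standing in for two repositories.
repo_files = {
    "svc-a/package.json": '{"name": "svc-a", "dependencies": {"lodash": "4.17.15"}}',
    "svc-b/package.json": '{"name": "svc-b", "dependencies": {"lodash": "3.10.1"}}',
}
catalog = build_catalog(repo_files)
lodash_versions = find_property(catalog, "dependencies.lodash")
```

With every file flattened into the same shape, a rule becomes a query over the catalog rather than a per-repository script, which is presumably what makes org-wide rules cheap to apply.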
        
       | vira28 wrote:
       | Wondering if there is any open source project that does similar
       | things? (I'd be surprised if there isn't.)
        
         | __jal wrote:
         | Here's one for credential scanning:
         | 
         | https://github.com/dxa4481/truffleHog
        
         | eyarz wrote:
         | you will need to glue together (and maintain) a bunch of
         | different open-source projects to achieve the same capabilities
         | - here are some: https://github.com/danger/danger-js,
         | https://github.com/probot/probot,
         | https://github.com/Yelp/detect-secrets,
         | https://github.com/github/licensed, and many more...
        
         | devnullbyte wrote:
         | I developed this a while ago and it was used as a gate check in a
         | Linux Foundation project. It's essentially the same as the OP
         | project (regex based), but without the GUI.
         | 
         | https://anteater.github.io/
        
       | cddotdotslash wrote:
       | This is really awesome! One area I'd recommend looking into is
       | automated scanning of cloud infrastructure templates (Terraform,
       | CloudFormation, Troposphere, ARM Templates, etc.) These get
       | pushed to source control all the time and often contain tons of
       | policy violations.
       | 
       | The pricing feels a bit steep, especially considering that it's
       | 3.5x the cost per user of GitHub itself ($8 vs $28) but I suppose
       | most enterprises wouldn't mind at their scale compared to the
       | cost of a breach.
        
         | eyarz wrote:
         | We do support Terraform, CloudFormation and ARM Templates
         | because those are all structured files. Unlike your source
         | control, we are doing heavy compute processing every time a new
         | PR is created, so our costs are higher...
        
       | Chico75 wrote:
       | How do you deal with false positives?
        
         | shimont wrote:
         | Every policy can be edited and tweaked to your use case using
         | our engine and Regex. In addition, you can use our dashboard to
         | view all executions
        
       | ThePhysicist wrote:
       | Funny, we built almost the same product 6 years ago (sold it to a
       | competitor); we even did automated refactoring of Python code. We
       | also developed a regex-like language that could operate on
       | abstract syntax trees / annotated graphs, which we wrote all our
       | checks with. We were working on extending that with a graph
       | database backend and symbolic execution, basically building a
       | large code graph that we would perform pattern matching on. We
       | didn't finish this work as we sold the company before that, in
       | retrospect I often wonder what would have happened if we had kept
       | developing it.
       | 
       | From my experience it's quite hard to monetize developer tools
       | except maybe when focusing on security, so it's good you seem to
       | have that as a focus as well. Good luck!
        
         | [deleted]
        
         | shimont wrote:
         | I would love to hear more about your experience! Could you
         | please email me at Shimon [AT] Datree IO?
         | 
         | I believe that now is the right time for a solution like
         | Datree, because the way we develop software has evolved:
         | companies moved from Waterfall to Agile, developers have more
         | autonomy, and the move towards distributed microservices has
         | left many companies with hundreds or thousands of git
         | repositories, each one with its own configuration files for CI,
         | Docker, Kubernetes, etc. It's really hard managing all of those
         | distributed pieces :)
        
           | ThePhysicist wrote:
           | Done. Yes maybe the timing is right for you, we were probably
           | a bit too early when we launched our product!
           | 
           | Beware that GitHub recently acquired Semmle; they had one of
           | the best static analysis offerings and I think GitHub
           | acquired them to integrate their solution natively into the
           | platform, so I wouldn't focus exclusively on GitHub as they
           | might become your competitor very soon. In many enterprise
           | settings GitLab seems to be the more popular choice already,
           | so it might be worth looking at integrating with that as
           | well.
        
       | harrisonjackson wrote:
       | >What we built is a rules engine, which is essentially a server-
       | side git-hook platform.
       | 
       | Isn't it too late once it is committed to GitHub? It seems like
       | this would be much more useful as a service running as a
       | precommit hook on each workstation. Probably harder to
       | ship/monetize that but as far as actually solving the problem
       | wouldn't that be better?
        
         | ThePhysicist wrote:
         | As far as I know GitHub has their own technology for blocking
         | code pushes that contain e.g. AWS secrets. I think it happened
         | a lot that people pushed such secrets to the platform and
         | GitHub's streaming API makes/made it really easy for
         | adversaries to catch those and start abusing them within
         | seconds.
        
           | eyarz wrote:
           | You are right, but blocking secrets is only a subset of what
           | our engine can do. Also, as far as I know, the secret
           | blocking feature GitHub is offering is ONLY for public repos.
        
         | ldthorne wrote:
         | Agreed that it's too late in that the key is compromised at
         | that point, but a notification before you release to production
         | that you've accidentally committed it allows you to roll the
         | key (hopefully) before any bad actors find it.
        
         | shimont wrote:
         | Initially, we started as a CLI tool, but as you said, that is
         | part of the problem: how do you make sure all of your
         | developers are using the CLI/pre-commit hooks?
         | 
         | This is why we chose to integrate at the pull-request level.
         | It is not perfect, but at least your plain text secrets will
         | not be merged into master and end up on your developers'
         | laptops and your servers(less). :)
         | 
         | We try to find a balance between perfect and achievable in an
         | easy way for our customers
        
           | lasryaric wrote:
           | But it's already in the git objects and therefore accessible
           | to anyone who clones the repository? I am not 100% sure about
           | that. Can someone confirm?
        
       | toomuchtodo wrote:
       | Do you offer an on-prem version for orgs that couldn't use a SaaS
       | provider for this sort of functionality?
        
         | eyarz wrote:
         | yes, we do.
        
       | mtmail wrote:
       | The URL https://www.datree.io/
        
         | dang wrote:
         | Added above. Thanks!
        
       | theanirudh wrote:
       | Looks good. We were trying to implement this using a mix of CI,
       | pre-commit hooks and Gitlab PR templates, but it was limiting.
       | This looks like exactly what we needed.
       | 
       | Regarding custom rules, does the tool run automated tests for
       | those too?
        
         | eyarz wrote:
         | just to clarify, Datree runs automatic checks for each custom
         | rule that a user creates
        
         | shimont wrote:
         | Most companies we see try to implement something by themselves.
         | Pre-commit hooks are a major problem as they require all the
         | developers to install them on their computers, which is part of
         | the problem itself, right? Aligning the dev team :)
         | 
         | Your unit tests and integration tests should still run using
         | your CI. We run tests around the git and structured files
        
       | elpakal wrote:
       | >The engine performs an automatic check each time code is
       | committed to GitHub
       | 
       | What if we don't use GitHub but something else? Are your hooks
       | able to run purely in git?
        
         | shimont wrote:
         | Currently, we support GitHub and are working on releasing our
         | support for GitLab and BitBucket. We plan on running on top of
         | existing git hosting solutions
        
       | debaserab2 wrote:
       | Any plans to expand to other VCS hosting services (Bitbucket
       | specifically)?
        
         | eyarz wrote:
         | Yes, we plan to add GitLab and BitBucket support in the next
         | quarter.
        
       | dhagz wrote:
       | So this sounds like git-hooks-as-a-service. Am I right in that
       | assessment?
        
         | elpakal wrote:
         | That's my take as well, but maybe I'm missing something.
        
           | elpakal wrote:
           | i don't mean to diminish though, very cool idea!
        
             | eyarz wrote:
             | The git hook is only the integration part. The core of the
             | system is the rules engine and the logic around that.
        
         | shimont wrote:
         | You might call it that in a way. That is where the engine
         | hooks in. We also scan the entire source control and curate
         | technology-specific rules for you to use.
        
       | __jal wrote:
       | We do several of these things, and bundling them together looks
       | nice; I imagine troubleshooting the pipeline is much easier. We
       | would need the enterprise version because we are on-prem, and our
       | user count compared against the 'pro' edition makes me think this
       | would be a hard sell - high 5 figures/year to replace a few shell
       | scripts is tough.
        
         | eyarz wrote:
         | I believe we can provide more value than what can be achieved
         | with a few shell scripts, for example, the built-in best
         | practices, the rules management option, and more ;) A volume-
         | based discount is also available for enterprise customers. If
         | you would like to hear more - feel free to reach out to Eyar
         | [AT] Datree IO
        
       | almathes wrote:
       | Why wouldn't someone just use GitHub Actions and token scanning?
       | 
       | https://github.com/features/actions
       | 
       | https://developer.github.com/partnerships/token-scanning/
        
         | chaostheory wrote:
         | Not everyone uses github
        
           | shimont wrote:
           | We are actively working on supporting GitLab and BitBucket.
           | Once it is GA we will update you :)
        
         | __jal wrote:
         | For starters, that "just" is swallowing:
         | 
         | - Identify the relevant tokens you want to scan for, and create
         | regular expressions to capture them.
         | 
         | - Create a token alert service which accepts webhooks from
         | GitHub that contain the token scanning message payload.
         | 
         | - Implement signature verification in your token alert service.
         | 
         | - Implement token revocation and user notification in your
         | token alert service.
         | 
         | And that would replace one piece of what this does.
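
As a rough illustration of only the first step in the list above (regular expressions for the tokens you care about), here is a minimal Python sketch. The pattern is a commonly used heuristic for the shape of AWS access key IDs, not an official or exhaustive definition. The function name `find_candidate_keys` is hypothetical, and the sample key is a placeholder in the style of AWS's documentation examples. The webhook service, signature verification, and revocation steps are not covered.

```python
import re
from typing import List, Tuple

# Heuristic: "AKIA" followed by 16 uppercase letters or digits.
# Real scanners use many more patterns and often entropy checks as well.
AWS_ACCESS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")


def find_candidate_keys(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, match) pairs for strings shaped like AWS key IDs."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in AWS_ACCESS_KEY_ID.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


# Placeholder content standing in for a committed config file.
sample = 'region = "us-east-1"\naws_access_key_id = "AKIAIOSFODNN7EXAMPLE"\n'
hits = find_candidate_keys(sample)
```

A pattern like this will produce false positives and miss other credential types, which is part of why the remaining steps in the list are real work.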
        
           | stevenpetryk wrote:
           | It always warms my heart to see someone fighting the "why not
           | just..." comments on here. Everyone underestimates how much
           | goes into a project.
        
             | dang wrote:
             | Jerry Weinberg used to say that whenever you hear the word
             | "just" on a software project, replace it with "have
             | trouble". Similarly, replace "should" with "isn't". "That
             | should be easy" -> "that isn't easy"; "we should just use
             | git" -> "we'll have trouble using git".
             | 
             | https://hn.algolia.com/?dateRange=all&page=0&prefix=true&qu
             | e...
        
       | bluefox wrote:
       | Hello Shimon, nice to see your post here, all the best to you and
       | your team from an ex-colleague.
       | 
       | Would you mind sharing an example of a custom rule?
        
         | shimont wrote:
         | Hey :) Here are several examples of custom rules:
         | 
         | * Verify that CI configuration includes running certain jobs
         | (e.g. third-party packages scanner).
         | 
         | * Ensure that all Docker containers are using a pinned down tag
         | and not "latest"
         | 
         | * Verify that every commit is tied to an issue tracker (e.g.
         | JIRA) ticket for traceability.
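
Two of the rules listed above (pinned Docker tags and ticket references in commits) could be approximated with checks like the following Python sketch. This is not Datree's rule syntax, which the thread does not show. The function names, the `PROJ-123`-style ticket pattern, and the sample Dockerfile are all assumptions made for illustration.

```python
import re
from typing import List

# Assumed ticket-key shape, e.g. "PROJ-123"; adjust to your tracker.
TICKET_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def unpinned_images(dockerfile: str) -> List[str]:
    """Return images from FROM lines that have no tag or use ':latest'."""
    stages = set()  # names introduced by "FROM ... AS <name>"
    flagged = []
    for line in dockerfile.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Drop optional flags such as --platform=linux/amd64.
        args = [p for p in parts[1:] if not p.startswith("--")]
        if not args:
            continue
        image = args[0]
        is_stage_ref = image in stages
        if len(args) >= 3 and args[1].upper() == "AS":
            stages.add(args[2])
        # Earlier build stages, 'scratch', and digest references are fine.
        if is_stage_ref or image == "scratch" or "@" in image:
            continue
        # Only look for a tag after the last '/', so registry ports are ignored.
        last_segment = image.rsplit("/", 1)[-1]
        tag = last_segment.partition(":")[2]
        if tag in ("", "latest"):
            flagged.append(image)
    return flagged


def references_ticket(commit_message: str) -> bool:
    """True if the commit message mentions something shaped like a ticket key."""
    return TICKET_KEY.search(commit_message) is not None


# Hypothetical Dockerfile used only to exercise the check.
dockerfile = (
    "FROM node:latest\n"
    "FROM python:3.8-slim AS build\n"
    "FROM registry.example.com:5000/app\n"
    "FROM build\n"
)
flagged = unpinned_images(dockerfile)
```

A server-side check would run functions like these against each pull request and report the flagged lines back to the developer.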
        
       ___________________________________________________________________
       (page generated 2020-03-10 23:00 UTC)