[HN Gopher] Launch HN: JumpWire (YC W22) - Easily encrypt custom...
       ___________________________________________________________________
        
       Launch HN: JumpWire (YC W22) - Easily encrypt customer data in your
       databases
        
       Hi HN! We are William and Ryan, co-founders of JumpWire
       (https://jumpwire.ai), a security tool that encrypts columns of
       sensitive data stored in databases, in a way that works
       automatically with any backend application.
       
       We've built startups (Ryan was Dir. of Engineering at N26) and
       worked in big tech (William was SRE at Spotify), and in every
       company we saw the same pattern of data spreading out of control.
       It felt like a dirty secret: hundreds of employees are granted
       access to customer PII through internal systems. Engineers
       responsible for securing data end up in a race against a growing
       list of SaaS or internal tools in use across the organization, and
       fall back to using bad access management workflows. Trying to
       secure data by controlling access is a risky proposition on its
       own -- data leaks due to compromised access have become an
       all-too-regular occurrence, with the Uber contractor breach in
       September as a recent example.
       
       Companies that outgrow the access control approach typically do
       one of two things. Either developers have to write custom logic
       into all of their applications to encrypt/decrypt data, or they
       partition the data by putting some fields in a data vault and
       others in the main database. Both options are costly in terms of
       implementation work and ongoing maintenance. We've seen entire
       teams dedicated to just maintaining ETL pipelines for scrubbing
       PII into secondary databases!
       
       JumpWire automates the encryption of data by identifying fields
       that contain sensitive information in databases and APIs. We do
       this without developers needing to modify their applications or
       manage access control rules. You define policies that determine
       how data should be handled--for a 'user' record, this might mean
       that the email address, name, and birthday are labeled as 'PII'
       and encrypted while signup date and favorite type of cheese are
       not.
       
       JumpWire is a transparent proxy between applications and a
       database. The application connects using the same library it would
       use if it were going directly to the database, and JumpWire
       intercepts and inspects queries before forwarding them on. Based
       on policies you define, individual fields can be
       encrypted/decrypted, nulled out, or audited as the requests and
       responses flow through the proxy. These policies are designed to
       be granular and map to table-specific schemas--for example, a
       policy might say to encrypt all PII, and the users table has a
       schema marking email address as PII. Different access controls can
       be applied to allow a subset of applications to bypass the
       policies where needed.
       
       Because our proxies implement the underlying database protocols,
       application code or clients do not have to be changed to work with
       JumpWire.
       
       The product is built to be self-hosted. The main component, our
       proxy engine, is run on your network as a cluster of Docker
       containers. The web interface is run by us by default but is also
       available to self-host. Our engine uses your own AWS KMS or
       HashiCorp Vault installation to store sensitive configuration
       data, such as database credentials and encryption keys. This
       ensures that confidential data is never transmitted across the
       Internet, and you remain in full control of the data
       infrastructure and keys. We do have a hosted Vault option as well
       to make it easy to get started or try things out in a staging
       environment.
       
       Our database proxy supports PostgreSQL and MySQL/MariaDB, and
       DynamoDB is in beta. We also have an API proxy in early alpha that
       uses OpenAPI specs instead of connecting directly to a DB.
       
       We actually built similar (but half-baked) versions of this at
       startups we were part of (a neobank and payment API), but it was
       always part of backend application code. We realized it could be
       abstracted out of the application entirely, and integrated via
       configuration instead. This would be easier to maintain, since
       application code wouldn't need updating each time data policies
       changed. However, building it was never feasible at these other
       companies, because it was too remote from their core products. So
       we decided to start JumpWire to do it.
       
       We have a free-as-in-beer version of our product available to use
       in small environments. After that we charge a monthly subscription
       fee based on the number of databases or APIs configured.
       
       We're still early and would love to hear what you think about what
       we're building, as well as any general thoughts on data security.
       Thanks!
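
To make the policy idea in the post concrete, here is a minimal sketch of label-based field handling on a single 'user' record. Everything in it is an assumption for illustration: the schema layout, the `Policy` class, `apply_policies`, and the `fake_encrypt` placeholder (which is NOT real encryption) are hypothetical names, not JumpWire's actual configuration format or API.

```python
from dataclasses import dataclass

# Hypothetical schema: column name -> set of labels (not JumpWire's real format).
USERS_SCHEMA = {"email": {"PII"}, "name": {"PII"}, "birthday": {"PII"}}


@dataclass(frozen=True)
class Policy:
    label: str   # label the policy targets, e.g. "PII"
    action: str  # one of "encrypt", "null", "audit"


def fake_encrypt(value):
    # Placeholder only: a real proxy would use a cipher with a managed key.
    return "enc:" + str(value)[::-1]


def apply_policies(row, schema, policies, audit_log):
    """Return a copy of row with every matching policy applied per column."""
    out = dict(row)
    for column, labels in schema.items():
        if column not in out:
            continue
        for policy in policies:
            if policy.label not in labels:
                continue
            if policy.action == "encrypt" and out[column] is not None:
                out[column] = fake_encrypt(out[column])
            elif policy.action == "null":
                out[column] = None
            elif policy.action == "audit":
                audit_log.append(column)
    return out
```

With a policy list of `[Policy("PII", "encrypt")]`, the email, name, and birthday columns are transformed while signup date and favorite cheese pass through untouched, since they carry no label in the schema.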
        
       Author : hexedpackets
       Score  : 60 points
       Date   : 2022-12-01 16:08 UTC (6 hours ago)
        
       | paulgb wrote:
       | Congrats on the launch! This sounds pretty cool.
       | 
       | Did you have to get into the weeds of the wire protocols that
       | Postgres/MySQL use? What was that like?
        
         | debussyman wrote:
         | Indeed we did get into the weeds. PostgreSQL was fairly
         | straightforward; MySQL was a big challenge. Interestingly, the
         | hard part is supporting the large variety of authentication
         | handshakes that MySQL/MariaDB support, not the queries
         | themselves. This is the fun part of our job! ;)
         | 
         | Also critical is ensuring encryption occurs within the database
         | transaction, so that data doesn't leak into write-ahead logs or
         | change data capture streams. Since we manage keys/rotation this
         | takes some careful logic in our engine.
        
       | yardstick wrote:
       | How easy is it to rotate encryption keys in the event of a
       | compromise? Eg a key was accidentally included in a log file, so
       | the data encrypted by that key now needs to be re-encrypted with
       | a new key.
        
         | hexedpackets wrote:
         | A manual rotation is one click on the web page, and we can
         | automatically rotate on a schedule to limit the scope of a
         | compromise if a key gets leaked. Full rekeying is Coming
         | Soon(tm) - fields encrypted with JumpWire have some metadata
         | about which key is used which makes it easier to find rows that
         | need to be re-encrypted, but the end to end process isn't
         | launched yet.
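
The reply above says encrypted fields carry metadata about which key was used, which makes it easier to find rows needing re-encryption. A rough sketch of that idea follows. The `jw1:<key_id>:<base64>` envelope layout and the helper names are assumptions made up for this example; the thread does not describe JumpWire's actual on-disk format.

```python
import base64

PREFIX = "jw1"  # hypothetical envelope marker, not JumpWire's real format


def wrap(key_id, ciphertext):
    """Store the key id next to the ciphertext so it can be read without decrypting."""
    encoded = base64.b64encode(ciphertext).decode("ascii")
    return f"{PREFIX}:{key_id}:{encoded}"


def key_id_of(field):
    """Return the key id of an enveloped field, or None for plain or missing values."""
    if not isinstance(field, str):
        return None
    parts = field.split(":", 2)
    if len(parts) != 3 or parts[0] != PREFIX:
        return None
    return parts[1]


def rows_needing_rekey(rows, column, retired_key_ids):
    """Select rows whose column was encrypted under a retired or leaked key."""
    return [row for row in rows if key_id_of(row.get(column)) in retired_key_ids]
```

Because the key id sits outside the ciphertext, a rekey job can find affected rows with a cheap string check and only decrypt the ones it has to rewrite.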
        
       | nox7777 wrote:
       | amazing! we've been looking for smth like this! just registered
       | via website
        
       | trafnar wrote:
       | I suppose your company in theory could read all the incoming
       | data? Could engineers at my company decrypt the data? Or are the
       | keys not available to us?
       | 
       | I suppose it's more about ensuring the data sitting around in the
       | DB isn't exposed to random employees or hackers yeah?
        
         | iLoveOncall wrote:
         | Maybe read the post before commenting? They answer all your
         | points in it.
         | 
         | The proxy layer is self-hosted, the UI can be self-hosted and
         | the keys are your own AWS KMS keys.
        
         | hexedpackets wrote:
         | Our engine is self-hosted, so all of the data is kept local to
         | your network and we can't read any of it. Concerns about data
         | access and query latency are the two biggest reasons we decided
         | to take the self-hosted approach.
         | 
         | Whether engineers can access the keys and decrypt data depends
         | on your setup. The engine can use either AWS KMS or Vault for
         | top-level key management, so if an engineer has full
         | permissions over those then they could get the keys out. We can
         | also host the keys in our infrastructure and sync them over to
         | the engine if you're comfortable with that tradeoff.
        
       | ajnene wrote:
       | Amazing work guys! Excited to integrate this to shore up our
       | security practices
        
       | spak9 wrote:
       | FYI, I think there may be a typo on your
       | `https://jumpwire.ai/pricing` page on the `How are keys handled?`
       | 
       | ``` How are keys handled? We generate unqique encryption keys for
       | every account and store them in a secure secrets manager. Subkeys
       | are routinely created and rotated from the master key. For
       | additional security, we support user provided keys on our Team
       | and Enterprise plan. ```
       | 
       | `unqique` --> `unique`
        
         | hexedpackets wrote:
         | Thanks for letting us know, should be fixed in a minute!
        
       | dbochman wrote:
       | Dang this sounds awesome, really dig that clients won't require
       | changes to play nice
        
       | angryasian wrote:
       | I've worked with systems like this in the past. It becomes a huge
       | burden eventually when you have teams like marketing, analytics,
       | etc that need access to the raw data and you eventually have to
       | store all this stuff somewhere else unencrypted.
        
         | hexedpackets wrote:
         | Yeah, the mix of permissions can definitely be a big pain.
         | We're building with that in mind - policy exceptions can be set
         | so that specific groups of applications get the raw data when
         | querying. All of the policies stack too; one common setup is to
         | encrypt by default, then allow some specific tool to get raw
         | data but audit the queries it's doing.
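
The "encrypt by default, allow one tool raw access but audit it" setup described above could be modeled roughly as below. The classification names, the `EXCEPTIONS` table, and both functions are hypothetical and only illustrate the stacking behavior; they are not JumpWire's policy engine.

```python
# Default handling for any application without an exception.
DEFAULT_ACTIONS = frozenset({"encrypt"})

# Hypothetical exceptions, keyed by application classification.
EXCEPTIONS = {
    "analytics-tool": frozenset({"raw", "audit"}),
}


def resolve_actions(classification):
    """Exceptions replace the default; everything else stays encrypted."""
    return EXCEPTIONS.get(classification, DEFAULT_ACTIONS)


def handle_query(classification, sql, audit_log):
    """Return how results should be delivered, recording audited queries."""
    actions = resolve_actions(classification)
    if "audit" in actions:
        audit_log.append((classification, sql))
    return "raw" if "raw" in actions else "encrypted"
```

The design choice worth noting is that the safe behavior is the fallback: an application nobody has classified gets encrypted data, and raw access always has to be granted explicitly.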
        
       | gerad wrote:
       | Have you thought about solving the problem from a different
       | direction? Providing a read-only, sanitized clone of the database
       | that can be accessed outside of the core application code?
       | 
       | Seems like that could kill more birds with the same stone?
        
         | pistoriusp wrote:
         | That's exactly what we're doing at https://www.snaplet.dev, I
         | would love to chat with the founders about offering generated
         | production accurate snapshots for developers to code against
         | for users of their proxy!
        
           | [deleted]
        
           | debussyman wrote:
           | Happy to chat anytime! You can reach me by email (ryan at
           | [ourdomain]) or book directly on my cal -
           | https://calendly.com/ryan-jump/yc-founder-meeting
           | 
           | We've peeked at Snaplet in the past, and :heart: your design
           | aesthetic
        
             | pistoriusp wrote:
             | Thanks! I'm just about to catch a flight, but will reach
             | out when I get back to the land of the living!
        
               | debussyman wrote:
               | Sounds good, travel safe
        
           | nogenhat wrote:
           | Are you looking for investment? Happy to close the deal, love
           | that idea so much and trust the co-founders redwoodjs ;-)
        
             | pistoriusp wrote:
             | Unfortunately not anymore, we have a great set of people
             | backing us, but thanks for the vote of confidence.
             | 
             | As an aside: Not exactly sure why the parent is getting
             | down voted.
        
               | hodgesrm wrote:
               | Yeah, that's weird. I just upvoted you to compensate.
        
               | dang wrote:
               | It's common for startups to hijack competitors' launch
               | threads. Some readers find that distasteful; perhaps
               | that's why there were downvotes.
               | 
               | I'm not saying that your post was such a hijack, but it's
               | difficult to interpret these things accurately, so any
               | post of this kind will always land on a spectrum of
               | responses.
        
         | acrefoot wrote:
         | Tonic.ai seemed to fit that bill, but we ended up rolling our
         | own ETL job due to cost concerns and a security preference for
         | a simple-to-audit tool. Tonic.ai does it on the fly, which was
         | merely a nice-to-have for this use case.
        
         | hexedpackets wrote:
         | We have thought about that! It's a nice approach for some use
         | cases but having just a read-only copy ends up being pretty
         | limiting. Often people using internal tools (particularly
         | customer success) need to modify some fields in a record but
         | shouldn't have unrestricted access to everything. We've found
         | that being able to protect specific fields instead of the
         | entire database gives a lot more flexibility.
        
       | lobal wrote:
       | Nice! I've been using a plugin [1] for Prisma that does something
       | similar, but this sounds much more comprehensive.
       | 
       | [1] https://github.com/47ng/prisma-field-encryption
        
         | hexedpackets wrote:
         | Thanks! The plugin is pretty nice if you're sticking to just
         | Prisma for your backends. Always happy to chat about your use
         | case or give a demo of how JumpWire compares if you're
         | interested.
        
       | aharm wrote:
       | This sounds great, but I'd really prefer a fully-hosted solution.
       | Do you offer one?
        
         | debussyman wrote:
         | We can launch the engine into a VPC we manage that is co-located
         | in your region/AZ, and peer the networks, instead of offering a
         | traditional multi-tenant hosted solution.
         | 
         | But we try _really_ hard to ensure your data is never exposed
         | to the Internet. And we do everything we can to limit our
         | ability to read your data, either through self-hosting or
         | ensuring you own the keys.
        
       | bironran wrote:
       | How did you solve range queries? Prefix/suffix queries? Index
       | performance? Aggregation on database end?
        
         | hexedpackets wrote:
         | The short answer is we haven't fully solved it yet. We have two
         | modes we can operate in - directly encrypting in the database,
         | or doing just-in-time encryption as the query results come
         | back. For the former most queries other than direct comparison
         | won't work - we have some early work started on using both
         | homomorphic encryption [1] and format-preserving masking to
         | help there.
         | 
         | With JIT response encryption none of that is an issue, but it
         | can be slow for large amounts of data. Any kind of big-data
         | analytics will be a poor fit for JumpWire right now.
         | 
         | [1] https://en.wikipedia.org/wiki/Homomorphic_encryption
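
To illustrate why direct comparison can still work on protected values while range and prefix queries cannot, here is a small sketch using a deterministic keyed token (an HMAC "blind index"). This is a technique swapped in for illustration, not something the thread says JumpWire uses, and it is a keyed hash rather than encryption. The secret, the sample emails, and the function names are all made up.

```python
import hashlib
import hmac

SECRET = b"demo-only-secret"  # assumption: a real key would live in KMS or Vault


def blind_index(value):
    """Deterministic token: equal inputs give equal tokens, nothing else is preserved."""
    return hmac.new(SECRET, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Toy "table" keyed by token, standing in for an indexed database column.
stored = {
    blind_index(email): user_id
    for user_id, email in enumerate(["ann@example.com", "bob@example.com"])
}


def find_by_email(email):
    """Equality lookup works because the query value maps to the same token."""
    return stored.get(blind_index(email))
```

An equality filter can be rewritten to compare tokens, so an ordinary index still helps. Ordering and substrings are not preserved by the token, which is why range, prefix, and suffix queries need something else, such as the homomorphic or format-preserving approaches mentioned above.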
        
           | bironran wrote:
           | yeah, FHE isn't, yet, something that can be used in a busy
           | production env. At best I'd say it's a specialized tool,
           | though in my mind it's a toy solution - can work for n=1,
           | possibly for n<(low numbers) but not for large N.
        
             | hexedpackets wrote:
             | Totally agree. We're likely going to implement a partially
             | homomorphic solution that allows for some specific queries.
             | We aren't trying to build a general purpose computational
             | environment as the use cases we support don't require
             | arbitrary computation. The data people encrypt with
             | JumpWire is pretty much all strings and the queries on them
             | are mainly doing some sort of substring matching (mainly
             | prefix/suffix).
        
       | brap wrote:
       | Cool product. Just curious, is there no existing encryption at
       | the DB level? I would expect modern DBs to be able to do that.
        
         | hexedpackets wrote:
         | Thanks! Some databases have encryption support but it is either
         | coarse (row-level encryption is offered in a few databases for
         | example) or it's a low level construct that becomes really
         | complex to integrate - especially if you want to seamlessly
         | decrypt some data. They're often only available in enterprise
         | versions (MongoDB and MySQL do this).
         | 
         | pgcrypto mentioned below is a good example. It's a great
         | extension that works really well, and if you're only using
         | PostgreSQL you could build a lot of the functionality of
         | JumpWire using it. But it requires a lot of engineering work to
         | fit into your application. Having the basic encryption
         | functions only gets you part of the way to a full solution -
         | the rest is aligning those with high level policies and keeping
         | up to date as data schemas change.
        
         | ushakov wrote:
         | https://www.postgresql.org/docs/current/pgcrypto.html
        
       | liushh wrote:
       | Great work guys! Looking forward to integrating with JumpWire!
        
       | danielmarkbruce wrote:
       | Looks great. How do you guys compare to something like Voltage?
        
         | hexedpackets wrote:
         | The value prop is definitely very similar. I'm not as familiar
         | with Voltage as I am with other solutions, but my understanding
         | is that it requires either using the Voltage database driver
         | (JDBC/ODBC in particular) or an HTTP API.
         | 
         | With JumpWire, all of the work happens in an engine proxy that
         | works directly with the database protocols. That makes the
         | integration simpler - any language and connector can be used by
         | just changing the hostname and auth. The downside is it's
         | harder for us to add new databases - Voltage's approach
         | definitely wins out there.
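
As a sketch of what "just changing the hostname and auth" might look like from the application side, the helper below rewrites a connection URL to point at a proxy. The hostnames, credentials, and the `route_through_proxy` name are hypothetical; real proxy credentials and ports would come from your own deployment.

```python
from urllib.parse import urlsplit, urlunsplit


def route_through_proxy(dsn, proxy_host, proxy_port, user, password):
    """Swap only host, port, and credentials; keep scheme, database, and options."""
    parts = urlsplit(dsn)
    netloc = f"{user}:{password}@{proxy_host}:{proxy_port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Example with made-up values: the driver and database name stay the same.
direct = "postgresql://app:secret@db.internal:5432/shop?sslmode=require"
proxied = route_through_proxy(direct, "jumpwire.internal", 5432, "app-token", "proxy-secret")
```

Credentials containing characters such as `@` or `:` would need percent-encoding before being placed in the URL; the sketch skips that for brevity.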
        
           | danielmarkbruce wrote:
           | Interesting. Thanks for the explanation.
        
       | hangonhn wrote:
       | So if the fields are encrypted by the proxy on the way to the DB,
       | how do queries and indices work since it would be pretty much
       | invisible to the DB and the query planner? Thanks!
       | 
       | I really like the approach you are taking since it could be a
       | quick drop-in deployment that solves a huge problem for us.
        
         | hexedpackets wrote:
         | Glad to hear you like our approach! We haven't fully solved
         | indexing/complex querying yet. We have two modes we can operate
         | in - directly encrypting in the database, or doing just-in-time
         | encryption as the query results come back. When encrypting
         | directly in the database most queries other than direct
         | comparison won't work. We have some early work started on using
         | both homomorphic encryption [1] and format-preserving masking
         | which opens up the ability to use other query operations.
         | 
         | With JIT response encryption none of that is an issue, the
         | database still has the raw data but applications are protected.
         | The downside is it can be slow for large amounts of data.
         | 
         | [1] https://en.wikipedia.org/wiki/Homomorphic_encryption
        
       | danbmil99 wrote:
       | Any plans to support mongodb?
        
         | debussyman wrote:
         | We do plan to support MongoDB! Right now we are wrapping up
         | DynamoDB, and mongo is next after.
         | 
         | We are hoping to leverage their recently released Queryable
         | Encryption feature [1], but the key management is tricky.
         | 
         | [1] https://www.mongodb.com/products/queryable-encryption
        
           | danbmil99 wrote:
           | 15 yrs experience with Mongo - if you want some help
           | (contract work) please contact me dan.miller at eye0.com
        
             | debussyman wrote:
             | Thanks, I'll reach out next week!
        
       | cloudfalcon wrote:
       | What's the risk to your business of other data security companies
       | (like BigID) offering this kind of functionality?
        
         | debussyman wrote:
         | We see BigID and others in the data governance space focusing
         | on cataloging schemas and identifying risks around access to
         | data that violates policies. In cases where remediation
         | requires a technical change, such as tokenizing data before
         | sending to a third-party API, JumpWire offers a solution that
         | doesn't require engineering to re-architect their systems.
         | 
         | Of course BigID could build their own technical controls for
         | customers to install, but I'm seeing more partnerships
         | happening in the space - Cyera and Wiz recently announced a
         | tighter product integration [1].
         | 
         | There are also problems with offering a solution over SaaS. We
         | believe a proxy must run in our customers' network for low
         | latency, as well as the added security of data isolated to a
         | VPC.
         | 
         | [1] https://www.prnewswire.com/news-releases/cyera-and-wiz-
         | partn...
        
       | lyime wrote:
       | Congrats on the launch! Interesting product.
       | 
       | How are updates handled, if I'm hosting the container in my
       | cloud? How should I plan for troubleshooting if there are
       | incidents involving JumpWire?
        
         | debussyman wrote:
         | We tag releases for the container which gives you flexibility
         | to manage updates on your deployment schedule. In a production
         | setup, our proxy engine automatically clusters across multiple
         | nodes, so that rolling updates minimize downtime.
         | 
         | Policies are cluster aware, so that individual policies can be
         | pinned to a particular cluster.
         | 
         | For troubleshooting, our engine publishes events that you can
         | ship into your observability or monitoring stack
         | (datadog/statsd, prometheus, cloudwatch) so any degradation can
         | be handled by an IR process. And we support our customers with
         | quick responses on shared slack channels directly with their
         | engineering teams.
        
       | acrefoot wrote:
       | Any comparisons to https://www.tonic.ai?
       | 
       | > Based on policies you define, individual fields can be
       | encrypted/decrypted...
       | 
       | Are the policies something like "retool" gets tokenized or faked
       | data back, and the main app gets everything? Or is it more
       | granular even within the main app? Like can I teach JumpWire
       | about my app's users and our AuthZ ruleset?
       | 
       | > or they partition the data by putting some fields in a data
       | vault and others in the main database
       | 
       | I was considering using VGS to tokenize sensitive data, but I
       | prefer self-hosted and reasonably auditable code for such
       | sensitive systems. Is that the case here?
       | 
       | > We've seen entire teams dedicated to just maintaining ETL
       | pipelines for scrubbing PII into secondary databases!
       | 
       | I do this to make staging environments more realistic, which
       | makes them double as debugging tools on production when you can't
       | give engineers any sort of direct production access. We whitelist
       | non-sensitive fields (most importantly foreign keys), and fill in
       | the rest with faked data. The app looks like production, but as
       | if all the users were bots who were saying nonsense at each other.
       | At my scale (50 person company), it works reasonably well enough
       | with just me maintaining it.
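
The allowlist approach described in this comment (keep non-sensitive fields such as foreign keys, fake everything else) can be sketched as follows. The column names and helpers are hypothetical, and the commenter's actual ETL job is not shown in the thread.

```python
import hashlib

# Hypothetical non-sensitive columns that are copied to staging unchanged.
ALLOWLIST = {"id", "team_id", "created_at"}


def fake_value(column, original):
    """Deterministic placeholder so repeated runs produce stable staging data."""
    digest = hashlib.sha256(f"{column}:{original}".encode("utf-8")).hexdigest()[:8]
    return f"{column}_{digest}"


def scrub_row(row, allowlist=ALLOWLIST):
    """Keep allowlisted columns and NULLs; replace every other value with a fake."""
    return {
        column: value if column in allowlist or value is None else fake_value(column, value)
        for column, value in row.items()
    }
```

Allowlisting fails safe: a newly added column is faked until someone marks it non-sensitive. An unkeyed hash of a low-entropy value can be guessed by brute force, so a real job would more likely use random or keyed fake data.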
        
         | hexedpackets wrote:
         | Tonic is awesome! We think of synthetic data/differential
         | privacy as a different use case - trying to replicate data
         | across scoped environments while preserving certain properties
         | or distributions of the entire data set. There is a
         | security/privacy component from scrubbing the data, but the
         | original data source is unmodified, and that's where we feel
         | risk lies. And the desired outcome isn't to add security but to
         | produce a data set that "looks like" the original well enough
         | for testing/modeling/analytics.
         | 
         | > Are the policies something like "retool" gets tokenized or
         | faked data back, and the main app gets everything?
         | 
         | Yep, that's exactly right. Application credentials are grouped
         | under classifications, and policies can be included/excluded
         | across classifications. We aren't passing authz through
         | JumpWire but for something like Retool you can configure it to
         | connect through different proxies for different users.
         | 
         | > I prefer self-hosted and reasonably auditable code for such
         | sensitive systems. Is that the case here?
         | 
         | Exactly. The engine which interacts with your data is almost
         | always self-hosted, and the web app also can be if needed.
         | 
         | > At my scale (50 person company), it works reasonably well
         | enough with just me maintaining it.
         | 
         | Makes sense! No reason to add more tools to your stack yet if
         | the custom process isn't too burdensome.
        
       ___________________________________________________________________
       (page generated 2022-12-01 23:00 UTC)