[HN Gopher] Cingulata: Run C++ code over encrypted data with ful...
       ___________________________________________________________________
        
       Cingulata: Run C++ code over encrypted data with fully homomorphic
       encryption
        
       Author : p4bl0
       Score  : 158 points
       Date   : 2020-06-06 09:01 UTC (13 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | speedgoose wrote:
       | How encrypted and private is the encrypted data in such systems?
       | 
       | When I looked at encrypted databases, the real ones, not the
       | encrypted at rest databases, I read comments saying that the
       | crypto was relatively too weak to have any use outside research.
       | That it is a neat research topic, it will be great eventually,
       | but it's not ready for production.
       | 
       | So I went with the classic and simple solution : encrypt with
       | aes256gcm, and decrypt and reencrypt if I manipulate the data.
       | 
       | Does a system like cingulata offers encryption as strong or
       | better than aes256gcm?
        
         | bawolff wrote:
         | Note that encrypted db stuff generally tries to use weaker
         | notions of security in exchange for performance. Generally
         | research revealed that it was even weaker than the authors
         | thought. FHE is generally something different.
         | 
         | >Does a system like cingulata offers encryption as strong or
         | better than aes256gcm?
         | 
         | One of the security garuntees of aes-gcm is non malleablity.
         | Any attempt to modify the ciphertext is detected. This is the
         | key reason why GCM is popular over older methods like CBC. In
         | homomorphic encryption, the entire point is to be able to able
         | to do arbiteary computations on ciphertexts without decrypting.
         | So even theoretically it would be impossible for homomorphic
         | encryption to give same/better security properties as aes-gcm.
        
           | aruss wrote:
           | That's true that it's theoretically impossible, but you can
           | get close by using verifiable computation protocols like
           | Pinocchio to restrict the computations that the adversary can
           | do. That is to say that before blindly accepting the result
           | from the adversary and decrypting it, you first ensure that
           | they performed a computation you specified beforehand.
        
         | rhindi wrote:
         | All modern FHE is lattice based, so pretty strong if you chose
         | the right parameters. But of course if you dont chose secure
         | parameters.. well, it wont be secure :)
         | 
         | There are tools to measure the security level of FHE schemes:
         | https://bitbucket.org/malb/lwe-estimator/
        
           | speedgoose wrote:
           | I guess only experts should chose the parameters.
        
             | bawolff wrote:
             | FHE encryption is bleeding edge crypto. You probably
             | shouldn't use it in real systems with hard security
             | requirements at all without input from experts.
        
             | rhindi wrote:
             | You have formulas to calculate the security level given a
             | threat model, so the compiler could in theory do it
             | automatically. Just specify that you wants 128 bits of
             | security or whatever, and it will do the rest.
        
         | qayxc wrote:
         | Systems like Cingulata are true end-to-end encrypted systems.
         | No data is decrypted at any point in the process.
         | 
         | All data manipulation and -processing takes place on encrypted
         | data, as opposed to the encrypted database you mentioned, which
         | still decrypts its contents in memory prior to processing.
         | 
         | The reason homomorphic encryption is far from being ready for
         | production, is that all operations (e.g. all your algorithms
         | and programs) need to be transformed to a virtual circuit that
         | operates on cipher text encrypted by a specific algorithm.
         | 
         | This is akin to translating your software into an inefficient
         | byte code that's then dynamically executed by an obnoxiously
         | slow interpreter.
         | 
         | The great part is that you can simply encrypt your data on a
         | local (and trusted) machine, send the cipher text into the
         | cloud for indexed storage or processing and do your queries or
         | operations on encrypted data. At no point will your data ever
         | be decrypted on the remote machine.
         | 
         | So there's great potential there w.r.t. privacy and cloud
         | computing (and especially AI where training data is often the
         | "magic sauce" that gives your company an edge over the
         | competition) and SaaS.
        
           | jonahbenton wrote:
           | Super helpful explanation, +100.
        
           | speedgoose wrote:
           | Thanks for the explanation. The byte code analogy made me
           | understand the technology a lot better.
        
       | dekhn wrote:
       | During the COVID freakout, I opined that all the CPU/GPU spent on
       | Folding@Home was unlikely to create any sort of interesting
       | breakthrough and it used an absurd amount of CPU/GPU to not make
       | breakthroughs. In the past, I've been kind of opposed to using
       | FHE on health data (seems wasteful; instead, should be run on
       | trusted compute with a decrypt and encrypt key). But for the
       | amount of wasted compute in CPU/GPU for molecular dynamics...
       | coudl we instead process encyrpted health data on consumer
       | devices (I'm not aware of any health data analytics problems that
       | need to scale over millions of consumer devices).
        
         | lacker wrote:
         | For "big data" type tasks, the cost of shuffling around the
         | data to thousands of different consumer devices probably makes
         | it ineffective. You're better off just paying for one
         | centralized data warehouse, rather than all the custom coding
         | and decentralized bandwidth to make some consumer-based
         | strategy work. This seems more like it could be useful for
         | building things like decentralized encryption protocols.
        
       | xsamgreen wrote:
       | FHE is undergoing standardization right now. The leading
       | algorithm candidates are based on lattice crypto which is
       | currently quantum-safe: https://homomorphicencryption.org/
       | 
       | Here are some mature and actively-developed HE libraries:
       | 
       | https://github.com/homenc/HElib (By IBM, employed the inventor of
       | FHE)
       | 
       | https://github.com/Microsoft/SEAL (Best tutorials)
       | 
       | https://palisade-crypto.org/software-library (High-quality
       | codebase)
       | 
       | https://github.com/tfhe/tfhe (Fast boolean logic, used by
       | Cingulata)
       | 
       | Good introductory material is hard to come by, but I like the
       | first half of this talk: https://simons.berkeley.edu/talks/intro-
       | fhe-and-tfhe
        
       | hansdieter1337 wrote:
       | Reminds me of my work with CryptDB. A framework to run SQL
       | queries over encrypted data: https://github.com/CryptDB/cryptdb
       | 
       | One important fact about homomorphic and other encryption schemes
       | you can calculate on: It leaks information! E.g., if enc(x) +
       | enc(y) = enc(z), you gain the knowledge that x + y = z. With
       | enough data, it's easy to obtain the unencrypted data without the
       | secret(s).
        
         | aruss wrote:
         | I did part of a PhD in this field; your comment about
         | information leakage is not true. FHE schemes are IND-CPA
         | secure, which means you can't test for equality in the
         | encrypted space (Note that in the standard definition of
         | homomorphic encryption, you're only guaranteed that
         | Dec(Enc(x+y)) = Dec(Enc(x) + Enc(y)) = x + y). Your comment
         | only applies to schemes where you can test for equality in the
         | encrypted space, which are not IND-CPA secure and should never
         | be used in practice (except in some extreme scenarios where you
         | can/need to make some strong assumptions).
        
         | gautamcgoel wrote:
         | I think you meant homomorphic, not homophobic...
        
           | hansdieter1337 wrote:
           | ah, hehe, yes indeed. Thank you, auto-correct
        
       | asdfaoeu wrote:
       | 47 upvotes, no comments. I'm guessing no one understands this.
        
         | jacobush wrote:
         | Yep. Was hoping for someone to debunk or explain. My (probably
         | deeply flawed) hunch was that one would need some kind of
         | specialised virtual machine or something to make homomorphic
         | stuff. I'm babbling now.
         | 
         | Yes, someone please explain Cingulata.
        
           | qayxc wrote:
           | Cingulata is a system for secure, privacy preserving
           | applications based on homomorphic encryption.
           | 
           | Homomorphic encryption is a form of cryptography that allows
           | for operations on encrypted content to yield the same results
           | (though encrypted) as if applied to plain text input.
           | 
           | Basically, the framework takes algorithms implemented in C++
           | into a virtual boolean circuit that operates on encrypted
           | data. This means you can run your database, AI training, page
           | ranking, etc. on encrypted data for a truly end-to-end
           | encrypted processing or computation on untrusted devices or
           | environments (cloud computing!).
           | 
           | This comes at a price, though, as the virtual circuit is
           | basically a software interpreter for your original algorithm
           | and thus is abysmally slow compared to the original code...
           | 
           | At least that's my understanding of the system.
        
             | radres wrote:
             | So I can give an executable to anyone in the world and they
             | cannot get to my source code?
        
               | aruss wrote:
               | Not quite. The data is encrypted, the computation is not.
               | Through the use of universal circuits (circuits that run
               | other circuits) you could give an encrypted executable to
               | an untrusted party to run on encrypted data, but note
               | that the result of this computation is encrypted as well
               | (so, the executable would be useless for the other party
               | -- only you would be able to use the results). In fact,
               | this idea underpins the crucial idea behind FHE:
               | bootstrapping.
        
               | layoutIfNeeded wrote:
               | No.
               | 
               | You have a function F which maps inputs x to outputs y.
               | You transform this function into new one F', that will
               | map the encrypted inputs encrypt(x) to the encrypted
               | outputs encrypt(y), _without knowing how to decrypt_.
               | F(x) = y         F'(encrypt(x)) = encrypt(y)
        
               | russfink wrote:
               | This is correct. Additionally, the entity performing the
               | computation does not get to learn the answer. This is an
               | important distinction between a technique such as this,
               | and software obfuscation.
        
             | marius_k wrote:
             | Are these operations encrypted also? Is this method more
             | efficient rather than returning a mapping of all possible
             | unencrypted inputs and outputs for a given operation.
             | 
             | [edit] I just did the math. 1. mapping of adding two bytes
             | would make ~64kB list of values. 2. mapping of adding two
             | i32 would make ~73786976294838.2MB list of values (looks
             | like encryption would do a better job)
        
               | aruss wrote:
               | It's a requirement of these algorithms that they run in
               | polynomial time with respect to the size of the input.
               | They're large polynomials with big constants, but they're
               | dwarfed by the exponentials you're considering.
        
       | pimlottc wrote:
       | > Cingulata (pronounced "tchingulata")
       | 
       | As an English speaker, this makes me more confused about the
       | pronunciation, not less.
        
         | pmelendez wrote:
         | As a non-native English speaker living in a English speaking
         | country, I appreciate the phonetic guidance of a fairly
         | uncommon word
        
           | tzs wrote:
           | Did that really provide any phonetic guidance? All it appears
           | to say is that it is just like "cingulata" except the "c"
           | should be pronounced "tch".
           | 
           | As you note "cingulata" is fairly uncommon. I have no idea
           | how to pronounce that, and so their guidance still leaves me
           | in the dark about everything but the first letter.
           | 
           | For phonetic guidance something using the International
           | Phonetic Alphabet seems like the way to go. I think that
           | would give something like this: tSiNGgy@'lat@, -at@
           | 
           | That's just a guess taking the IPA for cingulata from
           | Merrian-Webster (siNGgy@'lat@, -at@) and replacing the "s"
           | with the "tS" that the Macmillan dictionary gives for the
           | "tch" in "latch", which is the first word that came to mind
           | that has a "tch".
           | 
           | (Also, I believe many publications use their own variants on
           | the IPA, so I'm not sure that what I gave is actually proper.
           | If MW and/or Macmillian have their own, the above could be
           | some weird bastardization with little meaning).
        
         | zimpenfish wrote:
         | Same. You can't just steal words[1] and change the
         | pronunciation!
         | 
         | [1] https://en.wikipedia.org/wiki/Cingulata
        
           | asjw wrote:
           | Cingulata comes from the Latin words cingula or cingulum and
           | it should be pronounced with the soft C (like in Tchaikovsky)
           | 
           | > History and Etymology for Cingulata New Latin, from Latin
           | cingulum, cingula girdle + New Latin -ata
           | 
           | https://www.merriam-
           | webster.com/dictionary/Cingulata#:~:text...
           | 
           | It's a common mistake English speakers do all the times
           | 
           | The rule is quite simple: a C followed by the vocals e and i
           | is always pronounced tch in Latin.
           | 
           | Latins used K for "hard C" like in "corn" and S for the s
           | sound like in "cent"
        
             | ReactiveJelly wrote:
             | If Tchaikovsky is Russian and the translation of his name
             | into written English is arbitrary and it's pronounced
             | "Chaikovsky", what is the "T" for?
        
               | asjw wrote:
               | That's also very simple: it's the transliteration of the
               | phonetic symbol _' tS_ which represents the soft C sound
               | as in cheese.
               | 
               | That's what happens when a language steals a foreign word
               | (Latin in this case) and changes its pronunciation
               | 
               | In Latin the way groups of letters are pronounced it's
               | (almost) unambigous, it's a phonetic alphabet itself
               | 
               | If you change the pronunciation, that part is lost and
               | you have to rely on recollection instead of recognition
               | 
               | It also means that if you don't already know a word you
               | can't be sure on how it is pronounced
        
               | otabdeveloper4 wrote:
               | The 't' is from French. That's how they code the sound in
               | the French language.
        
               | asjw wrote:
               | P.s. Tchaikovsky in Russian is Chaikovskii and it is
               | pronounced Cheekousky and not Chaikowsky
               | 
               | The correct transliteration is Cajkovskij and the correct
               | phonetic one is tcI'kofskjIj
        
             | zimpenfish wrote:
             | > a C followed by the vocals e and i is always pronounced
             | tch in Latin.
             | 
             | MW (from your link) says it's pronounced "siNGgy@'lat@"
             | which is definitely not "tchingulata", no matter what Latin
             | says.
        
               | asjw wrote:
               | Yes, in English
               | 
               | It also says that bus is pronounced 'b@s, but it's a
               | contraction of French omnibus, that comes from Latin
               | omnibus, which is pronounced omnibus (both in French and
               | Latin), like goose, but with a much shorter oo sound.
               | 
               | Anyway the point was is English language that stole words
               | from other languages and changed their pronunciation, not
               | the other way around.
        
             | gpderetta wrote:
             | C followed by i and e is soft (as in english ch) in
             | ecclesiatical latin and modern Italian, but my
             | understanding is that romans actually pronounced C as hard
             | C everywhere. I.e. Cicero called himself Kikero.
        
               | asjw wrote:
               | Correct, in ancient Rome C,K and Q had the same hard
               | sound
               | 
               | We are talking about classic Latin, born around the IV
               | century b.c.
               | 
               | But it changed a lot over the centuries influenced by the
               | Greeks, to the point that the Byzantin Roman empire chose
               | Medieval Greek as official language at the end of the VI
               | century and even before, their Latin was different from
               | the classic Roman one. I think they abandoned Latin as
               | official language in the third century to use Koine Greek
               | (sorry don't know the name in English) basically a Greek
               | dialect, the first common form of Greek, used among all
               | Hellenistic cities which eventually became modern Greek,
               | still in use nowadays.
               | 
               | Most of the Latin found in science (like in the animal
               | taxonomy) comes from "modern" ecclesiastic Latin which
               | incorporated traits of vulgar and neighbors languages
               | (French, German and Italian), it simplified the alphabet
               | but reduced the symbols available so C was mad an
               | affricate consonant.
               | 
               | In practice we have dealt with modern Latin or some form
               | of it for the past 15 centuries, longer than Rome
               | existed.
        
           | fabiospampinato wrote:
           | It's a slippery thing, how should the word "lead" be
           | pronounced? Did I mean lead as in leader or lead the metal?
           | In the English language words don't always have a unique
           | pronunciation associated with them.
        
       | entelechy wrote:
       | Great work! This allows to perform arbitrary computation on
       | untrusted devices.
       | 
       | However last time I checked computation in this scheme is
       | ridiculously slow: on modern machines, cutting edge
       | implementation of FHE manage to get around 100 integer operations
       | per second.
       | 
       | Never the less there have been some brave startups trying to
       | commercialise this technology:
       | 
       | https://venturebeat.com/2020/02/18/enveil-raises-10-million-...
       | 
       | Other interesting things build on top of FHE:
       | 
       | sql database where data and queries are fully encrypted:
       | https://github.com/zerodb/zerodb
       | 
       | fully encripted brainfuck vm: https://github.com/f-prime/arcanevm
        
         | rhindi wrote:
         | There are two main approaches to FHE: homomorphic boolean
         | circuits and homomorphic numerical processing.
         | 
         | In the former (eg Cingulata), you convert a program into a
         | boolean circuit, and evaluate each gate homomorphically. While
         | this is general purpose, it also means you decompose functions
         | that could be done in one instruction into multiple binary
         | operations (so very slow). That's usually what people refer to
         | when they say FHE is slow.
         | 
         | The other approach consists of operating directly on encrypted
         | integers or reals, and finding ways to do more complex
         | computations (like a square function) in one step. While this
         | is obviously much faster, it is also limited to whatever
         | operations is supported by the scheme. This is what people
         | refer to when they say FHE can only do certain things.
         | 
         | For years, the tradeoff has basically been slow and general
         | purpose, or fast and limited. But there are new scheme being
         | worked on that will be published soon that enable to go way
         | beyond what's currently done, such as doing efficient deep
         | learning over encrypted data and other complex numerical
         | processing.
         | 
         | Lots is coming out of labs and will be on the market within 2
         | years!
        
           | [deleted]
        
           | entelechy wrote:
           | ohh interesting! Are there any opensource implementations of
           | homomorphic numerical processing?
           | 
           | Were there any efforts of combining both approaches?
        
             | hedora wrote:
             | Note that most useful homomorphic numerical encryption
             | schemes are easily breakable. Once you have equality you
             | can usually de-anonymize user data. Many companies have
             | been burnt by this.
             | 
             | With a less than operator and the ability to encrypt chosen
             | plaintext values, you can decrypt arbitrary messages in a
             | linear (in message size) number of steps.
             | 
             | Arithmetic operations can often be used to build gadgets
             | that bootstrap comparison operations. For instance, with
             | addition and equality you can implement a comparison
             | operation for low-medium cardinality fields.
             | 
             | The field is littered with negative results that are being
             | sold as secure, practical systems. Be careful when using
             | them on important data.
        
               | bondarchuk wrote:
               | > _and the ability to encrypt chosen plaintext values_
               | 
               | Isn't this a big assumption? The way I envision it is
               | 
               | 1. client encrypts data with their key
               | 
               | 2. server computes on data without decrypting and without
               | needing the key
               | 
               | 3. client decrypts computation output with their key.
               | 
               | Or is it always required at step 2 that the server also
               | has the key needed for encryption (but not decryption
               | obviously)?
        
               | rhindi wrote:
               | The server doesn't need the decryption key, ever. Thats
               | the whole point in fact. FHE is end to end encryption for
               | compute. However there is sometimes a public key used,
               | called an evaluation key.
        
               | littlestymaar wrote:
               | > Isn't this a big assumption?
               | 
               | The standard resilience criteria for modern multi-purpose
               | encryption suppose that your scheme should be resistant
               | to adaptive chosen-cipher attack. Chosen plaintext is a
               | way weaker attack (the hierarchy being: known plaintext <
               | chosen plaintext < chosen cipher < adaptative chosen
               | cipher).
               | 
               | It may be OK for some situations, but it requires to be
               | much more cautious than with regular crypto (which is
               | already error-prone...).
        
               | Ar-Curunir wrote:
               | FHE schemes don't allow comparing equality of plaintexts
               | without decryption
        
               | rhindi wrote:
               | All FHE schemes today add tiny random noise to the
               | ciphertext so that encrypting the same data twice give
               | different results. The noise is then kept to a nominal
               | level as you compute homomorphically using a special
               | operation called bootstrapping. Then when you decrypt,
               | you just ignore the noise and get your result. If you do
               | that well and dont let the noise grow too big, you get
               | very strong security.
               | 
               | Fwiw, bootstrapping is actually what makes FHE slow, not
               | the actual addition/multiplication etc
        
               | lumost wrote:
               | Is there a good standards body to refer to for progress
               | on these alternate encryption methods?
        
             | rhindi wrote:
             | I suggest looking at TFHE and SEAL for starters
        
             | rhindi wrote:
             | Yes, turns out you can convert ciphertexts from one scheme
             | to another, so you can go back and forth between them
             | depending on what type of computation you are trying to do.
             | However the cost of transciphering is high, so in practice
             | it doesn't work well. But give it a few years and it'll
             | work!
        
             | bargle0 wrote:
             | HEAAN and HEmat are two libraries for numerical processing
             | that you can find on github. They're not perfect, and
             | require work to get in to shape for real distributed
             | computation.
        
         | Jyaif wrote:
         | Where did you get the "100 integer operations per second"?
         | 
         | I thought the speed was in the order of minutes for a single
         | operation.
        
           | entelechy wrote:
           | Indeed I didn't remember it correctly:
           | 
           | According to https://tfhe.github.io/tfhe/ states:
           | 
           | > Each binary gate takes about 13 milliseconds single-core
           | time to evaluate, which improves [DM15] by a factor 53, and
           | the mux gate takes about 26 CPU-ms
           | 
           | Addition of two bits can be implemented using 5 binary gates
           | (fulladder) Hence to add 2 32 bit numbers ~416ms => __2
           | additions per second __
           | 
           | EDIT: Shame I cannot edit my original post
        
       ___________________________________________________________________
       (page generated 2020-06-06 23:00 UTC)