[HN Gopher] Cingulata: Run C++ code over encrypted data with ful... ___________________________________________________________________ Cingulata: Run C++ code over encrypted data with fully homomorphic encryption Author : p4bl0 Score : 158 points Date : 2020-06-06 09:01 UTC (13 hours ago) (HTM) web link (github.com) (TXT) w3m dump (github.com) | speedgoose wrote: | How encrypted and private is the encrypted data in such systems? | | When I looked at encrypted databases, the real ones, not the | encrypted at rest databases, I read comments saying that the | crypto was relatively too weak to have any use outside research. | That it is a neat research topic, it will be great eventually, | but it's not ready for production. | | So I went with the classic and simple solution : encrypt with | aes256gcm, and decrypt and reencrypt if I manipulate the data. | | Does a system like cingulata offers encryption as strong or | better than aes256gcm? | bawolff wrote: | Note that encrypted db stuff generally tries to use weaker | notions of security in exchange for performance. Generally | research revealed that it was even weaker than the authors | thought. FHE is generally something different. | | >Does a system like cingulata offers encryption as strong or | better than aes256gcm? | | One of the security garuntees of aes-gcm is non malleablity. | Any attempt to modify the ciphertext is detected. This is the | key reason why GCM is popular over older methods like CBC. In | homomorphic encryption, the entire point is to be able to able | to do arbiteary computations on ciphertexts without decrypting. | So even theoretically it would be impossible for homomorphic | encryption to give same/better security properties as aes-gcm. | aruss wrote: | That's true that it's theoretically impossible, but you can | get close by using verifiable computation protocols like | Pinocchio to restrict the computations that the adversary can | do. That is to say that before blindly accepting the result | from the adversary and decrypting it, you first ensure that | they performed a computation you specified beforehand. | rhindi wrote: | All modern FHE is lattice based, so pretty strong if you chose | the right parameters. But of course if you dont chose secure | parameters.. well, it wont be secure :) | | There are tools to measure the security level of FHE schemes: | https://bitbucket.org/malb/lwe-estimator/ | speedgoose wrote: | I guess only experts should chose the parameters. | bawolff wrote: | FHE encryption is bleeding edge crypto. You probably | shouldn't use it in real systems with hard security | requirements at all without input from experts. | rhindi wrote: | You have formulas to calculate the security level given a | threat model, so the compiler could in theory do it | automatically. Just specify that you wants 128 bits of | security or whatever, and it will do the rest. | qayxc wrote: | Systems like Cingulata are true end-to-end encrypted systems. | No data is decrypted at any point in the process. | | All data manipulation and -processing takes place on encrypted | data, as opposed to the encrypted database you mentioned, which | still decrypts its contents in memory prior to processing. | | The reason homomorphic encryption is far from being ready for | production, is that all operations (e.g. all your algorithms | and programs) need to be transformed to a virtual circuit that | operates on cipher text encrypted by a specific algorithm. | | This is akin to translating your software into an inefficient | byte code that's then dynamically executed by an obnoxiously | slow interpreter. | | The great part is that you can simply encrypt your data on a | local (and trusted) machine, send the cipher text into the | cloud for indexed storage or processing and do your queries or | operations on encrypted data. At no point will your data ever | be decrypted on the remote machine. | | So there's great potential there w.r.t. privacy and cloud | computing (and especially AI where training data is often the | "magic sauce" that gives your company an edge over the | competition) and SaaS. | jonahbenton wrote: | Super helpful explanation, +100. | speedgoose wrote: | Thanks for the explanation. The byte code analogy made me | understand the technology a lot better. | dekhn wrote: | During the COVID freakout, I opined that all the CPU/GPU spent on | Folding@Home was unlikely to create any sort of interesting | breakthrough and it used an absurd amount of CPU/GPU to not make | breakthroughs. In the past, I've been kind of opposed to using | FHE on health data (seems wasteful; instead, should be run on | trusted compute with a decrypt and encrypt key). But for the | amount of wasted compute in CPU/GPU for molecular dynamics... | coudl we instead process encyrpted health data on consumer | devices (I'm not aware of any health data analytics problems that | need to scale over millions of consumer devices). | lacker wrote: | For "big data" type tasks, the cost of shuffling around the | data to thousands of different consumer devices probably makes | it ineffective. You're better off just paying for one | centralized data warehouse, rather than all the custom coding | and decentralized bandwidth to make some consumer-based | strategy work. This seems more like it could be useful for | building things like decentralized encryption protocols. | xsamgreen wrote: | FHE is undergoing standardization right now. The leading | algorithm candidates are based on lattice crypto which is | currently quantum-safe: https://homomorphicencryption.org/ | | Here are some mature and actively-developed HE libraries: | | https://github.com/homenc/HElib (By IBM, employed the inventor of | FHE) | | https://github.com/Microsoft/SEAL (Best tutorials) | | https://palisade-crypto.org/software-library (High-quality | codebase) | | https://github.com/tfhe/tfhe (Fast boolean logic, used by | Cingulata) | | Good introductory material is hard to come by, but I like the | first half of this talk: https://simons.berkeley.edu/talks/intro- | fhe-and-tfhe | hansdieter1337 wrote: | Reminds me of my work with CryptDB. A framework to run SQL | queries over encrypted data: https://github.com/CryptDB/cryptdb | | One important fact about homomorphic and other encryption schemes | you can calculate on: It leaks information! E.g., if enc(x) + | enc(y) = enc(z), you gain the knowledge that x + y = z. With | enough data, it's easy to obtain the unencrypted data without the | secret(s). | aruss wrote: | I did part of a PhD in this field; your comment about | information leakage is not true. FHE schemes are IND-CPA | secure, which means you can't test for equality in the | encrypted space (Note that in the standard definition of | homomorphic encryption, you're only guaranteed that | Dec(Enc(x+y)) = Dec(Enc(x) + Enc(y)) = x + y). Your comment | only applies to schemes where you can test for equality in the | encrypted space, which are not IND-CPA secure and should never | be used in practice (except in some extreme scenarios where you | can/need to make some strong assumptions). | gautamcgoel wrote: | I think you meant homomorphic, not homophobic... | hansdieter1337 wrote: | ah, hehe, yes indeed. Thank you, auto-correct | asdfaoeu wrote: | 47 upvotes, no comments. I'm guessing no one understands this. | jacobush wrote: | Yep. Was hoping for someone to debunk or explain. My (probably | deeply flawed) hunch was that one would need some kind of | specialised virtual machine or something to make homomorphic | stuff. I'm babbling now. | | Yes, someone please explain Cingulata. | qayxc wrote: | Cingulata is a system for secure, privacy preserving | applications based on homomorphic encryption. | | Homomorphic encryption is a form of cryptography that allows | for operations on encrypted content to yield the same results | (though encrypted) as if applied to plain text input. | | Basically, the framework takes algorithms implemented in C++ | into a virtual boolean circuit that operates on encrypted | data. This means you can run your database, AI training, page | ranking, etc. on encrypted data for a truly end-to-end | encrypted processing or computation on untrusted devices or | environments (cloud computing!). | | This comes at a price, though, as the virtual circuit is | basically a software interpreter for your original algorithm | and thus is abysmally slow compared to the original code... | | At least that's my understanding of the system. | radres wrote: | So I can give an executable to anyone in the world and they | cannot get to my source code? | aruss wrote: | Not quite. The data is encrypted, the computation is not. | Through the use of universal circuits (circuits that run | other circuits) you could give an encrypted executable to | an untrusted party to run on encrypted data, but note | that the result of this computation is encrypted as well | (so, the executable would be useless for the other party | -- only you would be able to use the results). In fact, | this idea underpins the crucial idea behind FHE: | bootstrapping. | layoutIfNeeded wrote: | No. | | You have a function F which maps inputs x to outputs y. | You transform this function into new one F', that will | map the encrypted inputs encrypt(x) to the encrypted | outputs encrypt(y), _without knowing how to decrypt_. | F(x) = y F'(encrypt(x)) = encrypt(y) | russfink wrote: | This is correct. Additionally, the entity performing the | computation does not get to learn the answer. This is an | important distinction between a technique such as this, | and software obfuscation. | marius_k wrote: | Are these operations encrypted also? Is this method more | efficient rather than returning a mapping of all possible | unencrypted inputs and outputs for a given operation. | | [edit] I just did the math. 1. mapping of adding two bytes | would make ~64kB list of values. 2. mapping of adding two | i32 would make ~73786976294838.2MB list of values (looks | like encryption would do a better job) | aruss wrote: | It's a requirement of these algorithms that they run in | polynomial time with respect to the size of the input. | They're large polynomials with big constants, but they're | dwarfed by the exponentials you're considering. | pimlottc wrote: | > Cingulata (pronounced "tchingulata") | | As an English speaker, this makes me more confused about the | pronunciation, not less. | pmelendez wrote: | As a non-native English speaker living in a English speaking | country, I appreciate the phonetic guidance of a fairly | uncommon word | tzs wrote: | Did that really provide any phonetic guidance? All it appears | to say is that it is just like "cingulata" except the "c" | should be pronounced "tch". | | As you note "cingulata" is fairly uncommon. I have no idea | how to pronounce that, and so their guidance still leaves me | in the dark about everything but the first letter. | | For phonetic guidance something using the International | Phonetic Alphabet seems like the way to go. I think that | would give something like this: tSiNGgy@'lat@, -at@ | | That's just a guess taking the IPA for cingulata from | Merrian-Webster (siNGgy@'lat@, -at@) and replacing the "s" | with the "tS" that the Macmillan dictionary gives for the | "tch" in "latch", which is the first word that came to mind | that has a "tch". | | (Also, I believe many publications use their own variants on | the IPA, so I'm not sure that what I gave is actually proper. | If MW and/or Macmillian have their own, the above could be | some weird bastardization with little meaning). | zimpenfish wrote: | Same. You can't just steal words[1] and change the | pronunciation! | | [1] https://en.wikipedia.org/wiki/Cingulata | asjw wrote: | Cingulata comes from the Latin words cingula or cingulum and | it should be pronounced with the soft C (like in Tchaikovsky) | | > History and Etymology for Cingulata New Latin, from Latin | cingulum, cingula girdle + New Latin -ata | | https://www.merriam- | webster.com/dictionary/Cingulata#:~:text... | | It's a common mistake English speakers do all the times | | The rule is quite simple: a C followed by the vocals e and i | is always pronounced tch in Latin. | | Latins used K for "hard C" like in "corn" and S for the s | sound like in "cent" | ReactiveJelly wrote: | If Tchaikovsky is Russian and the translation of his name | into written English is arbitrary and it's pronounced | "Chaikovsky", what is the "T" for? | asjw wrote: | That's also very simple: it's the transliteration of the | phonetic symbol _' tS_ which represents the soft C sound | as in cheese. | | That's what happens when a language steals a foreign word | (Latin in this case) and changes its pronunciation | | In Latin the way groups of letters are pronounced it's | (almost) unambigous, it's a phonetic alphabet itself | | If you change the pronunciation, that part is lost and | you have to rely on recollection instead of recognition | | It also means that if you don't already know a word you | can't be sure on how it is pronounced | otabdeveloper4 wrote: | The 't' is from French. That's how they code the sound in | the French language. | asjw wrote: | P.s. Tchaikovsky in Russian is Chaikovskii and it is | pronounced Cheekousky and not Chaikowsky | | The correct transliteration is Cajkovskij and the correct | phonetic one is tcI'kofskjIj | zimpenfish wrote: | > a C followed by the vocals e and i is always pronounced | tch in Latin. | | MW (from your link) says it's pronounced "siNGgy@'lat@" | which is definitely not "tchingulata", no matter what Latin | says. | asjw wrote: | Yes, in English | | It also says that bus is pronounced 'b@s, but it's a | contraction of French omnibus, that comes from Latin | omnibus, which is pronounced omnibus (both in French and | Latin), like goose, but with a much shorter oo sound. | | Anyway the point was is English language that stole words | from other languages and changed their pronunciation, not | the other way around. | gpderetta wrote: | C followed by i and e is soft (as in english ch) in | ecclesiatical latin and modern Italian, but my | understanding is that romans actually pronounced C as hard | C everywhere. I.e. Cicero called himself Kikero. | asjw wrote: | Correct, in ancient Rome C,K and Q had the same hard | sound | | We are talking about classic Latin, born around the IV | century b.c. | | But it changed a lot over the centuries influenced by the | Greeks, to the point that the Byzantin Roman empire chose | Medieval Greek as official language at the end of the VI | century and even before, their Latin was different from | the classic Roman one. I think they abandoned Latin as | official language in the third century to use Koine Greek | (sorry don't know the name in English) basically a Greek | dialect, the first common form of Greek, used among all | Hellenistic cities which eventually became modern Greek, | still in use nowadays. | | Most of the Latin found in science (like in the animal | taxonomy) comes from "modern" ecclesiastic Latin which | incorporated traits of vulgar and neighbors languages | (French, German and Italian), it simplified the alphabet | but reduced the symbols available so C was mad an | affricate consonant. | | In practice we have dealt with modern Latin or some form | of it for the past 15 centuries, longer than Rome | existed. | fabiospampinato wrote: | It's a slippery thing, how should the word "lead" be | pronounced? Did I mean lead as in leader or lead the metal? | In the English language words don't always have a unique | pronunciation associated with them. | entelechy wrote: | Great work! This allows to perform arbitrary computation on | untrusted devices. | | However last time I checked computation in this scheme is | ridiculously slow: on modern machines, cutting edge | implementation of FHE manage to get around 100 integer operations | per second. | | Never the less there have been some brave startups trying to | commercialise this technology: | | https://venturebeat.com/2020/02/18/enveil-raises-10-million-... | | Other interesting things build on top of FHE: | | sql database where data and queries are fully encrypted: | https://github.com/zerodb/zerodb | | fully encripted brainfuck vm: https://github.com/f-prime/arcanevm | rhindi wrote: | There are two main approaches to FHE: homomorphic boolean | circuits and homomorphic numerical processing. | | In the former (eg Cingulata), you convert a program into a | boolean circuit, and evaluate each gate homomorphically. While | this is general purpose, it also means you decompose functions | that could be done in one instruction into multiple binary | operations (so very slow). That's usually what people refer to | when they say FHE is slow. | | The other approach consists of operating directly on encrypted | integers or reals, and finding ways to do more complex | computations (like a square function) in one step. While this | is obviously much faster, it is also limited to whatever | operations is supported by the scheme. This is what people | refer to when they say FHE can only do certain things. | | For years, the tradeoff has basically been slow and general | purpose, or fast and limited. But there are new scheme being | worked on that will be published soon that enable to go way | beyond what's currently done, such as doing efficient deep | learning over encrypted data and other complex numerical | processing. | | Lots is coming out of labs and will be on the market within 2 | years! | [deleted] | entelechy wrote: | ohh interesting! Are there any opensource implementations of | homomorphic numerical processing? | | Were there any efforts of combining both approaches? | hedora wrote: | Note that most useful homomorphic numerical encryption | schemes are easily breakable. Once you have equality you | can usually de-anonymize user data. Many companies have | been burnt by this. | | With a less than operator and the ability to encrypt chosen | plaintext values, you can decrypt arbitrary messages in a | linear (in message size) number of steps. | | Arithmetic operations can often be used to build gadgets | that bootstrap comparison operations. For instance, with | addition and equality you can implement a comparison | operation for low-medium cardinality fields. | | The field is littered with negative results that are being | sold as secure, practical systems. Be careful when using | them on important data. | bondarchuk wrote: | > _and the ability to encrypt chosen plaintext values_ | | Isn't this a big assumption? The way I envision it is | | 1. client encrypts data with their key | | 2. server computes on data without decrypting and without | needing the key | | 3. client decrypts computation output with their key. | | Or is it always required at step 2 that the server also | has the key needed for encryption (but not decryption | obviously)? | rhindi wrote: | The server doesn't need the decryption key, ever. Thats | the whole point in fact. FHE is end to end encryption for | compute. However there is sometimes a public key used, | called an evaluation key. | littlestymaar wrote: | > Isn't this a big assumption? | | The standard resilience criteria for modern multi-purpose | encryption suppose that your scheme should be resistant | to adaptive chosen-cipher attack. Chosen plaintext is a | way weaker attack (the hierarchy being: known plaintext < | chosen plaintext < chosen cipher < adaptative chosen | cipher). | | It may be OK for some situations, but it requires to be | much more cautious than with regular crypto (which is | already error-prone...). | Ar-Curunir wrote: | FHE schemes don't allow comparing equality of plaintexts | without decryption | rhindi wrote: | All FHE schemes today add tiny random noise to the | ciphertext so that encrypting the same data twice give | different results. The noise is then kept to a nominal | level as you compute homomorphically using a special | operation called bootstrapping. Then when you decrypt, | you just ignore the noise and get your result. If you do | that well and dont let the noise grow too big, you get | very strong security. | | Fwiw, bootstrapping is actually what makes FHE slow, not | the actual addition/multiplication etc | lumost wrote: | Is there a good standards body to refer to for progress | on these alternate encryption methods? | rhindi wrote: | I suggest looking at TFHE and SEAL for starters | rhindi wrote: | Yes, turns out you can convert ciphertexts from one scheme | to another, so you can go back and forth between them | depending on what type of computation you are trying to do. | However the cost of transciphering is high, so in practice | it doesn't work well. But give it a few years and it'll | work! | bargle0 wrote: | HEAAN and HEmat are two libraries for numerical processing | that you can find on github. They're not perfect, and | require work to get in to shape for real distributed | computation. | Jyaif wrote: | Where did you get the "100 integer operations per second"? | | I thought the speed was in the order of minutes for a single | operation. | entelechy wrote: | Indeed I didn't remember it correctly: | | According to https://tfhe.github.io/tfhe/ states: | | > Each binary gate takes about 13 milliseconds single-core | time to evaluate, which improves [DM15] by a factor 53, and | the mux gate takes about 26 CPU-ms | | Addition of two bits can be implemented using 5 binary gates | (fulladder) Hence to add 2 32 bit numbers ~416ms => __2 | additions per second __ | | EDIT: Shame I cannot edit my original post ___________________________________________________________________ (page generated 2020-06-06 23:00 UTC)