[HN Gopher] Replibyte - Seed your database with real data
___________________________________________________________________
 
Replibyte - Seed your database with real data
 
Author : evoxmusic
Score  : 96 points
Date   : 2022-07-10 18:39 UTC (4 hours ago)
 
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
 
| CSSer wrote:
| I think the description in the man entry is better than the one
| in the README. Other than that, cool tool!
| 
| bennyp101 wrote:
| How does it keep personal data safe? I had a look at "how it
| works" and "faqs", but they don't answer how you keep stuff
| safe. It also gets uploaded to S3?
| 
| I might have missed it, but I need to know exactly where our PII
| is stored (so not on a dev laptop), how you know what to
| replace, and what you do with any info you do replace.
| 
| Edit: To answer my own question: via transformers. But that
| seems to suggest each dev has to keep them up to date with any
| schema changes, etc.
| 
| (Also, some links are broken on GitHub.)
| 
| pistoriusp wrote:
| You may want to check out Snaplet at https://docs.snaplet.dev.
| I'm the co-founder, but we're not open-source (yet.) Our goal
| is to give developers a database, and data, that they can code
| against.
| 
| We identify PII by introspecting your database, suggest fields
| to transform, and provide a JavaScript runtime for writing
| transformations.
| 
| Besides transforming data, you can reduce and generate data.
| We are most excited about data generation!
| 
| The configuration lives in your repository, and you can capture
| the snapshots in GitHub Actions. So you get a "gitops workflow"
| for data.
| 
| A typical git-ops workflow:
| 1. Add a schema migration for a new column.
| 2. Add a JS function to generate new data for that column.
| 3. Add code to use the new column.
| 4. Later, once you have data, use the same function to
|    transform the original value. (Or just keep generating it.)
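The transformers the thread keeps referring to are declared in Replibyte's `conf.yaml`. A rough sketch of what that looks like follows; the table, column, and transformer names here are illustrative and the exact keys should be verified against the Replibyte docs (https://www.replibyte.com/docs):

```yaml
# Illustrative Replibyte conf.yaml sketch -- names and keys are from
# memory of the docs, not authoritative.
source:
  connection_uri: $DATABASE_URL
  transformers:
    - database: public
      table: employees
      columns:
        - name: last_name
          transformer_name: random     # replace with random characters
        - name: email
          transformer_name: email      # replace with a fake email
datastore:
  local_disk:                          # keep dumps local, off S3
    dir: /tmp/replibyte-dumps
destination:
  connection_uri: $DEV_DATABASE_URL
```

Note that any column not listed under `transformers` is dumped as-is, which is why the commenters below worry about keeping the config in sync with schema changes.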
| ev0xmusic wrote:
| Hi, author of Replibyte here :)
| 
| Yes, transformers are the way to go. I plan to add a way to
| detect schema changes and, at least, not try to create a dump
| when the schema has changed. I don't think it can be done
| safely without a human admin check.
| 
| (Thank you for your PR)
| 
| crummy wrote:
| The user tells it what fields need replacing with the yaml
| config.
| 
| dopidopHN wrote:
| The default seems to be to store the sanitized dump on S3.
| 
| That's not always available in a professional context, or it
| might be considered data extraction.
| 
| Keeping everything local, and detailing exactly what goes where
| and how, would be helpful.
| 
| Svarto wrote:
| Also, if it's possible to run everything without uploading it
| to S3: as a smaller-time dev with projects in production, I
| would find this really interesting for debugging production
| database data, but in development. Uploading it and having it
| in S3 would needlessly complicate things for me (even though I
| can understand enterprise customers might prefer it that way).
| 
| evoxmusic wrote:
| You have a local storage option:
| https://www.replibyte.com/docs/datastores#local-disk
| 
| roskilli wrote:
| One feature I'd love to see is a transformer that, instead of
| providing a random value, provides a cryptographic one-way
| hash of the data (i.e. SHA-2). That way key uniqueness stays
| the same (avoiding violations of unique constraints on
| columns), and the same value used in one place will match the
| transformed value in another table, which more accurately
| reflects the "shape" of the data.
| 
| pistoriusp wrote:
| We do this via Copycat (https://github.com/snaplet/copycat).
| We generate static "fake values" by hashing your original
| value to a number, and map that to a fake value.
| 
| MadsRC wrote:
| This will not work, at least not if we're talking PII as it is
| defined by a Somewhat Sane (TM) privacy legislation.
| 
| Sure, passwords and credit card info is obscured with your
| methodology, but names, dates of birth, sexual orientation,
| telephone numbers, email and IP will remain unique. This
| uniqueness is what allows you to potentially identify a
| person, given enough data.
| 
| MadsRC wrote:
| I suppose that what you'd have to do is change the data and
| then hash it. But once you've changed the data it's no longer
| PII, so there's no reason to hash it.
| 
| Of course, given enough changed data, you could potentially
| deduce how it was changed and thus revert it, at which point
| it would become PII again and you'd have a problem... but
| that's probably a fringe scenario.
| 
| tyingq wrote:
| > Sure, passwords and credit card info is obscured with your
| > methodology
| 
| Even that's problematic, because there may be code that
| depends on the data being somewhat "real". Credit cards, for
| example, may need to pass LUHN tests, or have valid BIN
| sections, etc.
| 
| ev0xmusic wrote:
| Hi, author of Replibyte here. Feel free to open an issue and
| explain what your use case is. I will be happy to consider a
| solution with the community.
___________________________________________________________________
(page generated 2022-07-10 23:00 UTC)
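The hashing-transformer idea debated in this thread can be sketched briefly. This is a minimal, hypothetical illustration (not Replibyte code, and all names are made up): a keyed hash keeps the output deterministic, so uniqueness and cross-table joins survive, as roskilli wants; keying it with a secret (rather than bare SHA-2) mitigates the brute-force re-identification risk MadsRC raises; and appending a Luhn check digit addresses tyingq's point that fake card numbers may still need to pass LUHN tests.

```python
# Sketch of a deterministic pseudonymizing transformer (illustrative only).
import hashlib
import hmac

SECRET_KEY = b"rotate-me-per-snapshot"  # keep out of the repo / dev laptops


def pseudonymize(value: str) -> str:
    """Map a value to a stable, opaque token via keyed hashing.

    The same input always yields the same token (preserving uniqueness and
    cross-table consistency), but without the key an attacker cannot simply
    hash candidate inputs to re-identify people.
    """
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]


def luhn_check_digit(partial: str) -> str:
    """Compute the Luhn check digit for a partial card number."""
    total = 0
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:  # digits adjacent to the (future) check digit double
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def fake_card_number(value: str, bin_prefix: str = "411111") -> str:
    """Derive a deterministic, Luhn-valid 16-digit fake card number."""
    digest = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).digest()
    body = "".join(str(b % 10) for b in digest)[: 15 - len(bin_prefix)]
    partial = bin_prefix + body
    return partial + luhn_check_digit(partial)
```

As MadsRC notes, determinism is exactly what makes this weaker than random replacement for strict PII regimes: unique tokens can still act as identifiers, so the key would at minimum need to be rotated per snapshot.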