[HN Gopher] Image unshredding using a TSP solver ___________________________________________________________________ Image unshredding using a TSP solver Author : ingve Score : 141 points Date : 2021-07-02 16:16 UTC (6 hours ago) (HTM) web link (github.com) (TXT) w3m dump (github.com) | einpoklum wrote: | The author's latest commit: @robinhouston The | word "perfectly" was used too many times | | :-P | magnimeelcul wrote: | Cool read! Though ordering the columns such that the cumulative | cobsecutive dissimilaritiea are minimized is not what you really | want. Think of a shredded image of a checkerboard: with this | objective you would not obtain the original but rather a version | where there are contigous black/white steps on the left and on | the right, with a discontinuity in the middle. | | That is to say: by reducing the objective to this you are | assuming the image is continuous on the horizzontal axis, while | in image processing images are usually supposed to be only | piecewise continuous. | stavros wrote: | But there's no way to reconstruct a checkerboard image without | knowing what it represented originally. | magnimeelcul wrote: | Without any a-priori I guess that's true, i.e. the problem | they want to solve isn't guaranteed to have a solution | ivanpondal wrote: | I really liked the TSP approach when this came out. Turns out it | works pretty well on text too: | https://github.com/ivanpondal/text-unshredding | sllabres wrote: | This is quite similar to the early decryption of the analog | Nagravision TV encryption system | https://en.wikipedia.org/wiki/Nagravision worked. Did it myself | at the time, but on to slow hardware/to little optimized | software. The decryption took much to long to be usable, but it | was fun to see that it worked quite fine | WalterBright wrote: | If you're going to use a shredder: | | 1. use a cross-cut shredder | | 2. mix many different shred jobs in the same bag | | 3. mix the contents of multiple bags | | 4. deposit different bags in different trash runs | | 5. just burn it | jhncls wrote: | Would this algorithm allow for an image version of the Burrows- | Wheeler transform? So, sorting image rows and columns using a | compressibility criterion. | | [0] | https://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transf... | hinkley wrote: | I don't suppose there's a way to process paper into 'pellets' fit | for recycling that make this whole process untenable. | | Paper fiber length dictates a lot about its reuse. The bigger the | chunks the smaller the combinatorics for reversing the | 'algorithm'. Big paper mache blobs might fare better. From a | recycling standpoint a machine that tears the paper instead of | cutting it would do better, but the uniqueness of the edges | probably reduces the problem space. | | If someone invents a fast paper pulping machine that can mount on | a small truck that doesn't require a commercial vehicle license, | I bet you could sell loads of them. | | Whatever happened to that team who used microcavitation to blast | the ink off of paper for reuse? Probably enough solvents left | behind to read the text like invisible ink, but as a first pass | it might be good. | jdthedisciple wrote: | What would be an exemplary use case for this? Like, when do you | ever have images in front of you that are shuffled in this way? | clipradiowallet wrote: | Well, imagine someone dumpster diving at a | bank/clinic/insurance provider that doesn't use a document | destruction service. Probably plenty of PII/PHI could be | reconstructed once the shredded fragments were singulated and | scanned. | IncRnd wrote: | We don't shred digital images in RAM in that way, but we can | scan & reconstruct physical images that have been shredded or | torn. | | You can paste together possible scenes to construct side- | scrolling game scenery in realtime. | | By applying a few insights we can phase one image into a hole | that is in another image. This is similar to randomizing an | array by creating a second array with random values, then | sorting them in tandem, except we use the TSP with expected | weightings. | | There are lots of use cases for this. | est31 wrote: | In the virtual RAM accessible to the process the image might | be continuous, but its physical representation could be | different, as the operating system might use different areas | from the physical RAM to expose it to the process. If you | dump a computer's RAM e.g. in a forensic setting, you might | attain such a state. | | Same goes for file systems as well as the wear leveling layer | of SSDs. | IncRnd wrote: | exactly! | tylfin wrote: | I could imagine a civil or criminal case relying on shredded | documents as a source of evidence. | klyrs wrote: | When you dumpster dive your competitor looking for trade | secrets? | the8472 wrote: | In the 90s and 00s analog pay tv used to be "encrypted" like | this. IIRC software decryption relied on deriving the weak | shuffling key by manually reordering image rows and then using | that to reverse the key so the next frames could be decrypted | more reliably. Doing that kind of video processing in realtime | and in software was close to CPU limits. | | https://en.wikipedia.org/wiki/Nagravision | aliasEli wrote: | This might even have an application for real documents. There are | police cases where some suspects have tried to destroy evidence. | Manually reconstructing the original takes a lot of effort. I | expect that the CIA and NSA already have software for it, but I | have never heard of it being used in a police case. | perihelions wrote: | Germany has run a program for decades to recover the Stasi's | paper documents, the bulk of which it shredded at the fall of | the GDR. People can sometimes get their own Stasi files, if | they're recovered. | | https://www.dw.com/en/how-ai-could-solve-a-600-million-piece... | | https://www.wired.com/2008/01/ff-stasi/ | | https://boingboing.net/2018/01/09/truth-reconciliation-and-t... | | Surprisingly, there's never been an HN thread about this (as | far as I can tell). | | From the _Wired_ link: _" Today, the study in Poppe's Berlin | apartment is lined floor to 12-foot ceiling with bookshelves | full of volumes on art, literature, and political science. But | one shelf, just to the left of her desk, is special. It holds a | pair of 3-inch-thick black binders -- copies of the most | important documents in Poppe's secret police files. This is her | Stasi shelf."_ | odysseus wrote: | The painstaking process to put back together shredded Stasi | documents is mentioned at the end of The Lives of Others: | https://www.imdb.com/title/tt0405094/ | | Great movie, by the way. | einpoklum wrote: | I really didn't like that movie, focusing on people who are | wholly uninteresting politically; presenting a sort of a | "shocked and chagrined" attitude to people being spied on | by the government; and treating West Germany as some sort | of paradise. | | I would recommend Ken Loach's Fatherland instead; but - | only until when Dritterman arrives in the UK (the end of | the film is kind of messed up). | CamperBob2 wrote: | The political class is more than capable of looking after | itself, thanks. It's the common people who have the most | to fear, and the movie does a great job at underscoring | that very point. | airstrike wrote: | _ _Fantastic_ _ movie, I just wanted to add. | ampdepolymerase wrote: | Current office shredders are almost all capable of double | shredding but considering how easy it is to reverse by OP's | project, it may be necessary to implement further measures for | document destruction. | | A laser or enzymic based method could work but neither are | particularly office-friendly. | the8472 wrote: | https://a.pomf.cat/uayjgu.jpg | ChrisMarshallNY wrote: | I used to work for a defense contractor, in the mid-1980s. | | Once a week, this big-ass tandem truck would park outside the | entrance, and the janitors would carry out boxes of papers, | under armed guard. | | The truck had a huge shredder (crosscut), and also a big | kiln. | | Once the paper was shredded, it was burned, and the ashes | were shredded, just to be sure. | | Then, the truck would take the ashes away; never to be seen | again. | secondcoming wrote: | Could someone invent a shredder that pre-analyses an image of | what it's shredding and then shreds in a manner that makes | reconstruction infeasible due to the increased likelihood of | false positives? (e.g. the flipped and mirrored images in OP) | jhayward wrote: | OP's method relies on the permutation of elements in (columns | or rows) to be in the same order for each (column or row). | | If you're just handed a pile of confetti you don't have that | same correspondence. | wongarsu wrote: | Also, you generally have written correspondence, not | images. I doubt that the similarity metric of OPs method | produces reasonable results on text, and a good similarity | metric for text would be much more complicated. | | On the other hand physical shredders typically have other | weaknesses, for example if you don't stir the shredded | result you have a high correlation between original the | current and original relative position of the individual | pieces. | aliasEli wrote: | With written text, you would expect that at least some | the letters from one shred would continue on its | neighbors. I don't see why the similarity metric would | not work. Also normally you have a pretty high contrast | between the letter and the background, unlike the image | that was used in the example. | jhgb wrote: | If you have a language model, then sequences of | characters appearing in your text will add further | information to your edge matches. | [deleted] | jpalomaki wrote: | DARPA ran a competition on this problem around 10 years ago [1] | and page from Darpa archive [2] | | [1] https://en.wikipedia.org/wiki/DARPA_Shredder_Challenge_2011 | [2] https://archive.darpa.mil/shredderchallenge/index.html | sgk284 wrote: | This was a hiring puzzle from the Instagram team in 2012, from | back before Facebook acquired them: https://instagram- | engineering.com/instagram-engineering-chal... | | I remember having a lot of fun solving it, in Coffeescript (which | was all the rage at the time), also with TSP but using the | Pearson correlation coefficient as the similarity metric: | https://gist.github.com/stevekrenzel/1505619 | | Here's a little demo of it in action: | http://krenzel.org/files/insta/ | speedgoose wrote: | I like how unicorns can do such hiring challenges while normal | companies are desperate for people to apply. I think my current | company would hire anyone who can breath and write 3 lines of | code per month, but no one fitting these criteria applies. | onion2k wrote: | I can write 4 lines of code per month, so I guess I'm | overqualified. | speedgoose wrote: | Do they work ? | reificator wrote: | Oh, now we're moving the goalposts eh? | malux85 wrote: | On my machine, yes | kevin_thibedeau wrote: | They've got +3 on StackOverflow so fully peer reviewed. | satya71 wrote: | Are you hiring on-site or remote? Which country? | gccs wrote: | There is a shortage of engineers because companies don't want | to pay them what their worth. | hn_throwaway_99 wrote: | This gets spouted all the time on HN, and I find it rarely | to be true, or at least a gross oversimplification. | | In my experience what usually happens is a programmer's | "worth" is hugely dependent on the scale of the | organization, moreso than nearly every other individual | contributor-level role, which means that smaller orgs have | an extremely difficult time competing with larger orgs. | | For example, suppose at some company there is a business | process that generates $1 billion in annual revenue, and a | programmer has implemented something that can increase that | by, say, .2%. So the programmer's "worth" in that situation | is 2 million dollars. | | A much smaller company only does $10 million in annual | revenue, so the commensurate .2% improvement is only worth | 20k. | | This is a gross oversimplification, of course, but it | really gets to the heart of way the FAANGS can pay such | huge salaries and suck up a lot of the best talent. It's | not that all the other companies are being stingy, it's | just that in many cases they _can 't_ pay a programmer | anywhere near what a FAANG can because that programmer just | can't generate that much business value at a smaller | company. | WalterBright wrote: | The only reason there's a shortage of housing is because | people don't want to pay what housing is worth. | michaelcampbell wrote: | What's the work, and the pay? Those 2 things will play big | roles more than a dearth of candidates. Told my son for years | if you want to make more money, find something to do that no | else wants to. Turns out the same is on the hiring side. | speedgoose wrote: | Very average IT work in healt-care. Glorified CRUD plus | some. Pay is average too. | infinite8s wrote: | What does average mean in this case? | porphyra wrote: | I think public hiring challenges like this are less of a | barrier that filters out candidates, and more of an attractor | that increases the number of applicants by "nerd sniping" | people. | speedgoose wrote: | You need a strong brand and good marketing to attract | people with a challenge. A normal company has difficulties | to reach people. | ezekg wrote: | Reminds me how much I loved CoffeeScript. The code really is | beautiful. | nayuki wrote: | Discussions back in 2016: | https://news.ycombinator.com/item?id=12670997 ; | https://news.ycombinator.com/item?id=12667611 | whazor wrote: | I used TSP to optimally stack different types of bowls. I | measured all the different pairs of bowls and searched for the | shortest route through all the bowls we have. I found that you | can split the stacks however you want, I also had an algorithm | for that. | Conlectus wrote: | Is the algorithm different from a simple sorted order in this | case? The difficulty with TSP is that you have to return to the | start, which wouldn't seek to be the case with stacled bowls, | unless you're stacking them in a ring somehow. | SavantIdiot wrote: | I wrote a program to do this two decades ago. I shredded a black | and white text document (long shreds, not the new shredders that | cut into confetti), scanned all the strips, front and back, and | created a lookup value for each edge based on the pixel position. | The biggest problem I had was the shredder blade removed too much | material. This caused a continuous stroke to be fatter on one | side than the other (imagine if the descender in a lowercase "p" | was on the blade's path). | nielsbot wrote: | I wonder if a high resolution photo of all the shreds | (flattened out) would also work and be easier? | hackcasual wrote: | DARPA had a challenge for this 10 years ago | https://gizmodo.com/darpas-almost-impossible-challenge-to-re... | SavantIdiot wrote: | That's awesome! I had no idea it existed. Too bad the top | level URL no longer exists, but now I'm eager to see if there | are solutions online today. ___________________________________________________________________ (page generated 2021-07-02 23:00 UTC)