[HN Gopher] Image unshredding using a TSP solver
       ___________________________________________________________________
        
       Image unshredding using a TSP solver
        
       Author : ingve
       Score  : 141 points
       Date   : 2021-07-02 16:16 UTC (6 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | einpoklum wrote:
       | The author's latest commit:                  @robinhouston  The
       | word "perfectly" was used too many times
       | 
       | :-P
        
       | magnimeelcul wrote:
       | Cool read! Though ordering the columns such that the cumulative
       | cobsecutive dissimilaritiea are minimized is not what you really
       | want. Think of a shredded image of a checkerboard: with this
       | objective you would not obtain the original but rather a version
       | where there are contigous black/white steps on the left and on
       | the right, with a discontinuity in the middle.
       | 
       | That is to say: by reducing the objective to this you are
       | assuming the image is continuous on the horizzontal axis, while
       | in image processing images are usually supposed to be only
       | piecewise continuous.
        
         | stavros wrote:
         | But there's no way to reconstruct a checkerboard image without
         | knowing what it represented originally.
        
           | magnimeelcul wrote:
           | Without any a-priori I guess that's true, i.e. the problem
           | they want to solve isn't guaranteed to have a solution
        
       | ivanpondal wrote:
       | I really liked the TSP approach when this came out. Turns out it
       | works pretty well on text too:
       | https://github.com/ivanpondal/text-unshredding
        
       | sllabres wrote:
       | This is quite similar to the early decryption of the analog
       | Nagravision TV encryption system
       | https://en.wikipedia.org/wiki/Nagravision worked. Did it myself
       | at the time, but on to slow hardware/to little optimized
       | software. The decryption took much to long to be usable, but it
       | was fun to see that it worked quite fine
        
       | WalterBright wrote:
       | If you're going to use a shredder:
       | 
       | 1. use a cross-cut shredder
       | 
       | 2. mix many different shred jobs in the same bag
       | 
       | 3. mix the contents of multiple bags
       | 
       | 4. deposit different bags in different trash runs
       | 
       | 5. just burn it
        
       | jhncls wrote:
       | Would this algorithm allow for an image version of the Burrows-
       | Wheeler transform? So, sorting image rows and columns using a
       | compressibility criterion.
       | 
       | [0]
       | https://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transf...
        
       | hinkley wrote:
       | I don't suppose there's a way to process paper into 'pellets' fit
       | for recycling that make this whole process untenable.
       | 
       | Paper fiber length dictates a lot about its reuse. The bigger the
       | chunks the smaller the combinatorics for reversing the
       | 'algorithm'. Big paper mache blobs might fare better. From a
       | recycling standpoint a machine that tears the paper instead of
       | cutting it would do better, but the uniqueness of the edges
       | probably reduces the problem space.
       | 
       | If someone invents a fast paper pulping machine that can mount on
       | a small truck that doesn't require a commercial vehicle license,
       | I bet you could sell loads of them.
       | 
       | Whatever happened to that team who used microcavitation to blast
       | the ink off of paper for reuse? Probably enough solvents left
       | behind to read the text like invisible ink, but as a first pass
       | it might be good.
        
       | jdthedisciple wrote:
       | What would be an exemplary use case for this? Like, when do you
       | ever have images in front of you that are shuffled in this way?
        
         | clipradiowallet wrote:
         | Well, imagine someone dumpster diving at a
         | bank/clinic/insurance provider that doesn't use a document
         | destruction service. Probably plenty of PII/PHI could be
         | reconstructed once the shredded fragments were singulated and
         | scanned.
        
         | IncRnd wrote:
         | We don't shred digital images in RAM in that way, but we can
         | scan & reconstruct physical images that have been shredded or
         | torn.
         | 
         | You can paste together possible scenes to construct side-
         | scrolling game scenery in realtime.
         | 
         | By applying a few insights we can phase one image into a hole
         | that is in another image. This is similar to randomizing an
         | array by creating a second array with random values, then
         | sorting them in tandem, except we use the TSP with expected
         | weightings.
         | 
         | There are lots of use cases for this.
        
           | est31 wrote:
           | In the virtual RAM accessible to the process the image might
           | be continuous, but its physical representation could be
           | different, as the operating system might use different areas
           | from the physical RAM to expose it to the process. If you
           | dump a computer's RAM e.g. in a forensic setting, you might
           | attain such a state.
           | 
           | Same goes for file systems as well as the wear leveling layer
           | of SSDs.
        
             | IncRnd wrote:
             | exactly!
        
         | tylfin wrote:
         | I could imagine a civil or criminal case relying on shredded
         | documents as a source of evidence.
        
         | klyrs wrote:
         | When you dumpster dive your competitor looking for trade
         | secrets?
        
         | the8472 wrote:
         | In the 90s and 00s analog pay tv used to be "encrypted" like
         | this. IIRC software decryption relied on deriving the weak
         | shuffling key by manually reordering image rows and then using
         | that to reverse the key so the next frames could be decrypted
         | more reliably. Doing that kind of video processing in realtime
         | and in software was close to CPU limits.
         | 
         | https://en.wikipedia.org/wiki/Nagravision
        
       | aliasEli wrote:
       | This might even have an application for real documents. There are
       | police cases where some suspects have tried to destroy evidence.
       | Manually reconstructing the original takes a lot of effort. I
       | expect that the CIA and NSA already have software for it, but I
       | have never heard of it being used in a police case.
        
         | perihelions wrote:
         | Germany has run a program for decades to recover the Stasi's
         | paper documents, the bulk of which it shredded at the fall of
         | the GDR. People can sometimes get their own Stasi files, if
         | they're recovered.
         | 
         | https://www.dw.com/en/how-ai-could-solve-a-600-million-piece...
         | 
         | https://www.wired.com/2008/01/ff-stasi/
         | 
         | https://boingboing.net/2018/01/09/truth-reconciliation-and-t...
         | 
         | Surprisingly, there's never been an HN thread about this (as
         | far as I can tell).
         | 
         | From the _Wired_ link: _" Today, the study in Poppe's Berlin
         | apartment is lined floor to 12-foot ceiling with bookshelves
         | full of volumes on art, literature, and political science. But
         | one shelf, just to the left of her desk, is special. It holds a
         | pair of 3-inch-thick black binders -- copies of the most
         | important documents in Poppe's secret police files. This is her
         | Stasi shelf."_
        
           | odysseus wrote:
           | The painstaking process to put back together shredded Stasi
           | documents is mentioned at the end of The Lives of Others:
           | https://www.imdb.com/title/tt0405094/
           | 
           | Great movie, by the way.
        
             | einpoklum wrote:
             | I really didn't like that movie, focusing on people who are
             | wholly uninteresting politically; presenting a sort of a
             | "shocked and chagrined" attitude to people being spied on
             | by the government; and treating West Germany as some sort
             | of paradise.
             | 
             | I would recommend Ken Loach's Fatherland instead; but -
             | only until when Dritterman arrives in the UK (the end of
             | the film is kind of messed up).
        
               | CamperBob2 wrote:
               | The political class is more than capable of looking after
               | itself, thanks. It's the common people who have the most
               | to fear, and the movie does a great job at underscoring
               | that very point.
        
             | airstrike wrote:
             | _ _Fantastic_ _ movie, I just wanted to add.
        
         | ampdepolymerase wrote:
         | Current office shredders are almost all capable of double
         | shredding but considering how easy it is to reverse by OP's
         | project, it may be necessary to implement further measures for
         | document destruction.
         | 
         | A laser or enzymic based method could work but neither are
         | particularly office-friendly.
        
           | the8472 wrote:
           | https://a.pomf.cat/uayjgu.jpg
        
           | ChrisMarshallNY wrote:
           | I used to work for a defense contractor, in the mid-1980s.
           | 
           | Once a week, this big-ass tandem truck would park outside the
           | entrance, and the janitors would carry out boxes of papers,
           | under armed guard.
           | 
           | The truck had a huge shredder (crosscut), and also a big
           | kiln.
           | 
           | Once the paper was shredded, it was burned, and the ashes
           | were shredded, just to be sure.
           | 
           | Then, the truck would take the ashes away; never to be seen
           | again.
        
           | secondcoming wrote:
           | Could someone invent a shredder that pre-analyses an image of
           | what it's shredding and then shreds in a manner that makes
           | reconstruction infeasible due to the increased likelihood of
           | false positives? (e.g. the flipped and mirrored images in OP)
        
           | jhayward wrote:
           | OP's method relies on the permutation of elements in (columns
           | or rows) to be in the same order for each (column or row).
           | 
           | If you're just handed a pile of confetti you don't have that
           | same correspondence.
        
             | wongarsu wrote:
             | Also, you generally have written correspondence, not
             | images. I doubt that the similarity metric of OPs method
             | produces reasonable results on text, and a good similarity
             | metric for text would be much more complicated.
             | 
             | On the other hand physical shredders typically have other
             | weaknesses, for example if you don't stir the shredded
             | result you have a high correlation between original the
             | current and original relative position of the individual
             | pieces.
        
               | aliasEli wrote:
               | With written text, you would expect that at least some
               | the letters from one shred would continue on its
               | neighbors. I don't see why the similarity metric would
               | not work. Also normally you have a pretty high contrast
               | between the letter and the background, unlike the image
               | that was used in the example.
        
               | jhgb wrote:
               | If you have a language model, then sequences of
               | characters appearing in your text will add further
               | information to your edge matches.
        
           | [deleted]
        
         | jpalomaki wrote:
         | DARPA ran a competition on this problem around 10 years ago [1]
         | and page from Darpa archive [2]
         | 
         | [1] https://en.wikipedia.org/wiki/DARPA_Shredder_Challenge_2011
         | [2] https://archive.darpa.mil/shredderchallenge/index.html
        
       | sgk284 wrote:
       | This was a hiring puzzle from the Instagram team in 2012, from
       | back before Facebook acquired them: https://instagram-
       | engineering.com/instagram-engineering-chal...
       | 
       | I remember having a lot of fun solving it, in Coffeescript (which
       | was all the rage at the time), also with TSP but using the
       | Pearson correlation coefficient as the similarity metric:
       | https://gist.github.com/stevekrenzel/1505619
       | 
       | Here's a little demo of it in action:
       | http://krenzel.org/files/insta/
        
         | speedgoose wrote:
         | I like how unicorns can do such hiring challenges while normal
         | companies are desperate for people to apply. I think my current
         | company would hire anyone who can breath and write 3 lines of
         | code per month, but no one fitting these criteria applies.
        
           | onion2k wrote:
           | I can write 4 lines of code per month, so I guess I'm
           | overqualified.
        
             | speedgoose wrote:
             | Do they work ?
        
               | reificator wrote:
               | Oh, now we're moving the goalposts eh?
        
               | malux85 wrote:
               | On my machine, yes
        
               | kevin_thibedeau wrote:
               | They've got +3 on StackOverflow so fully peer reviewed.
        
           | satya71 wrote:
           | Are you hiring on-site or remote? Which country?
        
           | gccs wrote:
           | There is a shortage of engineers because companies don't want
           | to pay them what their worth.
        
             | hn_throwaway_99 wrote:
             | This gets spouted all the time on HN, and I find it rarely
             | to be true, or at least a gross oversimplification.
             | 
             | In my experience what usually happens is a programmer's
             | "worth" is hugely dependent on the scale of the
             | organization, moreso than nearly every other individual
             | contributor-level role, which means that smaller orgs have
             | an extremely difficult time competing with larger orgs.
             | 
             | For example, suppose at some company there is a business
             | process that generates $1 billion in annual revenue, and a
             | programmer has implemented something that can increase that
             | by, say, .2%. So the programmer's "worth" in that situation
             | is 2 million dollars.
             | 
             | A much smaller company only does $10 million in annual
             | revenue, so the commensurate .2% improvement is only worth
             | 20k.
             | 
             | This is a gross oversimplification, of course, but it
             | really gets to the heart of way the FAANGS can pay such
             | huge salaries and suck up a lot of the best talent. It's
             | not that all the other companies are being stingy, it's
             | just that in many cases they _can 't_ pay a programmer
             | anywhere near what a FAANG can because that programmer just
             | can't generate that much business value at a smaller
             | company.
        
             | WalterBright wrote:
             | The only reason there's a shortage of housing is because
             | people don't want to pay what housing is worth.
        
           | michaelcampbell wrote:
           | What's the work, and the pay? Those 2 things will play big
           | roles more than a dearth of candidates. Told my son for years
           | if you want to make more money, find something to do that no
           | else wants to. Turns out the same is on the hiring side.
        
             | speedgoose wrote:
             | Very average IT work in healt-care. Glorified CRUD plus
             | some. Pay is average too.
        
               | infinite8s wrote:
               | What does average mean in this case?
        
           | porphyra wrote:
           | I think public hiring challenges like this are less of a
           | barrier that filters out candidates, and more of an attractor
           | that increases the number of applicants by "nerd sniping"
           | people.
        
             | speedgoose wrote:
             | You need a strong brand and good marketing to attract
             | people with a challenge. A normal company has difficulties
             | to reach people.
        
         | ezekg wrote:
         | Reminds me how much I loved CoffeeScript. The code really is
         | beautiful.
        
       | nayuki wrote:
       | Discussions back in 2016:
       | https://news.ycombinator.com/item?id=12670997 ;
       | https://news.ycombinator.com/item?id=12667611
        
       | whazor wrote:
       | I used TSP to optimally stack different types of bowls. I
       | measured all the different pairs of bowls and searched for the
       | shortest route through all the bowls we have. I found that you
       | can split the stacks however you want, I also had an algorithm
       | for that.
        
         | Conlectus wrote:
         | Is the algorithm different from a simple sorted order in this
         | case? The difficulty with TSP is that you have to return to the
         | start, which wouldn't seek to be the case with stacled bowls,
         | unless you're stacking them in a ring somehow.
        
       | SavantIdiot wrote:
       | I wrote a program to do this two decades ago. I shredded a black
       | and white text document (long shreds, not the new shredders that
       | cut into confetti), scanned all the strips, front and back, and
       | created a lookup value for each edge based on the pixel position.
       | The biggest problem I had was the shredder blade removed too much
       | material. This caused a continuous stroke to be fatter on one
       | side than the other (imagine if the descender in a lowercase "p"
       | was on the blade's path).
        
         | nielsbot wrote:
         | I wonder if a high resolution photo of all the shreds
         | (flattened out) would also work and be easier?
        
         | hackcasual wrote:
         | DARPA had a challenge for this 10 years ago
         | https://gizmodo.com/darpas-almost-impossible-challenge-to-re...
        
           | SavantIdiot wrote:
           | That's awesome! I had no idea it existed. Too bad the top
           | level URL no longer exists, but now I'm eager to see if there
           | are solutions online today.
        
       ___________________________________________________________________
       (page generated 2021-07-02 23:00 UTC)