[HN Gopher] Pix2tex: Using a ViT to convert images of equations ...
       ___________________________________________________________________
        
       Pix2tex: Using a ViT to convert images of equations into LaTeX code
        
       Author : Tomte
       Score  : 156 points
       Date   : 2023-11-03 10:17 UTC (12 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | jksk61 wrote:
       | Nice, but what are the minimum requirements to run it locally?
        
         | nuz wrote:
         | ViTs are usually pretty cheap to run.
        
       | hospitalJail wrote:
       | I was thinking about how terrible of an idea it was for an old
       | Fortune 500 company of mine to put all their information and
       | lessons learned into some proprietary company infrastructure.
       | This might have been alright, but at the end of the day people
       | were uploading powerpoints. Heck it might even have been
       | reasonable at the time, but with LLMs, it seems like storing
       | everything in text/csv files would have been a much better idea.
       | 
       | The longer I live, the more I'm interested in saving all of my
       | data into text files that I can parse later without vendor lock-
       | in concern. Maybe other open formats as well, best tool for the
       | job, ya know.
        
         | mlyle wrote:
         | pptx isn't bad. Indeed, you have more structure available than
         | just text dumps.
         | 
         | It's just a zip file containing a bunch of XML. And the slides
         | XML isn't beautiful/super nice but not super ugly either.
         | Naively processing it is lossy, but not as lossy as converting
         | it to text.
         | 
         | And most images end up as png in them. The most annoying thing
         | is images with data (like equations).
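
A minimal sketch of the point above, assuming only what the comment says (a .pptx is a zip of XML parts) plus the standard Office Open XML layout: slide parts under `ppt/slides/`, images under `ppt/media/`, and text runs in DrawingML `a:t` elements. The helper names and the tiny stand-in archive are hypothetical and exist only so the example runs without a real deck.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace used for text runs (<a:t>) inside slide XML.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"


def slide_texts(pptx_file):
    """Map each slide part name to the list of text runs found in it."""
    texts = {}
    with zipfile.ZipFile(pptx_file) as zf:
        # Note: sorted() is lexicographic, so slide10 sorts before slide2.
        for name in sorted(zf.namelist()):
            if name.startswith("ppt/slides/slide") and name.endswith(".xml"):
                root = ET.fromstring(zf.read(name))
                texts[name] = [t.text or "" for t in root.iter(f"{{{A_NS}}}t")]
    return texts


def media_files(pptx_file):
    """List embedded media parts (where equation images usually end up)."""
    with zipfile.ZipFile(pptx_file) as zf:
        return [n for n in zf.namelist() if n.startswith("ppt/media/")]


def make_demo_pptx():
    """Build a stand-in archive with one slide and one image part.

    This is NOT a valid PowerPoint file (no content types, no relationships);
    it only mimics the paths and elements the functions above look at.
    """
    slide = (
        f'<p:sld xmlns:p="{P_NS}" xmlns:a="{A_NS}"><p:cSld><p:spTree><p:sp>'
        "<p:txBody><a:p><a:r><a:t>Lessons learned</a:t></a:r></a:p></p:txBody>"
        "</p:sp></p:spTree></p:cSld></p:sld>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("ppt/slides/slide1.xml", slide)
        zf.writestr("ppt/media/image1.png", b"placeholder bytes, not a real PNG")
    return buf


demo = make_demo_pptx()
print(slide_texts(demo))
print(media_files(demo))
```

The text comes out with its slide attached, which is the extra structure a flat text dump loses. The files under `ppt/media/` are the part that still needs something like an equation OCR model.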
        
       | GaggiX wrote:
       | This makes me wonder how well GPT-4V performs on this task (I
       | don't have access to it).
        
         | alright2565 wrote:
         | It is excellent. https://imgur.com/a/1YMSl9s
        
           | ilaksh wrote:
           | Makes me wonder what the SOTA is for open source efforts
           | along these lines.
           | 
           | I have heard about "mixture of experts" as being a
           | potentially important advance, and also of course about
           | multimodality. So I found this:
           | https://github.com/YeonwooSung/LIMoE-pytorch
        
           | GaggiX wrote:
           | Shouldn't the sum be done first and then the multiplication?
           | I think GPT-4V forgot to put the brackets around the sum.
        
             | alright2565 wrote:
             | Huh yeah it looks like it. I checked the Python code when I
             | was using this & that is correctly parenthesized.
        
           | Silhouette wrote:
           | It is curious that the rendered equation (under "Sure, I can
           | help with that") appears to be incorrect due to some missing
           | parens but the Python implementation itself does appear to be
           | correct.
           | 
           | Now show us a version that takes into account actual
           | representations and errors, produces an optimal
           | implementation of the calculation for accuracy, and explains
           | why it is. :)
        
       | perihelions wrote:
       | For the morbidly curious, that nightmare math is someone's
       | quantum field theory notes which they typeset in TeX:
       | 
       | https://rohankulkarni.me/files/notes/heidelberg_qft/12_2.pdf
       | (_"12.2 Diagrammatic expansion of partition function for Yukawa
       | theory"_)
        
         | ttul wrote:
         | I recall dimly a period of months during engineering school
         | when I would have been able to parse those symbols and perhaps
         | make a joke about something in the lunch room. Those days are
         | long behind me.
        
         | mr_mitm wrote:
         | I was looking for "nightmare math" in the README and was
         | confused because I didn't find any. I guess that's what a
         | theoretical physics degree does to you: that formula looks very
         | harmless to me.
        
           | SiempreViernes wrote:
           | Yeah, I was also let down when I found the "nightmare math"
           | was simply integral of a generic Lagrangian density...
        
             | perihelions wrote:
             | _you're_ a simple integral of generic density
        
               | black_puppydog wrote:
               | wow, please folks, keep it civilized! :D
        
               | rnk wrote:
               | Looks like a horror show to me. Makes me feel embarrassed
               | at leaving my math bs behind and going into cs. I have an
               | insane retirement idea of retiring to some fun mountain
               | town and going to grad school in physics. Where's the
               | best place to go skiing with a college that takes old
               | washed up programmers as students?
        
               | wolfi1 wrote:
               | https://en.wikipedia.org/wiki/%C3%89cole_de_physique_des_
               | Hou...
        
               | rnk wrote:
               | Good suggestion, but maybe shooting too high. Checks off
               | the "in the mountains" part of my fantasy life. But a
               | place that has seminars for grad students and working
               | physicists and has many Nobel laureates who attended as
               | students may be above my intellectual grade.
        
           | diracs_stache wrote:
           | Yeah that didn't look much more nasty than most problem sets
        
           | mistrial9 wrote:
           | three dimensions over time, with vector components?
        
         | itishappy wrote:
         | Quantum is a trip. Pages of math to describe... 4 straight
         | lines.
        
           | leumassuehtam wrote:
           | Quite the opposite, the 4 straight lines represent all that
           | math. Notation is very powerful and gets you quite far.
        
             | itishappy wrote:
             | Understood, and that was exactly my (poorly made) point.
             | Crazy how powerful quantum notation is!
        
         | bluish29 wrote:
         | I would argue that it is a nightmare in general, a viewpoint of
         | someone who actually shared this pain before.
        
       | Palmik wrote:
       | If you're looking for more e2e math/LaTeX-aware OCR, check out
       | https://github.com/facebookresearch/nougat
        
       | bee_rider wrote:
       | Now we just need a way to convert a LaTeX equation to scipy or
       | Numpy or something like that.
        
         | adr1an wrote:
         | There was something like that, but it's been abandoned for a
         | while now... https://github.com/augustt198/latex2sympy
        
         | cypress66 wrote:
         | Probably GPT-4
        
         | diracs_stache wrote:
         | even a rough go would be nice, I'll go back and check variables
         | and matrix operations, etc. but that step of going from a
         | derivation to engineering code is a slow step in my workflow
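
A "rough go" at that step might look like the sketch below. It is a toy, not the approach of latex2sympy or any other tool mentioned in this thread: it handles only a hypothetical subset (`\frac{..}{..}`, `\sqrt{..}`, `\cdot`, `^`, and brace groups) and emits a Python expression string. Implicit multiplication, matrices, sums and integrals are out of scope, so the result would still need the manual check of variables and operations described above.

```python
import math


def read_group(s, i):
    """Return (contents, next_index) for the brace group starting at s[i]."""
    if i >= len(s) or s[i] != "{":
        raise ValueError(f"expected '{{' at position {i}")
    depth = 0
    for j in range(i, len(s)):
        if s[j] == "{":
            depth += 1
        elif s[j] == "}":
            depth -= 1
            if depth == 0:
                return s[i + 1:j], j + 1
    raise ValueError("unbalanced braces")


def latex_to_python(s):
    """Translate a small LaTeX subset into a Python expression string."""
    out = []
    i = 0
    while i < len(s):
        if s.startswith(r"\frac", i):
            num, i = read_group(s, i + 5)
            den, i = read_group(s, i)
            out.append(f"(({latex_to_python(num)})/({latex_to_python(den)}))")
        elif s.startswith(r"\sqrt", i):
            arg, i = read_group(s, i + 5)
            out.append(f"math.sqrt({latex_to_python(arg)})")
        elif s.startswith(r"\cdot", i):
            out.append("*")
            i += 5
        elif s[i] == "^":
            out.append("**")
            i += 1
        elif s[i] == "{":
            inner, i = read_group(s, i)
            out.append(f"({latex_to_python(inner)})")
        else:
            # Everything else (names, digits, +, -, spaces) passes through.
            out.append(s[i])
            i += 1
    return "".join(out)


expr = latex_to_python(r"\frac{a+b}{2} \cdot x^{2}")
print(expr)  # ((a+b)/(2)) * x**(2)

# eval() is used only to demonstrate the result; never run it on untrusted input.
print(eval(expr, {"math": math}, {"a": 1, "b": 3, "x": 3}))  # 18.0
```

Wrapping every group in parentheses is deliberate: it avoids the missing-brackets kind of error discussed earlier in the thread, at the cost of uglier output.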
        
       | kuter wrote:
       | Took a peek at the models they use. It seems to be a vision
       | transformer encoder-decoder architecture with a ResNet backbone.
       | Looks really good. I had a similar idea of training a model and
       | making a desktop application, but haven't had the opportunity. I
       | wonder how much compute it took to train the model.
       | 
       | I think this paper was the first one to do OCR on LaTeX:
       | http://cs231n.stanford.edu/reports/2017/pdfs/815.pdf The paper
       | describes an Encoder-Decoder architecture with CNN encoder and
       | LSTM based decoder.
        
         | srush wrote:
         | Want to give proper credit to my former student for starting
         | this: Yuntian Deng et al., 2016
         | (https://arxiv.org/abs/1609.04938). I believe this repo uses
         | the dataset from that paper.
         | 
         | Some recent cool work he's been doing:
         | https://www.youtube.com/watch?v=lx1XcTdhalU.
        
       | _venkatasg wrote:
       | Slightly related to the task, I wanna plug in my utility app for
       | finding LaTeX commands for characters, DeTeXt:
       | https://venkatasg.net/apps/detext
       | 
       | I've gotten a lot of requests to do whole equations, but I feel
       | that would massively increase the complication of the app for not
       | that much benefit? How often do people want to convert a whole
       | bunch of equations into LaTeX? My use case is usually writing my
       | own equations and forgetting the command for a specific symbol,
       | or looking for a symbol that looks something like X.
        
       | gammarator wrote:
       | A commercial product that does the same thing and has worked very
       | well in my experience is https://mathpix.com/. The free tier has
       | met my needs to date.
        
         | techwizrd wrote:
         | I've been using mathpix for several years, and it works really
         | well.
        
         | rnadomvirlabe wrote:
         | I use the paid version and I find it well worth the money to be
         | able to quickly compile multiple math sources into one LaTeX
         | document for reference. It's a huge time saver and works
         | surprisingly well, even on my handwritten notes.
        
       | spandextwins wrote:
       | Sheldon? Sheldon is that you??
        
       | bloopernova wrote:
       | I don't know anything about the workflow of scientists or
       | mathematicians. But I was wondering if equation recognition was
       | something that could help them? Like, is there utility in seeing
       | an equation on a whiteboard, importing it, and hooking up the
       | right inputs and outputs from that equation?
       | 
       | This is very much idle daydreaming. When you write out a big
       | equation on the wall, what happens then? Does it need to be
       | validated? Or does it go directly into a paper and no computation
       | is performed upon/with it?
        
         | bluish29 wrote:
         | For me at least, it is useful for different things, but they
         | are mainly about writing stuff. It is much easier to copy a
         | couple of equations from your references and use something like
         | image to latex to get the source without having to write them
         | yourself. Especially for complicated equations. It makes it
         | much faster to have discussions with other people online.
         | Writing notes, copying equations from textbooks, etc. was
         | always stupidly time-consuming if you ended up writing them in
         | LaTeX.
         | 
         | Also, a lot of people are addicted to thinking on a blackboard,
         | where you don't have to worry about anything else. It is easy.
         | Take a photo of what you did, erase the board, write new things
         | and take another photo. And when you are done and want to
         | preserve this in some notes or copy to paper, just use an
         | equation recognition tool and your life is much easier.
         | 
         | It is a productivity tool that saves time and effort; it is not
         | going to make you a super researcher/scientist.
        
         | abdullahkhalids wrote:
         | Big equations don't come out of the ether. Either they are
         | derived from some simpler set of underlying equations based on
         | assumptions, or they are taken from a paper/book that did that
         | derivation.
         | 
         | Usually, whoever does the derivation, or someone who wants to
         | understand things properly, will do computations on multiple
         | steps of the derivation from the start to the finish. A lot of
         | these computations can be done by hand - you don't need a
         | computer. A lot of computations should be done by hand - even
         | if they could be done by a computer - because you only get a
         | feel for the equations if you play with them with your hands.
         | To quote Dirac, 'I consider that I understand an equation when
         | I can predict the properties of its solutions, without actually
         | solving it.' That comes from solving a lot of them by hand.
         | 
         | Yes, oftentimes, doing numerical or symbolic computation with a
         | computer helps. But is the pain point of that having to type
         | the equation into the computer? Hardly. It would be nice, but
         | nothing groundbreaking.
        
       | runxel wrote:
       | Now do it with hand-written formulas!
        
       | radarsat1 wrote:
       | Nice idea. This is one of those dream problems where you can just
       | synthesize a ton of data and solve the inverse problem. As a
       | student this is a great way to go for a project, but can be hard
       | to think up.
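
The "synthesize a ton of data" half of that idea can be sketched as follows. This is a hypothetical generator, not how pix2tex builds its training set: the grammar, the `ATOMS` list and the function names are made up for illustration. It only produces the LaTeX label strings; rendering each string to an image (the input side of each training pair) is left out because it needs a TeX toolchain.

```python
import random

# Leaves of the toy grammar (an arbitrary, illustrative choice).
ATOMS = ["x", "y", "a", "b", r"\alpha", r"\beta", "1", "2"]


def random_formula(rng, depth=0, max_depth=3):
    """Recursively build a random LaTeX formula string."""
    # Stop at the depth limit, or early with some probability.
    if depth >= max_depth or rng.random() < 0.3:
        return rng.choice(ATOMS)

    def sub():
        return random_formula(rng, depth + 1, max_depth)

    kind = rng.choice(["frac", "sqrt", "pow", "sum"])
    if kind == "frac":
        return r"\frac{" + sub() + "}{" + sub() + "}"
    if kind == "sqrt":
        return r"\sqrt{" + sub() + "}"
    if kind == "pow":
        return "{" + sub() + "}^{" + sub() + "}"
    return sub() + " + " + sub()


def make_labels(n, seed=0):
    """Return n formula strings; a fixed seed makes the set reproducible."""
    rng = random.Random(seed)
    return [random_formula(rng) for _ in range(n)]


for label in make_labels(5):
    print(label)
```

Because the label is known before the image exists, the model only ever has to learn the inverse direction, from rendered image back to source.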
        
       | abdullahkhalids wrote:
       | I often teach online using a Wacom tablet + handwriting app. I
       | write a lot of equations. The slides are shared with students.
       | 
       | What would be really nice is, if I could feed slides.pdf to
       | something like this, and it did OCR on every handwritten text
       | (English or equation), and put the output as an invisible layer
       | under the text. That would make the slides searchable.
       | 
       | I understand, though, that OCR on handwritten equations is a very
       | difficult problem.
        
         | Jaxan wrote:
         | I use OneNote for writing notes (and keep it in handwritten
         | form). Surprisingly, it is searchable! Doesn't work for
         | equations though.
        
       | facu17y wrote:
       | Repo has been deleted? I get a 404. I did see it earlier on.
       | 
       | I fed the equation image (screenshot at the right frame from
       | their gif then cropped) into ChatGPT (GPT-4V) and it correctly
       | deciphered the equation and gave the correct LaTeX code.
       | 
       | Why was the repo removed?
        
         | gwern wrote:
         | All of GitHub is down.
        
       ___________________________________________________________________
       (page generated 2023-11-03 23:00 UTC)