[HN Gopher] Pix2tex: Using a ViT to convert images of equations ...
___________________________________________________________________

Pix2tex: Using a ViT to convert images of equations into LaTeX code

Author : Tomte
Score  : 156 points
Date   : 2023-11-03 10:17 UTC (12 hours ago)

(HTM) web link (github.com)

jksk61 wrote:
| Nice, but what are the minimum requirements to run it locally?

nuz wrote:
| ViTs are usually pretty cheap to run.

hospitalJail wrote:
| I was thinking about how terrible an idea it was for an ol' Fortune 500 company of mine to put all their information and lessons learned into some proprietary company infrastructure. This might have been alright, but at the end of the day people were uploading PowerPoints. Heck, it might even have been reasonable at the time, but with LLMs, it seems like storing everything in text/CSV files would have been a much better idea.
|
| The longer I live, the more I'm interested in saving all of my data into text files that I can parse later without vendor lock-in concerns. Maybe other open formats as well; best tool for the job, ya know.

mlyle wrote:
| pptx isn't bad. Indeed, you have more structure available than just text dumps.
|
| It's just a zip file containing a bunch of XML. And the slides' XML isn't beautiful/super nice, but not super ugly either. Naively processing it is lossy, but not as lossy as converting it to text.
|
| And most images end up as PNGs in them. The most annoying thing is images with data (like equations).

GaggiX wrote:
| This makes me wonder how well GPT-4V performs on this task (I don't have access to it).

alright2565 wrote:
| It is excellent. https://imgur.com/a/1YMSl9s

ilaksh wrote:
| Makes me wonder what the SOTA is for open source efforts along these lines.
|
| I have heard about "mixture of experts" as being a potentially important advance, and also of course about multimodality.
| So I found this: https://github.com/YeonwooSung/LIMoE-pytorch

GaggiX wrote:
| Shouldn't the sum be done first and then the multiplication? I think GPT-4V forgot to put the brackets around the sum.

alright2565 wrote:
| Huh, yeah, it looks like it. I checked the Python code when I was using this, and that is correctly parenthesized.

Silhouette wrote:
| It is curious that the rendered equation (under "Sure, I can help with that") appears to be incorrect due to some missing parens, but the Python implementation itself does appear to be correct.
|
| Now show us a version that takes into account actual representations and errors, produces an optimal implementation of the calculation for accuracy, and explains why it is. :)

perihelions wrote:
| For the morbidly curious, that nightmare math is someone's quantum field theory notes, which they typeset in TeX:
|
| https://rohankulkarni.me/files/notes/heidelberg_qft/12_2.pdf (_"12.2 Diagrammatic expansion of partition function for Yukawa theory"_)

ttul wrote:
| I dimly recall a period of months during engineering school when I would have been able to parse those symbols and perhaps make a joke about something in the lunch room. Those days are long behind me.

mr_mitm wrote:
| I was looking for "nightmare math" in the README and was confused because I didn't find any. I guess that's what a theoretical physics degree does to you: that formula looks very harmless to me.

SiempreViernes wrote:
| Yeah, I was also let down when I found the "nightmare math" was simply an integral of a generic Lagrangian density...

perihelions wrote:
| _you're_ a simple integral of generic density

black_puppydog wrote:
| Wow, please folks, keep it civilized! :D

rnk wrote:
| Looks like a horror show to me. Makes me feel embarrassed at leaving my math BS behind and going into CS. I have an insane retirement idea of retiring to some fun mountain town and going to grad school in physics.
| Where's the best place to go skiing with a college that takes old, washed-up programmers as students?

wolfi1 wrote:
| https://en.wikipedia.org/wiki/%C3%89cole_de_physique_des_Hou...

rnk wrote:
| Good suggestion, but maybe shooting too high. It checks off the "in the mountains" part of my fantasy life. But a place that has seminars for grad students and working physicists, and has many Nobel laureates who attended as students, may be above my intellectual grade.

diracs_stache wrote:
| Yeah, that didn't look much nastier than most problem sets.

mistrial9 wrote:
| Three dimensions over time, with vector components?

itishappy wrote:
| Quantum is a trip. Pages of math to describe... 4 straight lines.

leumassuehtam wrote:
| Quite the opposite: the 4 straight lines represent all that math. Notation is very powerful and gets you quite far.

itishappy wrote:
| Understood, and that was exactly my (poorly made) point. Crazy how powerful quantum notation is!

bluish29 wrote:
| I would argue that it is a nightmare in general, from the viewpoint of someone who has actually shared this pain before.

Palmik wrote:
| If you're looking for more end-to-end math/LaTeX-aware OCR, check out https://github.com/facebookresearch/nougat

bee_rider wrote:
| Now we just need a way to convert a LaTeX equation to SciPy or NumPy or something like that.

adr1an wrote:
| There was something like that, but it's been abandoned for a while now... https://github.com/augustt198/latex2sympy

cypress66 wrote:
| Probably GPT-4.

diracs_stache wrote:
| Even a rough go would be nice. I'll go back and check variables and matrix operations, etc., but going from a derivation to engineering code is a slow step in my workflow.

kuter wrote:
| Took a peek at the models they use. It seems to be a vision transformer encoder-decoder architecture with a ResNet backbone. Looks really good.
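To make the "vision transformer encoder" part concrete, here is a minimal sketch of how a plain ViT turns an image into the token sequence its encoder consumes, assuming NumPy: the image is cut into non-overlapping P×P patches, each patch is flattened, and a linear projection maps it to one token. The names `patchify` and `embed`, the 16-pixel patch size, and the random (untrained) projection are illustrative assumptions, not pix2tex's code. In the hybrid setup described in the comment, a ResNet would first turn the image into a feature map and the patches would be taken from that map rather than from raw pixels; that stage, the positional embeddings, the attention layers, and the LaTeX-emitting decoder are all omitted here.

```python
import numpy as np


def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W) image into flattened, non-overlapping patch x patch tiles.

    Returns an array of shape (num_patches, patch * patch), ordered row by row.
    H and W must be multiples of `patch` (a real pipeline would pad first).
    """
    h, w = image.shape
    if h % patch or w % patch:
        raise ValueError("image size must be a multiple of the patch size")
    tiles = image.reshape(h // patch, patch, w // patch, patch)
    tiles = tiles.transpose(0, 2, 1, 3)  # -> (patch_rows, patch_cols, patch, patch)
    return tiles.reshape(-1, patch * patch)


def embed(patches: np.ndarray, dim: int, seed: int = 0) -> np.ndarray:
    """Linearly project each flattened patch to a `dim`-sized token.

    The weights are random here (hypothetical, untrained); in a trained model
    this projection is learned along with the rest of the network.
    """
    rng = np.random.default_rng(seed)
    fan_in = patches.shape[1]
    weights = rng.standard_normal((fan_in, dim)) / np.sqrt(fan_in)
    return patches @ weights  # (num_patches, dim): the encoder's input sequence


# A blank 64x256 "equation" image -> 4 x 16 = 64 patches of 16x16 pixels each.
img = np.zeros((64, 256), dtype=np.float32)
tokens = embed(patchify(img, 16), dim=256)
print(tokens.shape)  # (64, 256)
```

Because wide, short equation images give a row-by-row sequence of patches, the sequence length grows with the image size; the decoder then generates LaTeX tokens while attending to this sequence.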
| I had a similar idea of training a model and making a desktop application, but haven't had the opportunity. I wonder how much compute it took to train the model.
|
| I think this paper was the first one to do OCR on LaTeX: http://cs231n.stanford.edu/reports/2017/pdfs/815.pdf The paper describes an encoder-decoder architecture with a CNN encoder and an LSTM-based decoder.

srush wrote:
| Want to give proper credit to my former student for starting this: Yuntian Deng et al., 2016 (https://arxiv.org/abs/1609.04938). I believe this repo uses the dataset from that paper.
|
| Some recent cool work he's been doing: https://www.youtube.com/watch?v=lx1XcTdhalU.

_venkatasg wrote:
| Slightly related to the task, I wanna plug my utility app for finding LaTeX commands for characters, DeTeXt: https://venkatasg.net/apps/detext
|
| I've gotten a lot of requests to do whole equations, but I feel that would massively increase the complexity of the app for not that much benefit? How often do people want to convert a whole bunch of equations into LaTeX? My use case is usually writing my own equations and forgetting the command for a specific symbol, or looking for a symbol that looks something like X.

gammarator wrote:
| A commercial product that does the same thing and has worked very well in my experience is https://mathpix.com/. The free tier has met my needs to date.

techwizrd wrote:
| I've been using Mathpix for several years, and it works really well.

rnadomvirlabe wrote:
| I use the paid version, and I find it well worth the money to be able to quickly compile multiple math sources into one LaTeX document for reference. It's a huge time saver and works surprisingly well, even on my handwritten notes.

spandextwins wrote:
| Sheldon? Sheldon, is that you??

bloopernova wrote:
| I don't know anything about the workflow of scientists or mathematicians.
| But I was wondering if equation recognition was something that could help them? Like, is there utility in seeing an equation on a whiteboard, importing it, and hooking up the right inputs and outputs from that equation?
|
| This is very much idle daydreaming. When you write out a big equation on the wall, what happens then? Does it need to be validated? Or does it go directly into a paper, with no computation performed upon/with it?

bluish29 wrote:
| For me at least, it is useful for different things, but they are mainly about writing stuff. It is much easier to copy a couple of equations from your references and use something like image-to-LaTeX to get the source without having to write them yourself, especially for complicated equations. It makes it much faster to have discussions with other people online. It also speeds up writing notes and copying equations from textbooks, etc., which was always stupidly time-consuming if you ended up writing them in LaTeX.
|
| Also, a lot of people are addicted to thinking on a blackboard, where you don't have to worry about anything else. It is easy: take a photo of what you did, erase the board, write new things, and take another photo. And when you are done and want to preserve this in some notes or copy it to paper, just use an equation recognition tool and your life is much easier.
|
| It is a productivity tool that saves time and effort; it is not going to make you a super researcher/scientist.

abdullahkhalids wrote:
| Big equations don't come out of the ether. Either they are derived from some simpler set of underlying equations based on assumptions, or they are taken from a paper/book that did that derivation.
|
| Usually, whoever does the derivation, or someone who wants to understand things properly, will do computations on multiple steps of the derivation from start to finish. A lot of these computations can be done by hand - you don't need a computer.
| A lot of computations should be done by hand - even if they could be done by a computer - because you only get a feel for the equations if you play with them with your hands. To quote Dirac, 'I consider that I understand an equation when I can predict the properties of its solutions, without actually solving it.' That comes from solving a lot of them by hand.
|
| Yes, oftentimes, doing numerical or symbolic computation with a computer helps. But is the pain point there having to type the equation into the computer? Hardly. It would be nice, but nothing groundbreaking.

runxel wrote:
| Now do it with handwritten formulas!

radarsat1 wrote:
| Nice idea. This is one of those dream problems where you can just synthesize a ton of data and solve the inverse problem. As a student, this is a great way to go for a project, but such problems can be hard to think up.

abdullahkhalids wrote:
| I often teach online using a Wacom tablet + handwriting app. I write a lot of equations. The slides are shared with students.
|
| What would be really nice is if I could feed slides.pdf to something like this, and it did OCR on all the handwritten text (English or equations) and put the output as an invisible layer under the text. That would make the slides searchable.
|
| I understand, though, that OCR on handwritten equations is a very difficult problem.

Jaxan wrote:
| I use OneNote for writing notes (and keep them in handwritten form). Surprisingly, it is searchable! It doesn't work for equations, though.

facu17y wrote:
| Repo has been deleted? I get a 404. I did see it earlier on.
|
| I fed the equation image (a screenshot of the right frame from their GIF, then cropped) into ChatGPT (GPT-4V), and it correctly deciphered the equation and gave the correct LaTeX code.
|
| Why was the repo removed?

gwern wrote:
| All of GitHub is down.
___________________________________________________________________
(page generated 2023-11-03 23:00 UTC)