[HN Gopher] Show HN: A Python tool for text-based AI training an...
___________________________________________________________________

Show HN: A Python tool for text-based AI training and generation
using GPT-2

Author : minimaxir
Score  : 102 points
Date   : 2020-05-18 15:15 UTC (7 hours ago)

(HTM) web link (github.com)
(TXT) w3m dump (github.com)

| alphagrep12345 wrote:
| Your API looks really clean, but what's the difference between
| this and plain GPT-2, or HuggingFace's implementations?

| minimaxir wrote:
| I talk about deviations from previous approaches in the DESIGN
| doc
| (https://github.com/minimaxir/aitextgen/blob/master/DESIGN.md),
| but to answer the difference between aitextgen and Huggingface
| Transformers:
| 
| Model I/O: aitextgen abstracts some of the boilerplate and
| supports custom GPT-2 models and importing the old TensorFlow
| models better.
| 
| Training: _Completely_ different from Transformers. Different
| file processing and encoding; the training loop leverages
| pytorch-lightning.
| 
| Generation: Abstracts boilerplate, allowing the addition of more
| utility functions (e.g. bolding when printing to the console,
| printing bulk text to a file). Generation is admittedly not that
| much different from Transformers, but future iterations will
| increasingly diverge.

| minimaxir wrote:
| For fun, here's a little demo of aitextgen that you can run on
| your own computer.
| 
| First, install aitextgen:
| 
|     pip3 install aitextgen
| 
| Then you can download and generate from a custom Hacker News
| GPT-2 model I made (only 30MB, compared to 500MB for the 124M
| GPT-2) using the CLI:
| 
|     aitextgen generate --model minimaxir/hacker-news --n 20 --to_file False
| 
| Want to create Show HN titles? You can do that:
| 
|     aitextgen generate --model minimaxir/hacker-news --n 20 --to_file False --prompt "Show HN:"
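The same demo can also be driven from Python rather than the CLI.
A minimal sketch, assuming the CLI's --model flag maps straight
onto the aitextgen constructor's model argument as in the project
README:

    from aitextgen import aitextgen

    # Downloads the 30MB Hacker News model from the Huggingface
    # model hub on first use, then loads it from the local cache.
    ai = aitextgen(model="minimaxir/hacker-news")

    # Generate 20 samples, each seeded with the "Show HN:" prompt.
    ai.generate(n=20, prompt="Show HN:")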
| ideashower wrote:
| Show HN: Numericcal - A simple, distributed, and fast backups
| 
| Show HN: A simple, free and open source alternative to Turkish
| potatoies
| 
| Show HN: A boilerplate for mobile development
| 
| Show HN: Simple UI Gao-Parser (for the Web)
| 
| Show HN: A fast, fully-featured web application framework
| 
| Show HN: I have a side project you want to sell in a startup?
| 
| Show HN: S3CARP Is Down
| 
| Show HN: Finding the right work with friends and family
| 
| Show HN: I built a webapp to remind users to view your
| photoshopped stripes
| 
| Show HN: Send a hands-only gift reason to the Mark Zuckerberg &
| Stay a lot.
| 
| Show HN: A simple, high-performance, full-disk encryption
| 
| Show HN: Peer-to-peer programming language
| 
| Show HN: Browse and duplicate images in your app's phone
| 
| Show HN: Waze - Send a face back end to the internet
| 
| Show HN: A simple, minimal, real-time building app to control
| your Mac.
| 
| Show HN: Sheldonize - A collaborative group for startups
| 
| Show HN: Gumroad - Make your web app faster
| 
| Show HN: An easy way to track time using MD5?
| 
| Show HN: A simple, fast, and elegant ORM/Lambda: progressive
| web apps for Vim
| 
| Show HN: A simple landing page I've been working on elsdst
| Certy. Here is how I was within the last year

| contravariant wrote:
| > progressive web apps for Vim
| 
| Well, it knows how to get HN users' attention, all right.

| simonw wrote:
| I've been following minimaxir's work with GPT-2 for a while -
| I've tried building things on
| https://github.com/minimaxir/gpt-2-simple, for example - and
| this looks like a HUGE leap forward in terms of developer
| usability. The old stuff was pretty good on that front, but
| this looks absolutely amazing. Really exciting project.

| jakearmitage wrote:
| Does anyone know an efficient way to "embed" models like this?
| I'm currently working on a Tamagotchi-style RPi toy, and I use
| GPT-2 to generate answers to the chat. I wrote a simple API
| that returns responses from a server. If I could embed my
| model, it would save me from having to run a server.

| Voloskaya wrote:
| The size of the model you need to get good enough generation
| with something like GPT-2 is going to be pretty impractical on
| a Raspberry Pi. You might be able to fit a 3-layer distilled
| GPT-2 in RAM (not quite sure what the latest RPis have in
| terms of RAM, 4GB?), but the latency is going to be pretty
| horrible (multiple seconds).

| minimaxir wrote:
| The hard part of embedding is that the smallest GPT-2 model
| (124M) is itself huge at 500MB, which would be unreasonable
| for performance/storage on the user end (and
| quantization/tracing can't save _that_ much space).
| 
| That's why I'm looking into smaller models, which has been
| difficult, but releasing aitextgen was a necessary first step.

| alphagrep12345 wrote:
| What do you mean by embed the model?

| cedyf wrote:
| Nice, trying this out.

| brendanfalk wrote:
| Don't understand why this didn't get more hype. This is
| amazing. Well done.

| minimaxir wrote:
| AI text generation in general is an industry that's been
| underhyped, which is why I'm trying to help shape it. :)

| harshalaxman wrote:
| Very cool. Can I ask what your use case is, or if it's just
| for fun?

| dustincoates wrote:
| I had to do something similar (not with this library, but I
| wish I had known about it) just last week. I'm building out a
| product demo and I wanted to fill it with books. I didn't want
| to go searching for out-of-print books, so I created fake
| authors, book titles, descriptions, and reviews. The longer
| text was sometimes great and sometimes had to be redone, but
| overall it worked really well.

| minimaxir wrote:
| I intend to productionize text generation, and this is a
| necessary intermediate step. (gpt-2-simple had too many issues
| in this area, so I needed to start from scratch.)

| jramz wrote:
| That is cool. Do you have a timeline set out for this?

| minimaxir wrote:
| I'll likely start by creating a web API service similar to
| what I did for gpt-2-simple, except more efficient:
| https://github.com/minimaxir/gpt-2-cloud-run
| 
| The next step is architecting an infrastructure for scalable
| generation; that depends on a few fixes for both aitextgen and
| the base Transformers. No ETA.

| neoncontrails wrote:
| Huge fan of your gpt-2-simple library, which I used to train a
| satirical news generator in a Colab notebook:
| https://colab.research.google.com/drive/1buF7Tju3DkZeL-EV4Ft...
| 
| > Generates text faster than gpt-2-simple and with better
| memory efficiency! (even from the 1.5B GPT-2 model!)
| 
| This is exciting news. One of the very few drawbacks of
| gpt-2-simple is the inability to fine-tune a model of more
| than ~355M parameters. Do these memory management improvements
| make it possible to fine-tune a larger one?

| minimaxir wrote:
| > Do these memory management improvements make it possible to
| fine-tune a larger one?
| 
| Unfortunately not yet; I need to implement gradient
| checkpointing first. Memory-wise, the results for fine-tuning
| the 124M model are promising (<8 GB VRAM, when it used to take
| about 12 GB VRAM with gpt-2-simple).
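For reference, the fine-tuning flow those VRAM numbers refer to
looks roughly like the sketch below, following the patterns in the
project's README. The corpus file name is hypothetical, and the
exact keyword arguments may differ between versions:

    from aitextgen import aitextgen
    from aitextgen.TokenDataset import TokenDataset

    # Hypothetical corpus: one Hacker News title per line.
    data = TokenDataset("hn_titles.txt", line_by_line=True)

    # Import the original 124M TensorFlow GPT-2 weights, then
    # fine-tune them on the custom dataset.
    ai = aitextgen(tf_gpt2="124M")
    ai.train(data, batch_size=1, num_steps=5000)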
| starskublue wrote:
| Awesome work! Whenever people tell me they want to get started
| with NLP, I tell them to play around with your libraries, as
| they're the easiest way to immediately start doing cool things.
___________________________________________________________________
(page generated 2020-05-18 23:00 UTC)