[HN Gopher] Show HN: A Python tool for text-based AI training an...
       ___________________________________________________________________
        
       Show HN: A Python tool for text-based AI training and generation
       using GPT-2
        
       Author : minimaxir
       Score  : 102 points
       Date   : 2020-05-18 15:15 UTC (7 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | alphagrep12345 wrote:
       | Your API looks really clean but what's the difference between
       | this and just GPT-2 (or) HuggingFace's implementations?
        
         | minimaxir wrote:
         | I talk about deviations from previous approaches in the DESIGN
         | doc
         | (https://github.com/minimaxir/aitextgen/blob/master/DESIGN.md),
         | but to answer the difference between aitextgen and Huggingface
         | Transformers:
         | 
         | Model I/O: aitextgen abstracts some of the boilerplate and
         | supports custom GPT-2 models and importing the old TensorFlow
         | models better.
         | 
         | Training: _Completely_ different from Transformers. Different
         | file processing and encoding, training loop leverages pytorch-
         | lightning.
         | 
          | Generation: Abstracts boilerplate, allowing addition of more
          | utility functions (e.g. bolding when printing to the console,
          | or printing bulk text to a file). Generation is admittedly not
          | much different from Transformers, but future iterations will
          | increasingly diverge.
        
       | minimaxir wrote:
       | For fun, here's a little demo of aitextgen that you can run on
       | your own computer.
       | 
        | First install aitextgen:
        | 
        |     pip3 install aitextgen
        | 
        | Then you can download and generate from a custom Hacker News
        | GPT-2 model I made (only 30MB, compared to 500MB for the 124M
        | GPT-2) using the CLI!
        | 
        |     aitextgen generate --model minimaxir/hacker-news --n 20 --to_file False
        | 
        | Want to create Show HN titles? You can do that.
        | 
        |     aitextgen generate --model minimaxir/hacker-news --n 20 --to_file False --prompt "Show HN:"
        
         | ideashower wrote:
         | Show HN: Numericcal - A simple, distributed, and fast backups
         | 
         | Show HN: A simple, free and open source alternative to Turkish
         | potatoies
         | 
         | Show HN: A boilerplate for mobile development
         | 
         | Show HN: Simple UI Gao-Parser (for the Web)
         | 
         | Show HN: A fast, fully-featured web application framework
         | 
         | Show HN: I have a side project you want to sell in a startup?
         | 
         | Show HN: S3CARP Is Down
         | 
         | Show HN: Finding the right work with friends and family
         | 
         | Show HN: I built a webapp to remind users to view your
         | photoshopped stripes
         | 
         | Show HN: Send a hands-only gift reason to the Mark Zuckerberg &
         | Stay a lot.
         | 
         | Show HN: A simple, high-performance, full-disk encryption
         | 
         | Show HN: Peer-to-peer programming language
         | 
         | Show HN: Browse and duplicate images in your app's phone
         | 
         | Show HN: Waze - Send a face back end to the internet
         | 
         | Show HN: A simple, minimal, real-time building app to control
         | your Mac.
         | 
         | Show HN: Sheldonize - A collaborative group for startups
         | 
         | Show HN: Gumroad - Make your web app faster
         | 
         | Show HN: An easy way to track time using MD5?
         | 
         | Show HN: A simple, fast, and elegant ORM/Lambda: progressive
         | web apps for Vim
         | 
         | Show HN: A simple landing page I've been working on elsdst
         | Certy. Here is how I was within the last year
        
           | contravariant wrote:
           | >progressive web apps for Vim
           | 
           | Well it knows how to get HN users attention all right.
        
       | simonw wrote:
       | I've been following minimaxir's work with GPT-2 for a while -
       | I've tried building things on
       | https://github.com/minimaxir/gpt-2-simple for example - and this
       | looks like a HUGE leap forward in terms of developer usability.
       | The old stuff was pretty good on that front, but this looks
       | absolutely amazing. Really exciting project.
        
       | jakearmitage wrote:
       | Does anyone know an efficient way to "embed" models like this?
        | I'm currently working on a Tamagotchi-style Raspberry Pi toy
        | and I use GPT-2 to generate answers in the chat. I wrote a
        | simple API that returns responses from a server. If I could
        | embed my model, it would save me from having to run a server.
        
         | Voloskaya wrote:
         | The size of the model you need to get good enough generation
         | with something like GPT-2 is going to be pretty impractical on
          | a Raspberry Pi. You might be able to fit a 3-layer distilled
          | GPT-2 in RAM (not quite sure what the latest RPIs have in
          | terms of RAM, 4GB?), but the latency is going to be pretty
          | horrible (multiple seconds).
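A back-of-envelope sketch (my own illustrative arithmetic, not from the thread) of why GPT-2-class weights strain a small device: fp32 weights alone cost 4 bytes per parameter, before activations and framework overhead. The 42M-parameter figure for a 3-layer distillation is a rough assumption for illustration.

```python
# Back-of-envelope weight-storage estimate for GPT-2-class models.
# Real memory use is higher: activations, KV caches, and framework
# overhead all add on top of raw parameter storage.

def model_megabytes(n_params, bytes_per_param=4):
    """Raw parameter storage in MB (fp32 = 4 bytes per parameter)."""
    return n_params * bytes_per_param / 1e6

gpt2_small = model_megabytes(124_000_000)  # smallest released GPT-2
distilled = model_megabytes(42_000_000)    # assumed size of a 3-layer distillation

print(f"124M GPT-2 weights: {gpt2_small:.0f} MB")  # 496 MB
print(f"3-layer distilled:  {distilled:.0f} MB")   # 168 MB
```

Even the hypothetical distilled model leaves little headroom on a 1-4 GB Raspberry Pi once the Python runtime and a deep-learning framework are loaded.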
        
         | minimaxir wrote:
         | The hard part of embedding is that the smallest 124M GPT-2
         | model itself is huge at 500MB, which would be unreasonable for
         | performance/storage on the user end (and quantization/tracing
         | can't save _that_ much space).
         | 
          | That's why I'm looking into smaller models, which has been
          | difficult, but releasing aitextgen was a necessary first step.
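The arithmetic behind "can't save that much": dropping from 32-bit to 8-bit weights cuts size fourfold, but the smallest GPT-2 still lands well above 100 MB. A quick sketch with illustrative numbers:

```python
# Weight-size arithmetic for the 124M-parameter GPT-2 at different
# precisions; quantization shrinks the download, but the model stays
# large for end-user distribution.
N_PARAMS = 124_000_000

fp32_mb = N_PARAMS * 4 / 1e6  # 32-bit floats, as shipped
fp16_mb = N_PARAMS * 2 / 1e6  # half precision
int8_mb = N_PARAMS * 1 / 1e6  # 8-bit quantization

print(fp32_mb, fp16_mb, int8_mb)  # 496.0 248.0 124.0
```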
        
         | alphagrep12345 wrote:
         | What do you mean by embed the model?
        
       | cedyf wrote:
        | Nice, trying this out.
        
       | brendanfalk wrote:
       | Don't understand why this didn't get more hype. This is amazing.
       | Well done
        
         | minimaxir wrote:
          | AI text generation in general is an industry that's been
          | underhyped, which is why I'm trying to help shape it. :)
        
       | harshalaxman wrote:
       | Very cool. Can I ask what your use case is, or if it's just for
       | fun?
        
         | dustincoates wrote:
         | I had to do something similar (not this library, but I wish I
         | had known about it) just last week. I'm building out a product
         | demo and I wanted to fill it with books. I didn't want to go
         | searching for out of print books, so I created fake authors,
         | book titles, descriptions, and reviews. The longer text was
         | sometimes great, and sometimes had to be redone but overall it
         | worked really well.
        
         | minimaxir wrote:
         | I intend to productionize text generation, and this is a
          | necessary intermediate step. (gpt-2-simple had too many issues
          | in this area, so I needed to start from scratch.)
        
           | jramz wrote:
           | That is cool, do you have a timeline set out for this?
        
             | minimaxir wrote:
             | I'll likely start by creating a web API service similar to
             | what I did for gpt-2-simple, except more efficient:
             | https://github.com/minimaxir/gpt-2-cloud-run
             | 
             | The next step is architecting an infrastructure for
             | scalable generation; that depends on a few fixes for both
             | aitextgen and the base Transformers. No ETA.
        
       | neoncontrails wrote:
       | Huge fan of your gpt2-simple library, which I used to train a
       | satirical news generator in a Colab notebook:
       | https://colab.research.google.com/drive/1buF7Tju3DkZeL-EV4Ft...
       | 
       | > Generates text faster than gpt-2-simple and with better memory
       | efficiency! (even from the 1.5B GPT-2 model!)
       | 
       | This is exciting news. One of very few drawbacks of gpt2-simple
       | is the inability to fine-tune a model of more than ~355M
       | parameters. Do these memory management improvements make it
       | possible to fine-tune a larger one?
        
         | minimaxir wrote:
         | > Do these memory management improvements make it possible to
         | fine-tune a larger one?
         | 
         | Unfortunately not yet; I need to implement gradient
          | checkpointing first. Memory-wise, the results for finetuning
          | the 124M model are promising (<8 GB of VRAM, versus about
          | 12 GB with gpt-2-simple).
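For readers unfamiliar with gradient checkpointing: instead of storing every layer's activations for the backward pass, you keep only a subset of "checkpoint" activations and recompute the rest, trading compute for memory (the classic scheme keeps roughly sqrt(L) checkpoints for L layers). A toy sketch of the trade-off, with made-up unit costs:

```python
# Toy model of activation memory with and without gradient checkpointing.
# Costs are in arbitrary "per-layer activation" units; real savings depend
# heavily on the model and the implementation.
import math

def activation_units(n_layers, checkpointed=False):
    if checkpointed:
        # keep ~sqrt(L) checkpointed activations, plus one segment's
        # worth of activations recomputed on the fly during backward
        k = math.ceil(math.sqrt(n_layers))
        return k + math.ceil(n_layers / k)
    return n_layers  # store every layer's activations

L = 12  # GPT-2 124M has 12 transformer blocks
print(activation_units(L), activation_units(L, checkpointed=True))  # 12 7
```

The memory saved on activations is what would let finetuning fit larger models in the same VRAM, at the cost of extra forward-pass recomputation.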
        
       | starskublue wrote:
       | Awesome work! Whenever people tell me they want to get started
       | with NLP I tell them to play around with your libraries as
       | they're the easiest way to immediately start doing cool things.
        
       ___________________________________________________________________
       (page generated 2020-05-18 23:00 UTC)