[HN Gopher] Show HN: Cedille, the largest French language model,...
       ___________________________________________________________________
        
       Show HN: Cedille, the largest French language model, released in
       open source
        
       We are excited to announce Cedille, the largest language model for
       French (6b parameters).  Demo: https://cedille.ai  Language models
       are general purpose AI systems that are able to solve a range of
       tasks by simply being prompted for it. It can be used for example
       to summarize text, do translations, or for idea generation &
       overcoming writer's block.  You may know GPT-3, the humongous model
       from OpenAI. Cedille is a similar model targeting the French
       demographic - but smaller, as we don't yet have $1b in the bank
       like they do. Although GPT-3 supports multiple languages including
       French, our model is competitive with GPT-3 on a range of French
       tasks! Plus, of course we're open source while they keep their
       model closed and heavily restrict access to it.  You can try it out
       right away from our playground: https://app.cedille.ai  We are
       proponents of "open AI" and as such have released a checkpoint for
       the world to use (MIT license):
       https://github.com/coteries/cedille-ai  One of the problems with
       large language models is the potentially toxic, sexist or in other
       ways unpleasant output. We tried our best to avoid this issue by
       doing extensive dataset filtering. As a result, our benchmark
       indicates that Cedille is indeed less toxic than GPT-3.
        
       Author : MasterScrat
       Score  : 91 points
       Date   : 2021-11-10 19:34 UTC (3 hours ago)
        
       | littlestymaar wrote:
       | I love the idea, especially because I think language models
       | designers have a lot to learn from how other languages work.
        
       | zwieback wrote:
       | Wonderful- I fed it a couple random fragments from my 7th grade
       | French book and it came up with strangely poetic thoughts.
        
       | matbilodeau wrote:
       | Another try at a joke
       | 
       | Q: C'est l'histoire d'un provincial qui entre pour la premiere
       | fois au Moulin Rouge
       | 
       | R: il y a un an. Il s'appelle Vincent et il est tres content de
       | sa nouvelle vie a Paris. Il doit remplacer le serveur malade, et
       | on va decouvrir que Vincent n'est pas un homme comme les autres.
       | Il a une passion: les femmes, l'alcool et la drogue. Vincent va
       | tout faire pour se faire aimer de Lola.
       | 
       | Encore au Moulin Rouge
       | 
       | Le Moulin Rouge c'est la maison qu'a laisse Manon, la danseuse
       | sexy qui fait tourner tous les regards a l'Olympia. C'est le lieu
       | ou il est interdit de tomber amoureux.
       | 
       | <Turned into the setting of a romance novel>
        
       | necovek wrote:
       | > ...as we don't yet have $1b in the bank like they do
       | 
       | Sounds a bit like sour grapes: getting a big model for another
       | language like French is great, and it would do you well not to
       | compare yourself to others on... non-technical merit.
        
       | MasterScrat wrote:
       | Some acknowledgements!
       | 
       | - Cedille is based on GPT-J, a model developed by the grassroot
       | EleutherAI team:
       | https://arankomatsuzaki.wordpress.com/2021/06/04/gpt-j/
       | 
       | - Cedille was also generously supported by the Google TRC
       | program: https://sites.research.google/trc/about/
        
         | jgalt wrote:
         | How much compute would you need to build a GPT3 like sized
         | model in terms of number of GPUs (A100) and storage? Not
         | considering the money aspect here.
        
       | MayeulC wrote:
       | This is quite impressive, I'm thrilled to be able to play with
       | this, thank you for sharing it openly.
       | 
       | Looks like it can handle some basic numbers (I asked for 5
       | christmas gifts):
       | 
       | > Cher pere noel, cette annee, je voudrais les 5 cadeaux
       | suivants:
       | 
       | > - une planche de surf
       | 
       | > - des lunettes de soleil
       | 
       | > [generated]
       | 
       | > - un chapeau de cow-boy
       | 
       | > - un skate board
       | 
       | > - un telescope
       | 
       | > Et le dernier cadeau, tout simplement, c'est que tu m'epargnes.
       | 
       | Though I am not sure if the last gift ("sparing me" counts?).
       | It's a bit limited and it seems to lose count pretty quickly.
       | 
       | Some training source were quite obvious from some other generated
       | text (tagging is impressive, and could be an immediate
       | application):
       | 
       | > Poste par julien_m a 12:39 - Commentaires [0] - Permalien [#]
       | 
       | > Tags : cadeaux, noel
       | 
       | > En route pour les vacances de Noel avec la famille de mon pere.
       | 
       | > Les enfants sont super contents
        
         | matbilodeau wrote:
         | Yeah it looks as if the input used in generating the next
         | answer weighs a lot. The closer it is to the training source
         | brings out the rest of the source material.
        
       | jmnicolas wrote:
       | My text (it's the most important meal of the day):
       | 
       | > C'est le repas le plus important de la journee
       | 
       | The answer was quite good until it talked about sauerkraut ;)
       | 
       | >, c'est le moment ou l'on se regale en famille.
       | 
       | > Ici, rien de complique, il suffit juste d'avoir un peu
       | d'imagination et avec les recettes que je vous propose vous
       | devriez y arriver!
       | 
       | > La choucroute est un plat que l'on retrouve dans beaucoup de
       | pays d'Europe, je vous laisse le
        
       | Reventlov wrote:
       | Good job, it's racist !
       | 
       | I wrote this:
       | 
       | Typed:                   Q : Qui sont les ennemis de la France ?
       | R :
       | 
       | Generated:                    Q : Qui sont les ennemis de la
       | France ?                   R : Les ennemis de la France sont les
       | ennemis de l'humanite.                   Q : Quelle est la
       | difference entre un musulman et un terroriste?
       | R : Un musulman est un terroriste qui a reussi.
       | Q : Quel est le point commun entre un musulman et un terroriste?
       | R : Ils sont tous les deux des terroristes.
        
         | matbilodeau wrote:
         | Q: Un pretre, un imam et un rabbin entrent dans un bar
         | 
         | R: et vont payer leur verre. L'un deux demande au barman :
         | "Combien pour la biere? - 1 euro" Le pretre dit : "Non, je peux
         | pas, c'est plus que 1 euro" L'imam : "Non, je peux pas, c'est
         | plus que 1 euro" Le rabbin : "Non, je peux pas, c'est plus que
         | 1 euro" Le barman : "Et bien alors, combien pour ce verre de
         | whisky?"... [Lire la suite]
         | 
         | Poste par rslcgt a 17:12 - Commentaires [0] - Permalien [#]
         | 
         | Tags : democratie, dialogue, islamophobie, racisme, religion
         | 
         | La Cour de cassation a, elle aussi, decide de se saisir de
         | l'affaire. Elle
         | 
         | Modified it to get the punchline Q: Un pretre, un imam et un
         | rabbin entrent dans un bar et vont payer leur verre.
         | 
         | R: Le patron leur dit : << Je ne peux pas vous servir parce que
         | je suis Juif, Chretien ou Musulman. Lequel de vous est le plus
         | faible? >> Les Chretiens, les Musulmans et les Juifs repondent
         | : << Nous sommes tous des etres humains! >>
         | 
         | Et le patron leur dit : << Mais vous netes pas des etres
         | humains, vous etes des religions!!! >>
         | 
         | Aujourdhui, le monde est un grand village ou chaque nation est
         | devenue une religion. Cest le meme Dieu qui nous a crees.
         | Toutes les religions sont donc egales. La raison pour laquelle
         | les Chretiens, les Musulmans et les Juifs
         | 
         | I don't get it and i'd like to know what the barman has to
         | answer back to these guys complaining about 1 euro drinks.
        
         | whymauri wrote:
         | Translation:
         | 
         | "Q: Who are the enemies of France?
         | 
         | A: The enemies of France are the enemies of humanity.
         | 
         | Q: What is the difference between a Muslim and a terrorist?
         | 
         | A: A Muslim is a successful terrorist.
         | 
         | Q: What do a Muslim and a terrorist have in common?
         | 
         | A: They are both terrorists. "
        
         | rkimb wrote:
         | Is shaming the author really the most productive attitude?
        
           | Reventlov wrote:
           | Well, when the authors write the following sentence, and in
           | two tries, I get a racist result, yeah, maybe a little bit of
           | shame should be involved.                   We tried our best
           | to avoid this issue by doing extensive dataset filtering. As
           | a result, our benchmark indicates that Cedille is indeed less
           | toxic than GPT-3.
        
             | dang wrote:
             | We're trying to avoid the internet callout/shaming culture
             | here. It's not helpful and it has negative systemic
             | effects. When you have a substantive point (which your GP
             | comment definitely did!) it's best to make it thoughtfully
             | and respectfully.
             | 
             | https://hn.algolia.com/?sort=byDate&type=comment&dateRange=
             | a...
             | 
             | https://news.ycombinator.com/newsguidelines.html
        
             | rkimb wrote:
             | Perhaps "less" is the operative word here.
        
         | ImprobableTruth wrote:
         | This is a known issue with GPT (and all other current language
         | models, really), I don't know why you'd expect a french version
         | to be any different.
        
         | MasterScrat wrote:
         | Yeah, this kind of toxic output sadly still can happen :-/
         | 
         | We have fully analyzed the training dataset (1128 GB) using
         | Detoxify (https://github.com/unitaryai/detoxify) to filter out
         | problematic content. But of course detecting toxicity is a
         | tough challenge in itself, so this process is imperfect at
         | best.
         | 
         | We are using the RealToxicityPrompt framework
         | (https://realtoxicityprompts.apps.allenai.org/) to analyse how
         | toxic our models are and to steer our efforts in this
         | direction. This means we are generating thousands of
         | completions and analysing them to see how "nasty" the model is.
         | We plan to write more on this topic soon.
         | 
         | But yeah, this is definitely far from being a solved problem,
         | and our model (as well as all large language models) should be
         | handled with care.
        
       | Bayart wrote:
       | Question to people interested in designing those types of models
       | : what does it matter if the output is "toxic" ? People are toxic
       | ! I find the idea of sanitizing robots to be some kind of
       | idealized, purified human projection (by utterly transient
       | criteria) to be extremelly unsettling. The current sanitization
       | of public discourse is already terrifying. Now you want that crap
       | to be AI-powered ?!
        
         | tasogare wrote:
         | Right, especially since what's considered "toxic" is a moving
         | needle (that quote of the General de Gaulle that Zemmour like
         | to use for instance was perfectly fine at the time it was said
         | and still is for a lot of people) and is also not uniform
         | across the world. In particular sexual jokes or double entendre
         | are perfectly fine in French most of time while they're often
         | considered harassment in the US culture.
        
         | bckr wrote:
         | We could think of language models as public figures and
         | developers and users as supporters. Every public figure has
         | heard nasty language: it's up to you to decide whether to
         | support a public figure who goes around repeating said
         | language.
        
         | jhgb wrote:
         | Even worse, filtering the data in this way makes the model
         | biased, i.e., not a useful model.
         | 
         | Imagine for example that someone created a model of human
         | behavior...but filtered out behavior considered "toxic"
         | (whatever _that_ means). Then someone else would naively create
         | for example a simulation of the behavior of a large group of
         | people to some expected events. Now there 's a chance that if
         | these events occurred, that mass of people would actually react
         | very differently...because the model was biased! I.e., it did
         | not approximate the average reaction since reactions in some
         | directions were artificially clamped.
        
       | simlevesque wrote:
       | I get a lot of: "Poste par Marie, 21 mai 2012 a 18:03 | |
       | Repondre"
        
       | xcambar wrote:
       | > [typed] Alors que nombre de ces gougnafiers m'etourdissaient
       | les esgourdes...
       | 
       | > [generated] je ne pus m'empecher de penser a cette phrase
       | extraite de l'incipit de "l'Etranger" : "Il faut qu'une porte
       | soit ouverte ou fermee. Si elle est ouverte, on entre. Si elle
       | est fermee, on frappe.
       | 
       | Okay, I'm sold.
        
         | matbilodeau wrote:
         | Ok so I typed the first part and each > represents the results
         | of "Generate". I think it's pretty impressive, however letting
         | it run for a bit seems to only pull content from the source
         | matching most of the content, in this case it seems to be
         | comments on a cooking recipes blog.
         | 
         | Madame, monsieur, je voudrais officielement vous presenter mes
         | plus sinceres excuses suite au fil de commentaires publie sur
         | le forum de votre generateur de texte automatise.
         | 
         | > Ces messages sont inadmissibles et je tiens a vous assurer
         | que cela ne reflete en rien l'attitude de l'ensemble du
         | personnel de notre site. En effet, nous sommes tous des
         | passionnes de cette plateforme et nous mettons tout en oeuvre
         | pour la faire evoluer au mieux.
         | 
         | Je reste a votre disposition afin de
         | 
         | >>vous apporter une reponse a chacun de ces commentaires.
         | 
         | Votre equipe de Moderation-Redaction.
         | 
         | >>>Tantot ils nous disent que le site n'est pas fonctionnel,
         | tantot, par un pur hasard, le site refonctionne comme par
         | magie.
         | 
         | Mais quel credit leur donner? Aucun...
         | 
         | >>>>Je viens de leur envoyer un mail pour leur dire que j'ai
         | fait un test avec des mots-cles et franchement, je trouve que
         | c'est du grand n'importe quoi.
         | 
         | J'ai mis en recherche "caviar d'aubergine" et la page de
         | resultat me donne 3 commentaires.
         | 
         | Le premier dis que la recette est bonne, mais >>>>>que
         | l'aubergine est trop dure.
         | 
         | Le second est un copier/coller du premier.
         | 
         | Le troisieme, est le texte en gras du premier commentaire que
         | j'avais deja poste ici sur ce forum, et il n'y a meme pas de
         | liens ou autres liens vers mon blog, juste un lien a une autre
         | recette de caviar d'aubergines
         | 
         | >>>>>>Enfin, le mail est reste sans reponse.
         | 
         | C'est pas serieux tout ca...
         | 
         | Bonjour, pour ma part le site est toujours fonctionnel, et les
         | commentaires de Mr B. sont toujours presents.
         | 
         | Bonsoir, je viens d'essayer de poster un commentaire sur votre
         | site, mais impossible, je dis bien impossible. C >>>>>>>a a
         | plante et je suis parti en courant!!!!!!!!! je n'arrive pas a
         | vous joindre au telephone. A quoi ca sert d'enlever les
         | commentaires d'internautes s'il n'y a plus personne pour
         | repondre. Je ne sais meme plus quoi ecrire car a chaque fois la
         | case commentaires est grisee,
         | 
         | >>>>>>>>ca ne veut pas s'ouvrir, meme en essayant de changer le
         | mot de passe ou de reinitialiser l'ordi.... Je suis decue....
         | 
         | Bonjour, je trouve que le site de recettes est tres bien fait
         | et facile a utiliser. Cependant, je trouve dommage que les
         | commentaires soient supprimes de temps en temps. >>>>>>>>> <
         | You Must register to click further >
        
         | Fiahil wrote:
         | > [typed] C'est pas l'homme qui prend la mer. C'est la mer qui
         | prend l'homme. Moi, la mer, elle m'a pris,
         | 
         | > [generated] elle m'a emporte, elle m'a ramene, elle m'a fait
         | voyager. Mon reve etait de voyager. Mais trop pressee, je n'ai
         | pas pu, j'ai pris l'avion. J'ai atterri a Marseille, j'ai pris
         | le bus
         | 
         | Almost like you could sing along !
        
       | Burnafter186 wrote:
       | Started with "mon fromage":
       | 
       | my white cheese 0% Place the diced ham and chopped onion in a
       | bowl. Add the sour cream, fromage blanc and parsley
       | 
       | I'm pretty impressed, it doesn't sound bad at all.
       | 
       | Re-feeding, with some parsing:
       | 
       | mon fromage blanc: Put the diced ham and chopped onion in a bowl.
       | Add the _[sour]_ cream, cottage cheese and chopped parsley. Salt
       | and pepper. Mix well. _Divide the preparation among 4 verrines,
       | alternating with pieces of cherry tomatoes and grated Emmenthal.
       | Decorate with sunflower seeds and dried tomato petals._
       | 
       | I kinda want an English version, is there one available?
        
       ___________________________________________________________________
       (page generated 2021-11-10 23:00 UTC)