[HN Gopher] Compiler for the M language of the French DGFiP
       ___________________________________________________________________
        
       Compiler for the M language of the French DGFiP
        
       Author : testcross
       Score  : 178 points
       Date   : 2020-10-09 16:11 UTC (6 hours ago)
        
 (HTM) web link (gitlab.inria.fr)
 (TXT) w3m dump (gitlab.inria.fr)
        
       | boleary-gl wrote:
       | GitLab team member here
       | 
       | This is fantastic! If interested, you may want to check out our
       | program for Open Source users of GitLab:
       | https://about.gitlab.com/handbook/marketing/community-relati....
        
       | throwawaybutwhy wrote:
       | Confused about inclusion of floating-point arithmetic in tax
       | calculations [0]. Am I missing anything?
       | 
       | [0]
       | https://gitlab.inria.fr/verifisc/mlang/-/blob/master/formal_...
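A minimal sketch, in Python, of why floating point raises eyebrows in money calculations. It says nothing about what the linked Mlang file actually does; the amounts and the `round_to_cents` helper are illustrative assumptions. It only shows that binary floats can disagree with base-10 arithmetic on sums and on rounding.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_cents(amount: Decimal) -> Decimal:
    """Round a decimal amount to two places, with halves rounding up."""
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# 0.1, 0.2 and 0.3 have no exact binary representation, so the float
# comparison fails even though the arithmetic looks trivial.
float_sum_matches = (0.1 + 0.2 == 0.3)

# The same sum in base-10 arithmetic is exact.
decimal_sum_matches = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

# The float closest to 2.675 is slightly below it, so it rounds down.
float_rounded = round(2.675, 2)
decimal_rounded = round_to_cents(Decimal("2.675"))

print(float_sum_matches, decimal_sum_matches)  # False True
print(float_rounded, decimal_rounded)          # 2.67 2.68
```

Whether this matters in practice depends on how the real rules round intermediate results, which the sketch does not model.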
        
       | whitten wrote:
       | Calling your language the "M programming language" is,
       | unfortunately, ambiguous. One of the oldest languages to claim
       | the name is the MUMPS programming language, used in healthcare
       | and financial computing, which is also called M (
       | https://opensource.com/health/12/2/join-m-revolution ). The
       | Power Query Formula Programming Language is informally called the
       | M programming language. Kiran S J of Bangalore has a language
       | named the M Programming language
       | (https://github.com/kiransj/m-programming-language). The Cache
       | programming language is a superset of the M programming language
       | ( https://cedocs.intersystems.com/latest/csp/docbook/Doc.View....
       | ). Microsoft has a modeling language called the M Programming
       | language (
       | http://community.bartdesmet.net/blogs/bart/archive/2009/02/1... ).
       | There is the M# programming language (
       | https://en.wikipedia.org/wiki/M_Sharp ) and the M hardware
       | description language from Mentor Graphics. The language for
       | Mathematica was called M and is now, I think, called the
       | Wolfram Programming Language (
       | http://wiki.c2.com/?ProgrammingLanguageNamingPatterns ).
        
         | AshamedCaptain wrote:
         | Doesn't matter. From a quick glance, no one would be able to
         | distinguish MUMPS monstrosities from the monstrosities that are
         | the tax .m files. Maybe this M and MUMPS are even related.
        
       | glutamate wrote:
       | Really interesting to see this. I'm a co-founder of AdviceBridge,
       | where we have implemented part of the UK tax code related to
       | income tax and pensions in order to provide digital financial
       | advice.
       | 
       | In the UK, HMRC (again, the equivalent of the IRS) makes
       | worksheets available for computing tax, but these are not
       | machine-readable and are not guaranteed to be correct! (Indeed,
       | on some points, government websites give incorrect information
       | related to the state pension. [1])
       | 
       | We did something similar to this approach but much simpler - we
       | wrote a little arithmetic language specifying the tax rules,
       | embedded in a spreadsheet for quick verification, and then
       | translated this language into C++ using a Haskell compiler.
       | 
       | [1]
       | https://www.thisismoney.co.uk/money/pensions/article-7100019...
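A rough sketch of the "little arithmetic language" idea described above, written in Python rather than the Haskell the commenter used. The node types (`Num`, `Var`, `BinOp`, `Max`), the example rule, and its figures are all hypothetical; nothing here reflects AdviceBridge's actual language. The same expression tree is both evaluated directly (for quick checks, as a spreadsheet would) and translated to a C++ expression string.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Num:
    value: float


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str  # one of "+", "-", "*", "/"
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Max:
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Var, BinOp, Max]


def evaluate(expr, env):
    """Interpret the expression tree against a dict of variable values."""
    if isinstance(expr, Num):
        return expr.value
    if isinstance(expr, Var):
        return env[expr.name]
    if isinstance(expr, Max):
        return max(evaluate(expr.left, env), evaluate(expr.right, env))
    if isinstance(expr, BinOp):
        left, right = evaluate(expr.left, env), evaluate(expr.right, env)
        if expr.op == "+":
            return left + right
        if expr.op == "-":
            return left - right
        if expr.op == "*":
            return left * right
        if expr.op == "/":
            return left / right
    raise TypeError(f"unsupported expression: {expr!r}")


def to_cpp(expr):
    """Translate the same tree into a C++ expression string."""
    if isinstance(expr, Num):
        return repr(expr.value)
    if isinstance(expr, Var):
        return expr.name
    if isinstance(expr, Max):
        return f"std::max({to_cpp(expr.left)}, {to_cpp(expr.right)})"
    if isinstance(expr, BinOp):
        return f"({to_cpp(expr.left)} {expr.op} {to_cpp(expr.right)})"
    raise TypeError(f"unsupported expression: {expr!r}")


# Hypothetical rule: tax = max(income - allowance, 0) * rate
rule = BinOp(
    "*",
    Max(BinOp("-", Var("income"), Var("allowance")), Num(0.0)),
    Var("rate"),
)

print(evaluate(rule, {"income": 30000.0, "allowance": 12500.0, "rate": 0.25}))
print(to_cpp(rule))
```

Keeping one tree with two back ends is the point of the design: the interpreter and the generated C++ cannot drift apart as long as both walk the same rule definition.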
        
         | Nextgrid wrote:
         | Does HMRC provide those free of charge? I'd be curious to have
         | a look at them.
        
           | glutamate wrote:
           | e.g.
           | https://assets.publishing.service.gov.uk/government/uploads/...
        
       | senstax wrote:
       | Related: does anyone know if, by using these languages (Coq and
       | OCaml), they've kept the door open to computer-assisted tax
       | sensitivity analysis? E.g., I'm interested in outputting some
       | kind of 2-dimensional or 3-dimensional solution space on 2, 3, or
       | 4 input variables to identify discontinuities and slopes. Any
       | thoughts?
       | 
       | ETA: Diving into my thoughts on this a little: really what I'm
       | describing would require (1) a dumb numerical analysis algorithm
       | or (2) some computer algebra system (CAS) features, my preference.
       | I don't know all the keywords and concepts, but I think term
       | rewriting and equation solving would get me towards the output I
       | seek: a multivariate, piecewise equation with user-selected input
       | variables and user-selected output variables: e.g., current year
       | tax, n+1 year tax, etc. Seems too involved, but I have hope.
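A minimal sketch of option (1), the "dumb numerical analysis" route, in Python. The `toy_tax` schedule is entirely hypothetical (a 20% rate above 10,000 and a flat 500 credit that disappears above 30,000); it stands in for whatever function a real compiler would expose. The sweep walks one input variable, treats a step-to-step change larger than the step itself (a marginal rate above 100%) as a discontinuity, and records points where the slope changes.

```python
def toy_tax(income: int) -> int:
    """Hypothetical schedule in whole currency units (not a real tax rule)."""
    gross = 0 if income <= 10_000 else (income - 10_000) * 20 // 100
    credit = 500 if income <= 30_000 else 0  # credit is lost above 30_000
    return max(gross - credit, 0)


def sweep(tax_fn, start: int, stop: int, step: int):
    """Return (kinks, jumps) found while sweeping one input variable.

    kinks: grid points where the slope between samples changes.
    jumps: (x0, x1) intervals where the output moves by more than `step`.
    """
    xs = list(range(start, stop + 1, step))
    ys = [tax_fn(x) for x in xs]
    kinks, jumps = [], []
    previous_delta = None
    for i in range(len(xs) - 1):
        delta = ys[i + 1] - ys[i]
        if abs(delta) > step:
            jumps.append((xs[i], xs[i + 1]))
            continue  # do not let the jump count as a slope change
        if previous_delta is not None and delta != previous_delta:
            kinks.append(xs[i])
        previous_delta = delta
    return kinks, jumps


kinks, jumps = sweep(toy_tax, 0, 40_000, 100)
print(kinks)  # [12500]
print(jumps)  # [(30000, 30100)]
```

Extending this to two or three inputs means nesting the sweep over a grid, which gets expensive quickly; that cost is the usual argument for the symbolic approach in option (2).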
        
         | remexre wrote:
         | I think you'd want a differentiable language to make that easy.
        
       | justinclift wrote:
       | Something seems off with the current repo being pointed to. The
       | repo Readme says:
       | 
       |     This work is based on a retro-engineering of the syntax
       |     and the semantics of M, from the codebase released by the
       |     DGFiP.
       | 
       | Sounds like an external re-implementation, of the "original"
       | release here:
       | 
       | https://framagit.org/dgfip/ir-calcul
       | 
       | That original release says it's under a free license too.
       | 
       | Wonder why there's a re-implementation?
        
         | noname120 wrote:
         | The repo that you're linking to isn't an implementation of the
         | M compiler. Rather it's the rules/definitions that are used to
         | compute the income tax ("Impôt sur le Revenu").
         | 
         | The M compiler reimplementation linked in this submission
         | allows you to actually execute those rules and perform
         | simulations.
        
           | justinclift wrote:
           | Thanks, that's good info. :)
        
         | joelellis wrote:
         | The author explains in the Twitter thread (in French):
         | 
         | https://twitter.com/DMerigoux/status/1314531302079688709
         | 
         | > The difficulty arose from a constraint on the part of the
         | DGFiP which did not wish to publish, for security reasons, part
         | of the logic of the calculation corresponding to the "multiple
         | liquidations" mechanism. Raphael and I recreated this
         | unpublished part in a new DSL.
         | 
         | > The DGFiP also did not wish to publish its internal test
         | sets. We therefore proceed to the creation of a completely
         | random test set, from the unpublished content, in order to be
         | able to reproduce the validation of Mlang outside the DGFiP.
         | 
         | > A little less than a year after the publication of
         | https://blog.merigoux.ovh/en/2019/12/20/taxes-formal-
         | proofs...., we therefore found a compromise that respects
         | both the source code publication obligation and the security
         | constraints of the DGFiP.
         | 
         | > By allowing us to go to its operating site and confidentially
         | access the source code that it did not wish to publish, the
         | DGFiP has enabled us to find alternative solutions that make
         | the publication of the source code concrete and operational.
        
       | kensai wrote:
       | What is the German equivalent?
        
       | cproctor wrote:
       | I have implemented parts of the tax code, following 1040 and the
       | network of forms it references line-by-line, for my own financial
       | planning. I've been selective about what I implement based on
       | what applies to me.
       | 
       | I don't share the code because I'm not sufficiently confident
       | that it's correct, don't want liability, and don't want an
       | obligation to keep it up to date.
       | 
       | That said, it feels like the scope of the project would be
       | manageable for a small nonprofit, and would be of great social
       | value. One reflection from my work is that it would be
       | particularly valuable to represent annual changes in the tax code
       | as transformations of the code AST.
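One way to picture "annual changes as transformations of the code AST", sketched in Python. The node types, the `standard_deduction` label, and the figures are hypothetical placeholders, not taken from any real form. Each year's rules are an immutable tree, and a legislative change is a function that rewrites that tree into the next year's.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Const:
    label: str
    value: int


@dataclass(frozen=True)
class Input:
    name: str


@dataclass(frozen=True)
class Sub:
    left: object
    right: object


@dataclass(frozen=True)
class FloorAtZero:
    inner: object


def transform(node, rewrite):
    """Rebuild the tree bottom-up, applying `rewrite` to every node."""
    if isinstance(node, Sub):
        node = Sub(transform(node.left, rewrite), transform(node.right, rewrite))
    elif isinstance(node, FloorAtZero):
        node = FloorAtZero(transform(node.inner, rewrite))
    return rewrite(node)


def set_const(label, value):
    """A yearly change: give the constant named `label` a new value."""
    def rewrite(node):
        if isinstance(node, Const) and node.label == label:
            return replace(node, value=value)
        return node
    return rewrite


def evaluate(node, inputs):
    if isinstance(node, Const):
        return node.value
    if isinstance(node, Input):
        return inputs[node.name]
    if isinstance(node, Sub):
        return evaluate(node.left, inputs) - evaluate(node.right, inputs)
    if isinstance(node, FloorAtZero):
        return max(evaluate(node.inner, inputs), 0)
    raise TypeError(f"unsupported node: {node!r}")


# Year N: taxable income = max(wages - standard_deduction, 0)
rules_year_n = FloorAtZero(
    Sub(Input("wages"), Const("standard_deduction", 12_000))
)

# Year N+1 is expressed as a list of changes applied to year N.
changes_for_next_year = [set_const("standard_deduction", 12_500)]
rules_year_n_plus_1 = rules_year_n
for change in changes_for_next_year:
    rules_year_n_plus_1 = transform(rules_year_n_plus_1, change)

print(evaluate(rules_year_n, {"wages": 50_000}))         # 38000
print(evaluate(rules_year_n_plus_1, {"wages": 50_000}))  # 37500
```

Because the trees are immutable, every past year stays reproducible, and the list of changes doubles as a readable record of what the law altered.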
        
         | chromatin wrote:
         | I understand your worry, but could you feel better offering it
         | under a free software license that expressly disclaims
         | warranties of usefulness for any purpose?
        
           | jeffrallen wrote:
           | Or leak it via Tor plus a pastebin, so that you are protected
           | from liability via anonymity?
        
       | fsflover wrote:
       | Very good news. Consider signing the petition to make all
       | publicly funded code free: https://publiccode.eu.
        
         | jmole wrote:
         | is there an initiative like this in the US?
        
           | aloukissas wrote:
           | Doubt it. The Intuit and H&R Block lobbying dollars make sure
           | our tax system is complicated enough to keep their businesses
           | booming.
        
             | j4nt4b wrote:
             | Petitions are free, right? Unless it actually does exist,
             | it makes sense to start one.
        
               | fsflover wrote:
               | This petition is made by Free Software Foundation Europe.
               | I guess you should ask Free Software Foundation to start
               | one in the US.
        
               | j4nt4b wrote:
               | Just sent an email. I'll update when I learn more.
        
               | aloukissas wrote:
               | This is cute, but this isn't how things work in the US.
               | Money talks :)
        
               | j4nt4b wrote:
               | And though a tooth may bend the purest coin, remember too
               | how gold will knead reason and virtue themselves like
               | dough.
        
               | beardyw wrote:
               | I guess that's a quote. Where's it from?
        
               | j4nt4b wrote:
               | Me.
        
             | joshspankit wrote:
             | Don't forget the very lucrative business of "reducing tax
             | exposure" where small firms get millions by knowing the
             | smallest loopholes.
        
           | Ericson2314 wrote:
           | Code directly written by the Government must be public
           | domain. I'm not sure this is a step in the right direction,
           | or a step backwards as it just pushes more software
           | engineering to consultants.
           | 
           | (I've at least seen DARPA-funded work become open source.
           | That's a good step.)
        
             | cultus wrote:
             | The big gotcha with that is that most government projects
             | have collaborators in academia, NGOs, and/or industry. To a
             | big extent, the public domain mandate doesn't apply if
             | workers outside of the government contribute. Thus
             | manuscripts or code often aren't public domain.
             | 
             | edit: Also publishers _constantly_ "accidentally" claim
             | copyright on public domain works (I'm looking at you,
             | Elsevier). They never accidentally make something open-
             | access.
        
           | kazinator wrote:
           | In the US, you calculate your own income tax, don't you? Like
           | in Canada.
           | 
           | The "source code" to the calculations are the paper forms
           | which specify the calculations, making them transparent.
        
             | santraginean wrote:
             | People are downvoting without explaining, so I'll take a
             | shot.
             | 
             | First, yes, we calculate our own income tax, but the IRS
             | also calculates it separately, and if the two disagree,
             | they selectively decide whether to come knocking. (For
             | example, they once tried to bill me $300,000 for a tax year
             | when my net income was just over $100,000, because of
             | multiple clerical errors -- we had sold a house, and they
             | had both erroneously tried to apply capital gains tax where
             | it didn't apply at all, and tried to tax us for the full
             | sale price of the house as though we had bought it for $0
             | and flipped it. It took months of back and forth to fix the
             | issue, despite everything having been clearly documented.)
             | 
             | Second, calculating tax liability is just the first step.
             | Actually submitting tax returns electronically required a
             | third party for a long time, and I believe it still does
             | for all but the most straightforward returns. There's
             | absolutely no reason for that to be the case, other than
             | the outsize influence of lobbying money.
             | 
             | That's without getting into all the loopholes, dodges, and
             | hand-waving that make it possible for someone like Trump to
             | avoid paying taxes entirely most years, while those of us
             | who they know can't afford to lawyer up are the ones they
             | try to collect from.
        
               | kazinator wrote:
               | > _if the two disagree_
               | 
               | If the two disagree in a calculation matter, the
               | calculations spelled out in the tax form should be upheld
               | as the gold standard as to which side made the
               | calculation error.
               | 
               | > _they had both erroneously tried to apply capital gains
               | tax where it didn't apply at all_
               | 
               | Was this issue due to paper tax forms doing a calculation
               | one way, but the IRS's implementation doing it another
               | way?
               | 
               | Most recently, I screwed up by claiming a credit that was
               | not allowed. However, the conditions for it are
               | documented; I was wrong. This is where code could help,
               | in order to clarify obtuse language in the requirements.
               | Obviously, the government runs code which checks the
               | conditions that determine whether a nonzero amount can be
               | claimed in some field, so it's just another calculation.
               | Still, it would be better for that to be crystal clear
               | pseudo-code and not the fragment of some actual
               | implementation.
        
           | reaperducer wrote:
           | _is there an initiative like this in the US?_
           | 
           | It depends, and varies greatly by location.
           | 
           | Anything created by the federal government is public domain
           | by law. However, not all federal agencies make their code
           | public. Some, understandably. Others, out of budget
           | constraints or ignorance. In theory, you could file a FOIA
           | request to get the code, assuming it's not classified.
           | 
           | Other levels of government can be problematic. In part,
           | because cities and towns can copyright things they create,
           | while the federal government cannot.
           | 
           | For example, the City of Chicago and some other cities have
           | data portals open to the public. Their utility varies.
           | 
           | Smaller cities, however, are less likely to understand the
           | importance or value of making data public.
           | 
           | Back when governments started switching to data processing a
           | lot, I belonged to an organization called Investigative
           | Reporters and Editors. It had lots of guides for extracting
           | data from local governments. I remember lots of newspapers
           | rushing out to buy computers and nine-track tape readers so
           | they could sort through the information.
        
       | jeffrallen wrote:
       | It's always funny to find French names for variables in a
       | program. (Original: "C'est toujours rigolo de trouver des noms
       | francais pour les variables dans un program.")
        
         | maelito wrote:
         | Happy reading ;)
         | 
         | https://github.com/betagouv/mon-entreprise/blob/master/mon-e...
        
         | jmnicolas wrote:
         | program _me_ :o
        
         | lgvld wrote:
         | It's always funny to find a comment written in French on HN. ;-)
        
       | bouzouk wrote:
       | There is also a similar and very good project funded by the
       | URSAAF and the DGFiP (the two main entities in the french tax
       | system) : https://publi.codes
       | 
       | IMO (I am not part of this project) it is more interesting as it
       | is language agnostic, easy to use for everyone (based on yaml)
       | and more importantly, it is starting to be implemented in the
       | government's actual tax computing system.
       | 
       | We are using it in a challenger bank I started.
        
       | maelito wrote:
       | Related: the French administration has built a custom language
       | for another big set of tax rules, those that dictate our social
       | security system (which collects 500 billion EUR / year).
       | 
       | It's presented on https://publi.codes but unfortunately we've not
       | translated it yet. The language keywords themselves are in
       | French, by design, to bridge the law and its official
       | implementation. It's not yet used to compute taxes, just to
       | simulate them on the official mon-entreprise.fr website.
       | 
       | The "code" expressed in YAML is parsed to build the computation
       | model (in TypeScript), to document this model on the Web (each
       | variable has a Web page) and to generate typeform-like forms.
       | 
       | It's in the https://github.com/betagouv/mon-entreprise monorepo,
       | but it's also used to implement a model of our personal climate
       | impact, here:
       | https://github.com/betagouv/ecolab-data/tree/master/data
       | 
       | And bravo, Denis :)
       | 
       | Edit: in case you didn't know, the French administration's code
       | must by law be made public. This is just the beginning, expect
       | lots of similar projects! You can browse some repos here:
       | https://code.etalab.gouv.fr
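A rough sketch of the pipeline described above (declarative rules parsed into a computation model that can also document itself), in Python rather than TypeScript. The `RULES` dict stands in for already-parsed YAML; the rule names, the `product` and `difference` keys, and the 22% rate are invented for illustration and are not the real publi.codes syntax or figures.

```python
from fractions import Fraction

# Stand-in for parsed YAML. All names and values are hypothetical.
RULES = {
    "gross_salary": {"input": True},
    "contribution_rate": {"value": Fraction(22, 100)},
    "contributions": {"product": ["gross_salary", "contribution_rate"]},
    "net_salary": {"difference": ["gross_salary", "contributions"]},
}


def dependencies(name, rules):
    """List the rules a given rule refers to (useful for generated docs)."""
    rule = rules[name]
    return list(rule.get("product", [])) + list(rule.get("difference", []))


def evaluate(name, rules, inputs, cache=None):
    """Compute one rule, resolving the rules it depends on recursively."""
    if cache is None:
        cache = {}
    if name in cache:
        return cache[name]
    rule = rules[name]
    if rule.get("input"):
        result = inputs[name]
    elif "value" in rule:
        result = rule["value"]
    elif "product" in rule:
        result = 1
        for dependency in rule["product"]:
            result *= evaluate(dependency, rules, inputs, cache)
    elif "difference" in rule:
        minuend, subtrahend = rule["difference"]
        result = (evaluate(minuend, rules, inputs, cache)
                  - evaluate(subtrahend, rules, inputs, cache))
    else:
        raise ValueError(f"rule {name!r} has no known mechanism")
    cache[name] = result
    return result


print(evaluate("net_salary", RULES, {"gross_salary": 3000}))  # 2340
print(dependencies("net_salary", RULES))
```

The same rule table drives both the calculation and the dependency listing, which is roughly how a single source can feed a simulator, per-variable documentation pages, and generated forms.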
        
         | littlestymaar wrote:
         | > the French administration's code must by law be made public
         | 
         | And this time, the code isn't printed on paper and sent by
         | post, which is neat progress ;).
         | 
         | See https://www.nouvelobs.com/rue89/rue89-nos-vies-connectees/20...
         | (in French) for what "making code public" looked like just a
         | few years ago.
        
       | aunetx wrote:
       | That's funny because the INRIA is 5 minutes away from the place I
       | am studying sciences right now, I did not think I would ever see
       | them at the top of HN :)
        
         | hansjorg wrote:
         | They've been on the frontpage many times just in the last year
         | (click inria.fr after the post title to see all submissions
         | linking there).
        
         | ccktlmazeltov wrote:
         | really? They're often on the frontpage.
        
           | agumonkey wrote:
           | unlike parent
        
         | coliveira wrote:
         | INRIA is very popular in the CS community, I wonder how you've
         | never heard of their work.
        
         | aaronblohowiak wrote:
         | OCaml gets a lot of love here from time to time..
        
           | aunetx wrote:
           | Actually I was not even aware that they created OCaml or
           | Scilab :/ so I'm seriously considering visiting this place
           | again in the near future
        
           | pjmlp wrote:
           | Alongside Smalltalk, INRIA is also a major contributor to
           | Pharo and Squeak, which both descend from the original
           | Smalltalk-80 image.
        
       | testcross wrote:
       | The author explains what were the challenging bits in this thread
       | (in French):
       | https://twitter.com/DMerigoux/status/1314531302079688709
        
         | agumonkey wrote:
         | and his blog https://blog.merigoux.ovh/en/2019/12/20/taxes-
         | formal-proofs....
        
         | sushshshsh wrote:
         | It's rare that I (an American) get to help out others by
         | translating something, so I will post my translation here.
         | 
         | "Four years after the first publication by DGFIP, I have the
         | pleasure of announcing that the source code permitting the
         | calculation of taxes on revenue is finally reusable
         | (recompilable by others)!
         | 
         | To use this algorithm in your application, follow this link...
         | 
         | It took us 1.5 years (with my coauthor Raphael Monat) to
         | identify that which was missing in the published code in order
         | for it to be reusable, and to fix this situation.
         | 
         | More or less, thanks to our project Mlang, a person can
         | simulate IR's calculations without needing to interface with
         | DGFIP.
         | 
         | The difficulty came from a constraint from DGFIP, who did not
         | want us to publish (for security reasons) a part of the code
         | that corresponds to a mechanism that handles "multiple
         | liquidations". Raphael and I recreated this unpublished part in
         | a new DSL.
         | 
         | DGFIP equally didn't want to publish their internal test games
         | (cases). We had proceeded therefore with the creation of a
         | suite of random test cases, separate from the non published
         | ones, to finally be able to reproduce the validation of Mlang
         | outside of DGFIP."
        
           | jcranmer wrote:
           | The last four posts in the Twitter thread:
           | 
           | "A little less than a year after the publication of [blog
           | post], we have therefore found a compromise letting us
           | respect both the obligation to publish the source code and
           | the security constraints of DGFiP.
           | 
           | In letting us publish the code on their site and accessing
           | confidentially the source code they didn't want published,
           | the DGFiP let us find alternative solutions that made the
           | publication of the source code concrete and operational.
           | 
           | This compromise lets both parties come out on top, unlike
           | what happened with the source code of CNAF [link] where the
           | administration simply argued a too-important difficulty and
           | indefinitely postponed [1] it.
           | 
           | Letting those who ask for the source code see it after an
           | NDA therefore appears to be a possible solution when the
           | publication is delicate for technical reasons. Could this
           | path be useful for the report of @ebothorel?"
           | 
           | [Note: translation here is somewhat more geared towards a
           | natural English translation than a literal French
           | translation.]
           | 
           | [1] "repouss[er] [...] aux calendes grecques" appears to be
           | an idiom that's not in my dictionaries, but from context
           | appears to mean "indefinitely postponed"
        
             | pygy_ wrote:
             | The _calendes_ were a Roman holiday IIRC. Greek ones simply
             | don't exist...
        
             | dakdak wrote:
             | The calends [0] are the first day of every month in the
             | Roman calendar. As the Ancient Greek calendar does not
             | feature calends, postponing something to the Greek calends
             | means postponing something to a later, unknown and unlikely
             | to happen date.
             | 
             | [0] https://en.wikipedia.org/wiki/Calends
        
           | emilecantin wrote:
           | Native French speaker here (Quebecois). Minor nitpick: Your
           | translation of "jeux de tests" to "test games" is incorrect.
           | 
           | The word "jeu" can indeed mean "game", but it can also mean a
           | group of things. A better translation would be "test suites",
           | "test sets" or similar.
        
             | sushshshsh wrote:
             | Thank you for your help! I did it totally on the fly
             | without a dictionary and I enjoyed learning this word from
             | you :)
        
             | ficklepickle wrote:
             | Bastille, tabernacle!
        
       | ftomassetti wrote:
       | You may be interested in what the Dutch Tax and Customs
       | Administration is doing: they built a DSL to express tax
       | calculations. Here you can find the case study:
       | https://resources.jetbrains.com/storage/products/mps/docs/MP...
       | 
       | Personally I work in the Language Engineering area and it seems
       | obvious that you want tax lawyers and accountants to interpret
       | the tax code and translate it into "code". It's just that you
       | also want the "code" to be obvious to them and supported by
       | proper tooling, which catches all inconsistencies.
       | 
       | I would also love to interview the author of this and of the
       | work for mon-entreprise. While I understand French, I hold these
       | interviews in English to reach more people.
        
       | dakdak wrote:
       | There's a much broader project that can compute most taxes and
       | benefits for France (and a few other countries) :
       | https://github.com/openfisca/openfisca-france
       | 
       | It would be quite interesting to check that both the French IRS
       | implementation of the tax & benefit laws and the free software
       | community (though most devs of the project were employed by the
       | French administration) implementation of the tax and benefit laws
       | actually output the same results.
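Checking that two independent implementations agree is usually done by differential testing on generated inputs, which is also close to the random test set the Mlang authors describe building. Below is a minimal Python sketch under stated assumptions: the bracket schedule is hypothetical, and the three functions are toy stand-ins, not Mlang or OpenFisca code.

```python
import random

# Hypothetical schedule: (lower bound of bracket, marginal rate in percent).
BRACKETS = [(0, 0), (10_000, 10), (30_000, 30)]


def tax_by_brackets(income: int) -> int:
    """Implementation A: walk the bracket table."""
    total_percent_units = 0
    for index, (lower, rate) in enumerate(BRACKETS):
        if income <= lower:
            break
        is_last = index + 1 == len(BRACKETS)
        upper = income if is_last else min(income, BRACKETS[index + 1][0])
        total_percent_units += (upper - lower) * rate
    return total_percent_units // 100


def tax_closed_form(income: int) -> int:
    """Implementation B: the same schedule written as explicit cases."""
    if income <= 10_000:
        return 0
    if income <= 30_000:
        return (income - 10_000) * 10 // 100
    return (20_000 * 10 + (income - 30_000) * 30) // 100


def tax_buggy(income: int) -> int:
    """Deliberately wrong: taxes everything above 10_000 at the top rate."""
    if income <= 10_000:
        return 0
    if income <= 30_000:
        return (income - 10_000) * 10 // 100
    return (income - 10_000) * 30 // 100


def differential_test(impl_a, impl_b, n_random=1_000, seed=0):
    """Return the incomes on which the two implementations disagree."""
    rng = random.Random(seed)  # fixed seed keeps the test set reproducible
    cases = [0, 10_000, 10_001, 30_000, 30_001, 200_000]  # bracket edges
    cases += [rng.randint(0, 200_000) for _ in range(n_random)]
    return [income for income in cases if impl_a(income) != impl_b(income)]


print(differential_test(tax_by_brackets, tax_closed_form))      # []
print(len(differential_test(tax_by_brackets, tax_buggy)) > 0)   # True
```

Agreement on random inputs is evidence, not proof: it only covers the inputs that were generated, which is why the edge cases at bracket boundaries are added explicitly.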
        
       ___________________________________________________________________
       (page generated 2020-10-09 23:00 UTC)