[HN Gopher] The drama in trying to convert election PDFs to Spre... ___________________________________________________________________ The drama in trying to convert election PDFs to Spreadsheets Author : markessien Score : 610 points Date : 2023-03-23 09:40 UTC (13 hours ago) (HTM) web link (markessien.com) (TXT) w3m dump (markessien.com) | djoldman wrote: | Checking one at random: | | https://docs.google.com/spreadsheets/d/1HhV9iJxXTU9liAZPIDoM... | | ...shows 0s in the first row for all candidate parties. But the | corresponding photo shows votes for all three: | | https://inec-cvr-cache.s3.eu-west-1.amazonaws.com/cached/res... | | I hope it's not a mistake and that there's some arcane | law/technicality to explain it. | | edit: another mistake on row 21, LP should get 25 but it was | credited to NNPP: | | https://docs.inecelectionresults.net/elections_prod/1292/sta... | dan-robertson wrote: | Yeah looks weird. When I scrolled to a random part, the numbers | seemed to line up. They didn't say things were entirely correct | though. Perhaps the data quality is sufficient for a challenge. | Odd that the first rows seem more wrong though. | neves wrote: | Is it true that USA does not have a open data law to make | everybody publish in CSV? | JumpCrisscross wrote: | > _Is it true that USA does not have an open data law to make | everybody publish in CSV?_ | | American elections are de-centralised. Each state comes up with | its methods. In some, each county. (I'm not sure how publishing | a CSV of vote totals would help.) | jxramos wrote: | > Then ominously, on the 20th of October of 2020 some people | drove there in unmarked cars and removed all the Cameras | installed at the tollgate. | | They at least capture some photos of the equipment. I wonder if | anyone communicated with the individuals. | OoTheNigerian wrote: | Nice read. 
It's important to note | | 1.The 2020 protesters did not begin vandalizing property, but | government infiltrated the protests by burning cars and maiming | people. | | 2. The Obidient movement encompassed multiple sub movements of | which a part of the #EndSARS was one of them. A vast majority of | Peter Obi's supporters were not #EndSARS activists. | | 3. Elections in Nigeria are fraught with treacherous behavior so | everyone suspects everything. It's important to be very careful | with your communication. There is a lot of desperation in the | land and so if in a position of information leverage, the | responsible thing is to handle the privilege with care and | transparency. | pxc wrote: | I'm impressed by the courage of the protesters here, and the | tenacity of the youth voters. | | I hope they get a clear answer and a fair count, and whether they | win this time or not, a real shot at cracking up their corrupt, | two-party system. | dec0dedab0de wrote: | This would have been a good use for hn style shadow banning. | Especially if they didn't publish the current tally, then the | original easy to detect bots may have never realized you were on | to them | kevviiinn wrote: | Wow what a cliffhanger, it sounds like they have to deal with the | courts now. I hope we get an update | | https://www.msn.com/en-us/news/world/opposition-files-petiti... | davedx wrote: | Incredible story. | | Some more background: | https://ng.usembassy.gov/nigerias-2023-elections/ | roschdal wrote: | The people who cast the votes don't decide an election, the | people who count the votes do. - Stalin. | pxc wrote: | In case the downright cartoonish character of this quotation | made anyone else wonder if it were fake... | | that quotation is, indeed, fake: | https://web.archive.org/web/20220128105324/https://www.polit... | mtrovo wrote: | Is the access to the original photos open? 
It might be fit for a | good Kaggle competition, although maybe a little too late for | this current election. | jasonjayr wrote: | From the article, it seems like the rush was to collect enough | evidence to file a challenge within the legal timeframe. With a | challenge filed, it seems like there is a bit more time to | verify claims + other evidence. (I know nothing of the system | of government there, but) -- it seems like the prudent thing to | do would be for the courts to mandate a neutral verification of | each of those paper sheets. (ie, 10 trusted representatives | from each party re-key the figures manually). | olabyne wrote: | If you want, you have exactly the same issue to solve with | Kenya last year. | | The pictures of all of the voting sites are available, but the | country went to chaos to pick a winner. It is crazy , because | on the lower level (in voting offices), the vote process was | respected and the numbers are trustworthy, but the higher you | go and the more corruption happens, as each aggregation of data | removes trust to the system. | mattlutze wrote: | This was thrilling. | | Sometimes, one person's bug is another person's feature :) | thread_id wrote: | Fantastic story. What an excellent example of democratization | from technology. And also a perfect example of how the blade cuts | both ways. Digital warriors battling it out in real time and the | stakes are enormous. Great respect for Mark and his ingenuity and | adaptive responses!!!! | tr33house wrote: | I'd tried something like this with the Kenyan election but our | setup was to use OCR (google cloud) -> text -> parse -> sqlite | | We started late so the results were out when we finished but I | think it'll be a good idea to develop software that can parse the | PDF results and display them faster than the electoral bodies | can. 
In Kenya, and Nigeria, the delays cause a lot of anxiety | YeGoblynQueenne wrote: | >> We had a brainstorming meeting, and decided to try a new | approach. We would simply ask the Obidients to help us do the | conversion. If hundreds of Obidients did the transcription, it | would go fast. | | What would guarantee that the Obidients would not, in turn, try | to inflate the score of the Labor candidate? | munchler wrote: | They planned to transcribe each PDF multiple times in order to | validate the results. | davedx wrote: | More background. OP is an impressive entrepreneur! Massive kudos. | https://markessien.com/projects/hotels-ng/ | dejongh wrote: | Wow. Wild story. Thanks for sharing. Cool twist that a bug ended | up identifying the bad guys. | hoseja wrote: | Silly, you don't malcount the actual votes, you brainwash the | population and pervert the process until they vote the way you | want them to, like in the advanced first world democracies. | avodonosov wrote: | That's not the worst case, if wise elite brainwashes | (manufactures consent of) the population. | | Worse is when the elite is not so wise (sometimes plainly | crazy), or the elite loses control to crazy people, | adversaries. Or self-induced mass hysteria of the population. | | The direct "democracy" that very soon will inevitably be | enabled by technology, poses great dangers in the situation | where masses are so easily manipulable, and their collective | intelligence seems not to be rising above individual level, but | degrading below it for some reason. Violent chaos, lynch | courts, etc. | mmmuhd wrote: | Elupee 75, To be frank, you did a great job and I am proud of | someone from my country pulling this off, but the bitter truth is | President Elect Bola Ahmed Tinubu won this election. Peter Obi's | youth support is predominantly in the south, and Christian | majority parts of the country, he clearly lack support in the | Muslim north, where I am from. I voted for Kwankwaso though. 
| bmsleight_ wrote: | Can you expand on "he clearly lack support". Bonus points for | facts over opinions. | mmmuhd wrote: | Clearly means even his Vice Presidential Candidate could not | win his own polling unit, polling unit, not ward, not Local | Government, not State. | | https://punchng.com/nigeriaelections2023-datti-loses-polling... | vuln wrote: | [flagged] | sd9 wrote: | I was naturally skeptical of the punchng article, so I | crosschecked it against OP's CSV. The votes in the article | do agree with OP's CSV (although the number of accredited | voters differs slightly). | | The crosschecked results are in KADUNA_crosschecked on line | 3800. The image is here: https://inec-cvr-cache.s3.eu-west-1.amazonaws.com/cached/res... | | Accredited voters: 276, Registered voters: 750, APC: 98, | LP: 54, PDP: 102, NNPP: 11 | | All that said, I don't think that the results for 276 | voters in one polling unit in one ward in one local | government area in one state is clear evidence that Obi | lacks support. If anything, the fact that OP's CSV matches | a (potentially biased) news article gives me more faith in | OP's tallies and claims. | | (Aside: it seems _easier_ to lose an election in your own | polling unit, where variance plays a larger part, than it | is to lose on a wider scale.) | [deleted] | hardlianotion wrote: | That is a great job - well done from a grateful Nigerian. | bundie wrote: | I did not know that Nigerians used Hacker News :-D Most people | I encounter on this site are oyinbos. | hardlianotion wrote: | We are everywhere and cannot be avoided. | nivenkos wrote: | This is a great example of why electronic voting is important and | can help secure democracy. | cwkoss wrote: | Wouldn't electronic voting just create a means for the ruling | party to deliver the result without releasing evidence of vote | tampering? | | I don't understand what you think electronic voting solves... 
| logifail wrote: | > This is a great example of why electronic voting is important | and can help secure democracy. | | If those in power are against change, I wouldn't want to have | to put my trust in electronic voting if I was hoping for | change. | | I was left with the impression that it is the _paper_ records | in this story that led to the unravelling of an attempt to | forge the results. | | Long live paper ballots. | SkeuomorphicBee wrote: | > I was left with the impression that it is the paper records | in this story that led to the unravelling of an attempt to | forge the results. | | The manual tallying of paper records is what led to the | attempt to forge the results in the first place. If the | results were electronically tallied to generate an official | result, then they wouldn't need to recount the whole election | to verify the result, just doing a statistically significant | random sampling of the polls to recount would be enough. | logifail wrote: | > If the results were electronically tallied to generate an | official result | | Electronic voting doesn't make bad politicians less bad. In | this instance, the bad guys were prepared to deliberately | remove CCTV so when they sent their goons out at night to | shoot protestors there would be no evidence. | | "Electronic tallies" are never going to give a free and | fair election if those in power are prepared to go that | far. Safer to stick with paper ballots and election | observers equipped with Mark I eyeballs. | pjc50 wrote: | How do you recount electronic-only elections? | SkeuomorphicBee wrote: | By looking at the receipts printed by the ballot | machines. | | Ballot machines print either a final tally at the end of | the day, or print every single vote and automatically | drop it into a physical ballot, depending on the threat | model of the country in question. Either way you have | partial or total recount. 
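The "random sampling of the polls" argument above can be made concrete with a little arithmetic. The following is only a rough sketch, not anything from the article or the thread: it assumes polling units are sampled independently and uniformly at random, and that a hand recount of a tampered unit always exposes the mismatch. The function name and the 5% figure are made up for illustration.

```python
def detection_probability(tampered_fraction: float, sample_size: int) -> float:
    """Chance that at least one tampered polling unit lands in the sample.

    Assumes units are drawn independently and uniformly at random (sampling
    with replacement), and that recounting a tampered unit always reveals
    the tampering. Both are simplifying assumptions.
    """
    if not 0.0 <= tampered_fraction <= 1.0:
        raise ValueError("tampered_fraction must be between 0 and 1")
    if sample_size < 0:
        raise ValueError("sample_size must be non-negative")
    # P(miss every tampered unit) = (1 - f)^n, so P(catch at least one) = 1 - (1 - f)^n
    return 1.0 - (1.0 - tampered_fraction) ** sample_size


if __name__ == "__main__":
    # Hypothetical scenario: 5% of polling units have altered tallies.
    for n in (10, 50, 100, 500):
        print(n, round(detection_probability(0.05, n), 4))
```

Note what this does and does not show: a modest sample is likely to catch widespread tampering, but it says nothing about tampering concentrated in a handful of units, and it only helps if the paper being recounted can itself be trusted, which is the point the replies dispute.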
| logifail wrote: | > By looking at the receipts printed by the ballot | machines. | | Let's be clear, you're not really "recounting" the | ballots at that point. If the machine is compromised - | and we're discussing a situation in which we know CCTV | was removed _and people were then shot_ - you have no | real idea if the receipt corresponds to the voter's | original intent. Or, indeed, if all the receipts from all | the voters make it as far as the recount (?) | | > Ballot machines print either a final tally at the end | of the day, or print every single vote and automatically | drop it into a physical ballot, depending on the threat | model of the country in question. | | How is reprinting the final automated tally supposed to | represent a "recount" of the original automated tally? | | > Either way you have partial or total recount. | | You really don't. Bits of paper and Mark I eyeballs all | the way. | | As Tom Scott puts it, "The key point is not that paper | voting is perfect - it isn't - but attacks against it | don't scale well"[0]. | | [0] Why Electronic Voting Is Still A Bad Idea: | https://www.youtube.com/watch?v=LkH2r-sNjQs | SkeuomorphicBee wrote: | > How is reprinting the final automated tally supposed to | represent a "recount" of the original automated tally? | | If you want to detect tampering in the central totalling, | then all you need is the end of day receipt of each | ballot. Exactly like in OP's case. | | If you want to detect tampering in a ballot, then you | manually recount the individual printed paper votes | inside that ballot. That is something that you should do | to a random sample of ballots, plus ballots with unusual | totals. | | > As Tom Scott puts it, "The key point is not that | paper voting is perfect - it isn't - but attacks against | it don't scale well"[0]. 
| | That is simply not true, large scale paper ballot | tampering scales very well to the point of turning | elections, and is much easier to pull off because it | happens in the fringe where no one is looking (while | tampering the electronic system would require pulling | your heist in the IT room where everyone is looking). | gdelfino01 wrote: | You introduce technology to increase transparency and fight | corruption. You increase transparency by having video | recordings of humans counting votes linked to the electronic | record of the totals. | | When you introduce technology to eliminate manual counting and | paper trails, then transparency is eliminated and you give a | green light to fraud, corruption, very juicy contracts and | death. | TazeTSchnitzel wrote: | On the contrary, electronic voting doesn't create the paper | trail necessary to dig up frauds like this. You can simply | program or hack the system to report any vote total you want. | SkeuomorphicBee wrote: | First of all, hacking the electronic system is much much | harder than hacking the paper process. In the case at hand | the paper tallying process was the one hacked. | | And second, electronic systems can create a paper trail, just | make the electronic machine spit out a paper receipt. Then | you have the best of both worlds, you can have instant | electronic totals, and then do some random sampling recounts | of the receipts to validate the result. | marcosdumay wrote: | Scaling an attack against paper is incredibly difficult, | and requires coordination in a level that is almost sure to | trigger the law enforcement much before it can change some | national-level numbers. | | Scaling an attack against a computer system is almost the | same as doing an attack against a computer system. Few | attacks don't scale. | | But yeah, if you just print the vote and push it into an | urn (while the voter can read it), you'll get the best of | both worlds. 
| redman25 wrote: | This might be a sensitive question but I wonder if something like | this would work in the United States? With all of the fears of | election interference why not trust but verify? | charles_f wrote: | Would you trust the recount? I mean, the only way to engage the | number of people you need to do that kind of recount is by | having them _very_ pissed, so most likely feeling like their | party was wronged and therefore the thing is partisan by | essence. If you're on the winning party you wouldn't trust the | numbers the others give you anyhow, so what's the point? | pjc50 wrote: | Genuinely the US would do better if it had paper elections with | a handcount with observers. The system works in the UK just | fine. Unfortunately, there's a category of people in both the | US and Nigeria who use "election interference" to mean | "accurately counting the votes". | pjc50 wrote: | Striking reminder of how big the world is that while I had heard | of #EndSARS, I hadn't realised the scale of the political | violence in Nigeria nor that it had its own Bloody Sunday-scale | massacre. | prhrb wrote: | What a scam by the ruling political party | SergeAx wrote: | PDF is a very unfortunate format. It is proprietary, it is paper-oriented, its almost single goal is to keep precise printing | layout. But for the last 30 years the world didn't come up with | anything that could compete. | segfaultbuserr wrote: | PDF isn't the actual problem in this particular case. The | documents here are photographs taken at different camera | angles, embedded in PDFs. | jxramos wrote: | I was going to say, using alt drag to select vertical columns | is usually how I extract useable tables out from pdfs with | embedded tables. | londons_explore wrote: | Aren't things like this the reason that the UN provides election | observers? 
| | By spot checking just a random 100 votes are correctly tallied, | you can be pretty sure the outcome of the election is legit in a | > 10M voter country. | Someone wrote: | > By spot checking just a random 100 votes are correctly | tallied | | How do you do that? I think the only error you could detect is | when the tally has fewer votes for a party than what's in that | sample. If so, a fraudster could report 100 votes for every | party, and add the remaining to whatever party they want to | win. | londons_explore wrote: | You have to design the election system with this in mind. | | One such design would be for every vote to have a unique id. | When announcing the results, you also publish a list of which | vote ids were tallied for which candidate. | | Then you have 100 random ids, and the checkers watch those | votes all the way from the voter casting them to the final | tally. | jgtrosh wrote: | The context should be dated to 2020, not 2023 Edit: it was now | corrected, no need to downvote | | Great story! Looking forward to some follow up | public_defender wrote: | I don't understand. The article says the SARS protests started | in 2020 and the election was in 2023. This seems correct. | jgtrosh wrote: | Yes, it was now corrected | MontagFTB wrote: | So the bug where the first voting sheet shown to a user was from | the same 10% of the photos turned out to be a feature, serving as | a CAPTCHA of sorts to weed out the bad actors from the good. | | If memory serves, some CAPTCHA techniques include showing two | numbers to transcribe, where one's value is already known. If | that number is transcribed incorrectly, then the other number's | result isn't used, and the CAPTCHA fails. Perhaps a similar | technique may have also helped here? 
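The control-value idea described above (pair an item whose answer is already known with one that is not, and only trust the second if the first is right) can be sketched in a few lines. This is a minimal illustration under assumptions, not the article's actual implementation: the class names, sheet ids, and vote numbers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Submission:
    """One user's transcription of two sheets: a control and an unknown."""
    user: str
    control_id: str       # sheet whose totals are already trusted
    control_votes: dict   # party -> votes, as typed by the user
    unknown_id: str       # sheet we actually want transcribed
    unknown_votes: dict


@dataclass
class Tally:
    known: dict                                      # control_id -> trusted totals
    accepted: dict = field(default_factory=dict)     # unknown_id -> transcriptions
    rejected_users: set = field(default_factory=set)

    def submit(self, sub: Submission) -> bool:
        """Keep the unknown sheet's numbers only if the control was typed correctly."""
        if sub.user in self.rejected_users:
            return False
        if sub.control_votes != self.known.get(sub.control_id):
            # Failed the control: stop accepting results from this user.
            self.rejected_users.add(sub.user)
            return False
        self.accepted.setdefault(sub.unknown_id, []).append(sub.unknown_votes)
        return True


if __name__ == "__main__":
    # Made-up sheet ids and totals, purely for illustration.
    tally = Tally(known={"sheet-1": {"APC": 10, "LP": 20}})
    honest = Submission("alice", "sheet-1", {"APC": 10, "LP": 20},
                        "sheet-2", {"APC": 5, "LP": 7})
    bot = Submission("bot", "sheet-1", {"APC": 0, "LP": 999},
                     "sheet-2", {"APC": 500, "LP": 0})
    print(tally.submit(honest), tally.submit(bot))
    print(tally.accepted)
```

Keeping a list of transcriptions per unknown sheet, rather than a single value, leaves room for the other safeguard mentioned in the thread: comparing several independent transcriptions of the same sheet before trusting any of them.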
| Spare_account wrote: | This approach was part of their strategy: | | > _Then we started showing some results we knew to the bots - | if they entered wrong numbers, we would stop accepting the | results._ | didgetmaster wrote: | It seems to me that when combating bots or hackers, the wrong | approach is to provide immediate negative feedback. Giving an | immediate error code lets them know that their current | strategy is not working and to try something different. | | It seems like a better approach would be to make them think | you were accepting the results, when in fact they were going | to the bit bucket. Hackers trying to get into your corporate | database should be presented with a table full of false (but | plausible) data rather than an error. Let them waste time | trying to use all those fake SS numbers or account numbers | before they figure out they got duped. | theptip wrote: | For sure, shadow-banning is a great strat here. Raise their | costs, and don't give them any signal to learn from. | | Assuming you have the bandwidth to absorb the bot load, | which sounded like it was an issue here. | tetha wrote: | As scary as it can be, but yes. It's similar to strategy | games at a point - sometimes it's better to let the enemy | push you around for a bit as long as nothing important is | damaged. I don't really care if I have to scale up the LBs | a bit to handle all of the requests for some time. However, | this allows your attacker to commit more of their | resources, so you can block and ban more once you react or | so you can learn more about their behavior, so you can | mislead, slow-lorry and generally mess with them more | effectively. | | There have also been funny defcon-talks about messing with | attackers about this, by returning all kinds of messed up | return codes, slow-lorry'ing the bot, ... I'm kind of | wondering if you could SSRF (or rather, CSRF) a bot like | this by returning a redirect to e.g. the AWS metadata | API... 
could be a fun topic to mess with. | pbhjpbhj wrote: | It's also evidence of a crime. I wonder how that relates: | if you just drop those entries from the database (or from | the app prior to entry into the main db) then that seems | like destruction of evidence of a crime? | | It seems one should record all entries, but only update a | canonical db if all entries fail to trip automated | tampering detections. | malborodog wrote: | Can you explain that again differently? I didn't understand | that captcha point. It feels important though. | wodenokoto wrote: | Original captcha was built around transcribing text that ocr | tools failed at | | So I give you two words to transcribe to prove you are human. | I know one of them and I want to know the other. | czx4f4bd wrote: | I think they're referring to the old reCAPTCHA v1 approach. | | From https://en.wikipedia.org/wiki/ReCAPTCHA: | | > The original iteration of the service was a mass | collaboration platform designed for the digitization of | books, particularly those that were too illegible to be | scanned by computers. The verification prompts utilized pairs | of words from scanned pages, with one known word used as a | control for verification, and the second used to crowdsource | the reading of an uncertain word. | dan-robertson wrote: | I think the bug was that your first sheet came from a small set | and the people entering bad data would refresh instead of doing | the actually random next sheet, so entries for most of the | sheets came only from people who had long sessions who were | apparently more likely to enter good data. | churchill wrote: | Oh, and Mark didn't mention that Bola Ahmed Tinubu was indicted | for heroin charges in the US in 2003, forfeited $460k & is just | too old to run a democracy this size. | | Atiku Abubakar (second candidate) was a former VP and the | president he served under (Obasanjo) still insists the dude | remains a monument to corruption. 
| | There's been a coordinated campaign at all levels to rig this | election massively and we saw voter intimidation, manipulation in | broad daylight, and the acquiescence of foreign governments to it | all. | churchill wrote: | Proofs: | | To explain the $460k he forfeited to the feds for his heroin | trafficking indictment [0][1], Tinubu claims to have worked at | Deloitte as a consultant & made $850k in pre-tax bonuses a | year. Problem is, Deloitte claims he's never worked for them | [2] and a director at Deloitte earns $340k, according to | Glassdoor [3]. | | [0]: https://www.bbc.com/news/world-africa-61732548 [1]: | https://www.scribd.com/document/345742027/Bola-Tinubu-Heroin | [2]: https://pbs.twimg.com/media/FhhgxX2WQAAWOVo?format=jpg | [3]: https://www.glassdoor.com/Salary/Deloitte-Director-Salaries-... | JumpCrisscross wrote: | > _a director at Deloitte earns $340k, according to | Glassdoor_ | | This in no way undermines your post, broadly. But narrowly, | these are sales roles. Two people with the same title at | Deloitte can make vastly different incomes depending on their | production. | themitigating wrote: | Proof? | charles_f wrote: | > run a democracy this size. | | From the looks of it, if he runs it, it won't be a democracy | bschne wrote: | > is just too old to run a democracy this size | | Ahem, somebody tell the U.S. that | lostlogin wrote: | > is just too old to run a democracy this size. | | Bola Ahmed Tinubu was born 29 March 1952. He is 70. | | Joe Biden was born November 20, 1942. He is 80. | | There are plenty of world leaders that are old and I completely | agree with you. Why aren't there upper age limits? The UK House | of Lords, US Congress and US Supreme Court have this problem | too. | churchill wrote: | He claims to be 70 but it's been disputed widely - I don't | have the energy to filter signal from noise though. 
| churchill wrote: | I meant _heroin trafficking_ | mmmuhd wrote: | [flagged] | mrtksn wrote: | It's pretty easy to find articles about it on Bing Chat. | | https://businessday.ng/news/article/u-s-court-judgement-indi... | | Also this appears to be the Indictment document: | https://www.scribd.com/document/580028043/Bola-Ahmed-Tinubu-... | | Considering the needlessly passive aggressive tone, I would | assume you are a supporter. Maybe it can be a more useful | conversation if you write your perspective on the matter | instead of demanding easy to find articles about the Bola | Ahmed Tinubu Heroin Trafficking Indictment? | churchill wrote: | Why not debunk everything I just wrote instead of attacking | me personally? | | Google is your friend and you can verify everything I said | about: | | Tinubu's drug trafficking indictment: | https://www.bbc.com/news/world-africa-61732548 | nimajneb wrote: | [dead] | klooney wrote: | Also, this is ridiculous | | > he became an "instant millionaire" while working as an | auditor at Deloitte and Touche. | churchill wrote: | Deloitte denies having a record of ever employing him, | like you can see here [0]. | | [0]: https://pbs.twimg.com/media/FhhvN-fXEAAOTOK?format=jpg&name=... | | Tinubu claimed to be making $850k in annual pre-tax | bonuses working for Deloitte. Today, Directors at | Deloitte make 340k total comp annually, according to | Glassdoor, and that's before you factor in inflation. | What type of joke is this? | mmmuhd wrote: | churchill I am not attacking you, I am just drawing your | attention to bring solid evidence. In the link you provided, | I couldn't find where the article states that Tinubu is | accused of Drug trafficking or Shettima Terrorism. | smcl wrote: | From the linked article: | | > While the court confirmed it had cause to believe the | money in the bank accounts were the proceeds of drug | trafficking | natpalmer1776 wrote: | Disclaimer: Not my monkey, not my circus. 
| | That being said, your comment came off as needlessly | aggressive to someone who knows nothing of these people | or politics. | favaq wrote: | [flagged] | rqtwteye wrote: | I still don't understand how we ended up with PDF as sort of | standard to archive data. PDF is already pretty bad for things | like manuals but for things like spreadsheets we basically | collect the data, then we destroy all the structure by putting it | into PDF, and later on we painstakingly try to restore the | data from PDF which is often almost impossible to do with | accuracy. | | It just shows that bad solutions often win. | andrewio wrote: | Try https://parsio.io. | | It converts PDFs into a structured JSON format that you can | export anywhere using a Zapier or Make automation: | manv1 wrote: | Back in the day there were at least two programs competing for | the role that PDF fills today that I remember: diskpaper and | PDF. Apple also had one for its developer docs, but it was | never released commercially, I believe. | | PDF provided more fidelity for printing, had better tooling (it | was by Adobe after all), it was cross-platform, could be | displayed on the desktop, so it won. The reader was cross-platform so end-users didn't have to mess with installing | plugins for various image types. And because everyone in the | document creation division(1) used PostScript to print, | printing to PDF was super-easy. And at some point everyone had | a PostScript printer driver on their machine, so printing to | PDF became super-easy as well. | | It's not an archiving tool, but people use it for | archiving...just like the way a spreadsheet isn't a project | management tool, but millions of people use it for project | management. | | At this point the network effects for the PDF file format would | make it difficult to replace. With PDF you can practically | guarantee(2) that the file will look the same on any device. 
| | (1) This was more true back then than today, probably (2) | assuming that you embedded the fonts, and that the reader | doesn't suck. | | What's funny is I don't think Adobe really makes any money off | of PDF; it's an accidental de-facto standard. | lostlogin wrote: | > PDF provided more fidelity for printing, had better tooling | | This might have been true once, but using Acrobat now is so | painful. Of all the apps that work, Apple's Preview is my | editor of choice and when I'm on Windows I really miss it. | layer8 wrote: | > how we ended up with PDF as sort of standard to archive data. | | I don't think we really did. They are a standard for archiving | typeset page-based documents. | | Of course, paper documents used to be standard for archiving | data, and some continue to do so in the form of PDF. | | In principle, it is possible to integrate all the structure you | want in a PDF (using Marked Content, Structure Attributes and | User Properties), but for data (as opposed to document | structure) you'd need custom software to generate and interpret | that. | varenc wrote: | For this particular case, the use of PDFs seems irrelevant. | Photos were just taken of each polling unit's results. These | photos happened to then be embedded into PDFs for distribution, | but the core underlying data is just an image embedded into | that PDF. No important data was destroyed when these photos | were placed into PDFs. | spacebanana7 wrote: | I've thought about this and come round to think that the flaws | of PDF are actually essential to the success of the document | format. | | - Non-responsive (compared to HTML). Allows PDFs to serve as a | common standard between other document formats with different | resizing logic, like LaTeX and Word. | | - Difficulty of network access from code running inside | document. Allows PDFs to generally operate offline. Nobody's | brave enough to try to write a single page application in a PDF | | - Destroying data structure. 
Allows forward compatibility with | anything that can be displayed statically on a screen. New | applications can have different ideas about how tables, text or | charts should work but if there's static visual output then | it'll convert to PDF. Awareness of, say, the structure of tables | is precisely what makes it so difficult for, say, Google Sheets | and Excel to stay compatible with each other's new table | features. If somebody develops a new language with new | characters not even in Unicode it'll still work on a PDF | | It's also worth noting that most PDF limitations have the | characteristic of making things hard but not absolutely | impossible. These escape hatches prevent people with hard | requirements from actually moving to a new format. | | If it were truly impossible to get invoice data from PDFs | people might've shifted to a different format for business | transactions. But if it's merely difficult some company will | come up with an API that works as a good enough extraction | solution whose cost is justified by the other compatibility | benefits of PDFs, so the ecosystem stays with PDFs. | zo1 wrote: | Oh but there is: | | https://en.wikipedia.org/wiki/Apache_Flex | | Not sure if I linked to the right article, but it was | basically compiled scripts/code that was embedded into PDFs | that could run arbitrary code. | | "Apache Flex, formerly Adobe Flex, is a software development | kit (SDK) for the development and deployment of cross-platform rich web applications based on the Adobe Flash | platform." | salawat wrote: | >Difficulty of network access from code running inside | document. Allows PDFs to generally operate offline. Nobody's | brave enough to try to write a single page application in a | PDF. | | You can absolutely do so. Most times however, the desire is | to embed the latest cut of info into the PDF, then hand it | off to somebody who will not have network access. | | t. Been there, done that. 
Had the end product thrown out | because of Adobe's licensing terms. I also met one of the | people responsible for the tooling I had to suffer through. I | have their address, but they apologized, and explained the | internal politics at the time; so I've chilled on the whole | _crushing their genitalia with a large wrench_ bit. | | Long story short: doable, _but Do Not Follow. | This is not a place of honor. No great deed was once | commemorated here. That which remains is repulsive to | us, in our time, as it will be in yours. | | Seriously. If I could fill this post with spikes and sick | faces, I would. Vvvvvvvvvvvvvvvvvvvvvvvvvvvvv | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | | XFA was the dream of madmen, and sadists, that decent men | thought they could wrangle some positive utility out of. They | were wrong. | | The trefoil is not an angel. The weird ring things are | symbols for infectious waste._ | davedx wrote: | It depends. There are PDFs with rasterized images of text (like | in the article, when it's a scan or photo of a document), then | there are PDFs with vector-positioned text runs (when it's | usually a result of some digital process). The latter are way | easier to process than the former. | codeulike wrote: | these are just photos embedded in a PDF, which actually isn't | that bad an idea, because it lets you scan multiple pages and | join them together as a 'document' | | (not sure if the documents in OP had several pages, but if | you've scanned/photographed a multi-page document, PDF is not | that bad of a solution) | SilverCode wrote: | A better option would be to use the TIFF format. You can use | it as a container format to store lossless and lossy image | formats, and it handles multiple images in a single container. | | It was the standard for scanners until PDF seemed to dominate | the scene.
| adzm wrote: | Except who knows if your application that supports TIFF | files actually supports the features you want (multiple | images, the compression format, etc) | MichaelZuo wrote: | Reminds me of USB-C. | hunter2_ wrote: | > It was the standard for scanners until PDF seemed to | dominate the scene. | | Probably because it's much easier (for average users with | few tools and skills) to print a PDF than to print any sort | of non-page-based (e.g., image) file format and have the | resulting sheet of paper match the scanned sheet of paper | in terms of scale, orientation, position -- assuming both | sheets are the same dimensions. Essentially using the file | as an intermediary for physical copying of standard paper | documents. | rqtwteye wrote: | I can buy the printing argument. The problem with PDF is | that this print-optimized format is used more and more for | purposes where it will never get printed. For example | most manuals will never get printed but they are | published in PDF format which is a PITA to use on a phone | and hard to search. | gus_massa wrote: | I'm a teacher in the first year of the university. During the | remote classes in the pandemic, we made it almost mandatory to | upload the photos of the take-homes and questions using | CamScanner [1]. | | The students just download the app, and it fixes the | orientation, rotation, bad light, contrast, and many other | horrible things that a JPG may have, in particular the | orientation and the ordering of multiple sheets. Also, Moodle has a | little more support for PDF than JPG [2]. | | I don't know how many three-letter agencies are reading the | stream, but I'm happy that many three-letter-agency | operatives now have better training in algebra and | calculus. | | [1] https://www.camscanner.com/ | | [2] It depends on how many optional packages your sysadmin | installed. | chrisfinazzo wrote: | It's old, and sometimes things don't come out right, but this | is one way out of that hornet's nest.
| | https://tabula.technology | | There's also a CLI if that is more to your liking. If that | doesn't do it, there's always the brute-force option of | scripting in your language of choice to pull the data out. | anigbrowl wrote: | Because PDF shows you a page on screen that _will_ look the | same if you print it out, and print layouts have been optimized | for reading convenience over centuries. And if you give someone | with no technical expertise a PDF file, it's virtually certain | that they're going to be able to open it because some kind of | viewer is built into most operating systems. | | You're totally right about PDF being a massive pain in the butt | for any other purpose, but unless you have an alternative that | handles the basic use case at least as well and other use cases | way better, PDF is here to stay. | snvzz wrote: | Not providing CSV is at the level of criminal negligence. | clipper_janosch wrote: | What an exceptional story. You are a legend. | throwaway81523 wrote: | I've done stuff like this semi-manually. Use pdftotext to get the | text tables out of the PDF, eyeball it and massage with Emacs | keyboard macros, and in some cases Python scripts. It's not that | big a deal but it is somewhat ad hoc. | | I know that OCR software is able to read stuff like magazine | articles and figure out column layout, embedded charts, etc. It's | weird if there is nothing to do that with a PDF. Maybe I'll look around | or see if I can hack up something. | infinityio wrote: | unfortunately in this case the text content was handwritten, | not computer-generated | harvey9 wrote: | This is some compelling writing. I know this has real-life | implications for real people so I hope it's not in poor taste to | say it would make a good movie. | cwkoss wrote: | I agree, but still needs an ending! Will this be a story of | triumph or tragedy? | crazygringo wrote: | First of all, what a fantastic and inspiring read.
| | But, I'm left greatly confused -- the article never states | whether this changed the result. | | It says that halfway through counting Obi was in the lead, but | nothing about when counting finished. | | And when I look at the spreadsheet, the last row (#3380) appears | to be the totals, which lists: | | APC: 149014, LP: 85748, PDP: 329030, NNPP: 8305 | | Which shows LP (Obi) in third place, just like the official | results. | | So what point is the article trying to make at the end of the | day? Or have I misunderstood the numbers? | error503 wrote: | I collected all the _crosschecked CSVs and got: | | LP: 4731127, PDP: 4555334, APC: 5928825, NNPP: 1019045 | | Obi seems to make second place here, but far from first. | | https://i.imgur.com/UaZbXz6.png | karagenit wrote: | I totaled up the results from only the "crosschecked" CSV | files, here's what I saw: | | APC: 5928825, LP: 4731127, PDP: 4555334, NNPP: 1019045 | | I tried to manually verify about a dozen rows myself, half were | so blurry/low res they were illegible but the ones that were | legible were all correct. | | And for the "unsure" CSVs: | | APC: 1308067, LP: 578482, PDP: 736183, NNPP: 513245 | | Also checked about a dozen, and all but one of them were wildly | inaccurate so I wouldn't trust these much. | sd9 wrote: | Those are the results for just one state, Adamawa. | | However, like you I don't know what the overall results are; I | agree that the article could make this clearer. | crazygringo wrote: | Oh thanks for clarifying. Turns out the link to the folder | for _all_ the states is here: | | https://drive.google.com/drive/folders/173oHgms6wYy5WKz_i3Lh. | .. | | But there doesn't appear to be any file that calculates the | nationwide totals. | | It just seems like such a strange omission but I'm on mobile | and can't add up the numbers from across a ton of different | files myself. | didgetmaster wrote: | I downloaded all the .CSV files from that site and quickly | loaded them into a table.
It just took a couple minutes, | but I didn't stop to verify that there were not duplicate | rows across the various files. | | When I added up the totals, I got: | | APC: 7,225,399, LP: 5,286,181, PDP: 5,285,900, NNPP: 1,529,575 | | Note: I was using a beta version of a new database tool I | created to do this. | londons_explore wrote: | The votes surprise me... In many regions one party gets 90+% of | the vote. | | Assuming the numbers are correct, then it suggests that most | people are easily swayed by their local peers. | | Is that common in, say, the USA? | muyuu wrote: | It happens in the US too. Tribalism and ideological clustering | are so similar, they are being used interchangeably these days. | But in some traditional countries there are literal clans and | tribes voting in blocs. | anigbrowl wrote: | Yep, bloc voting can be habitual or strategic. There's a town | in Northern California where the majority of the seats on the | council is held by people who all happen to attend the same | megachurch. | mmmuhd wrote: | Exactly! And this mostly happened in the regions where the OP's | preferred candidate won. This is a clear scam. | crazygringo wrote: | > _it suggests that most people are easily swayed by their | local peers._ | | That feels like a particularly uncharitable interpretation to | me. | | I think it's more that parties and their | policies have very different impacts on different regions. So | it makes sense to vote on what is beneficial to your region, | and a lot of people will agree on that. | | So it's not about susceptibility to being "swayed", but genuine | policy affecting regions differently. | orf wrote: | Fantastic story! Did the results get used in a claim? | seventytwo wrote: | Wow, this was a fantastic read! | | I have no idea what's going on in Nigeria, but I hope the truth | (whatever it is) will prevail!
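For the nationwide tallies several commenters computed above, one caveat raised was that duplicate rows across files were not checked. A minimal Python sketch of summing per-party votes across several CSVs while counting each polling unit only once could look like this. The column names (state, polling_unit, APC, LP, PDP, NNPP), the choice of (state, polling_unit) as the duplicate key, and the sample rows are all assumptions for illustration; the real files may be laid out differently.

```python
import csv
import io
from collections import Counter

PARTIES = ("APC", "LP", "PDP", "NNPP")

def tally(csv_texts, key_fields=("state", "polling_unit")):
    """Sum party votes across CSV texts, counting each polling unit once.

    key_fields is a hypothetical way to identify a polling unit; adjust it
    to whatever uniquely identifies a row in the actual data.
    """
    totals = Counter()
    seen = set()
    duplicates = 0
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            key = tuple((row.get(f) or "").strip() for f in key_fields)
            if key in seen:
                # Same polling unit already counted in an earlier row or file.
                duplicates += 1
                continue
            seen.add(key)
            for party in PARTIES:
                # Tolerate blanks and thousands separators; skip illegible cells.
                value = (row.get(party) or "").strip().replace(",", "")
                if value.isdigit():
                    totals[party] += int(value)
    return totals, duplicates

# Made-up sample data for the sketch, not real election results.
file_a = "state,polling_unit,APC,LP,PDP,NNPP\nX,001,10,25,5,0\nX,002,3,7,12,1\n"
file_b = "state,polling_unit,APC,LP,PDP,NNPP\nX,002,3,7,12,1\nY,001,8,,4,2\n"

totals, duplicates = tally([file_a, file_b])
print(dict(totals), "duplicates skipped:", duplicates)
```

Skipping non-numeric cells silently undercounts, so in practice it would be worth also reporting how many cells were skipped before trusting any total.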
| vincheezel wrote: | I hope for (but do not expect) a positive outcome. | blntechie wrote: | What were the final result numbers from the transcription? | toyg wrote: | They're probably going to be similar to the 14k sample he | tweeted: a solidified Labour Party getting 50-55% of the votes, | and the establishment candidates splitting the rest. | churchill wrote: | - | churchill wrote: | - | mmmuhd wrote: | David Hundeyin is a deceitful, lying criminal, so don't bring | his "Content" as any kind of evidence. | | https://www.icirnigeria.org/controversy-as-oxford- | terminates... | [deleted] | pxc wrote: | @dang are these '-' comments an attempt to evade showdead? ___________________________________________________________________ (page generated 2023-03-23 23:00 UTC)