[HN Gopher] Show HN: Full text search on 630M US court cases ___________________________________________________________________ Show HN: Full text search on 630M US court cases Author : richardbarosky Score : 229 points Date : 2022-02-19 19:45 UTC (3 hours ago) (HTM) web link (www.judyrecords.com) (TXT) w3m dump (www.judyrecords.com) | drewmol wrote: | I recently had some criminal charges expunged, and I notice they | show up here. Is there any way to request removal of court | records which are no longer publicly available from the | originating court? | richardbarosky wrote: | This is a possibility that there aren't any great solutions for | currently. Can you message me on reddit with the link to check? | jka wrote: | I'm not a lawyer: | | In the absence of a reporting mechanism for issues like this, | I'd suggest at least a notice / message alongside results to | indicate that they may not reflect the current state of | official and amended records. | | (I think you may be wise to take this issue fairly seriously; | there's a risk of people considering the search engine to be | an authority in itself -- which, to be fair, is already a | risk for any search engine, but since this one is more | domain-focused, it's possible that some users could | overdevelop a sense that the results are accurate and | complete) | richardbarosky wrote: | This is stated in simple language on the terms page, which | is linked at the top/middle of every page. You have to | decide between putting the same text on every page vs. a | high visibility place vs. a low visibility place. I opted | for the 2nd to make sure it's clear. | jka wrote: | Do most people read and comprehend terms pages before | using the information they discover from search engines? | (I don't know) | drewmol wrote: | I tried your HN handle on Reddit; it says the user does not exist.
| richardbarosky wrote: | aoeusnth48 | alangibson wrote: | This site will be the first stop for anyone wanting to harass | another person online. Sometimes a little friction is a good | thing. | | I love projects like these, but they're the digital equivalent of | "dual use technologies". They can be used for good or evil. | | That said, nice work. | vintermann wrote: | On the other hand, powerful people who wanted to harass you or | hurt you have had access like this for a long time. | | It's how I feel about facial recognition technology or other | ML-based technology too. The worst people who could ever have | access to it, already had access to it. Giving everyone access | to it is just leveling the field. | duped wrote: | I tried some rather specific queries for things I know should | return some records and it was fairly useless, so I'm not | terribly worried. | | Just anecdotally, I have a fairly uncommon last name but common | first name, I know what states/counties I have appeared in | court in and couldn't find any of the records. If you search | something like <name> <county> <state> the results are | overloaded with <county> <state>, for example. | iqanq wrote: | >This site will be the first stop for anyone wanting to harass | another person online. Sometimes a little friction is a good | thing. | | Precisely I was thinking of how much fun we'll be having in | efnet with this. | richardbarosky wrote: | I think broadly the same tradeoffs exist for any search | system, like Google or PACER for example. | bryanrasmussen wrote: | Given that one third of Americans have criminal records of one | sort or another, so that somebody almost certainly has a | criminal in their family or near circle of friends, I suppose | criminality is about the same as finding out somebody watches | porn. | | on edit: actually one third is probably overstating but close.
| rmbyrro wrote: | Yeah, the only missing piece for _fulltext_ harassment is a | "Google alert" for particular keywords. Put the names you wanna | track and receive a delightful alert in your inbox with rocks | to throw over other people's roof. | | EDIT: the tech is great, but I think there should be a record | of who is accessing the data, for what purpose, terms for how | it can be used in a civil way, and means to go after misuse. | alangibson wrote: | How is harassment as a service not a thing yet? | | You get a "Google alert" for your target. The service | presents you with several buttons: | | 1. Send an AI written email 2. Post a link to the new info on | their Facebook page 3. Tweet an image macro with the | incriminating text embedded @ them | inetknght wrote: | > _How is harassment as a service not a thing yet?_ | | What makes you think it isn't? | sockpuppet69 wrote: | rmbyrro wrote: | It is a thing, but making it so easy to find and access | court documents mentioning someone's name will add to the | pile of rocks malevolent people can throw at anyone. | thr0wawayf00 wrote: | > I love projects like these, but they're the digital | equivalent of "dual use technologies". They can be used for | good or evil. | | Isn't pretty much every technology "dual use"? Just look at | social media. You need a platform that gives you the ability to | harass someone in order to actually do it. | | > Some times a little friction is a good thing. | | We as a market repeatedly justify the frictionless experience | of being spied on for ads in ways that we have little to no | control over, but we're gonna deny ourselves the frictionless | experience of being able to see public records because we're | worried about our privacy? | ghaff wrote: | There's a whole lot of information that the collective "we" | decided to make public for various reasons. 
But those decisions | making things public were in the context of the information | being in some dusty town, county, or state office somewhere. | | With more and more of that information being digital, we've | more or less punted on the question of whether all that | information should still be public. Overall, more transparency | is probably good but, as you say, it's not an unalloyed good as | most of this information will live forever and be cheap/easy to | access. | loxias wrote: | Fantastic. Love it. Wish I could download the whole 630M DB, not | just 700K cases from Texas. | | I especially love the interface. It's light and fast. Not | unnecessarily burdened by JavaScript. Bravo to that. | richardbarosky wrote: | thank you! | Simon_O_Rourke wrote: | Searched my former boss on this. Hoooo doggy, I knew he was up to | some questionable financial practices, but it looks like it | caught up with him. | channel_t wrote: | Wow I just found out that a lot of distant family members on the | opposite side of the country who I've never met are really bad | drivers. Found one of my own moving violations in there too. | dheera wrote: | Damn, even traffic citations in there. Wow. | btdmaster wrote: | Just an FYI -- you probably need to declare the use of Google | Analytics explicitly in your terms. (Although my personal | preference is something that does not require consent, like | Matomo or Plausible Analytics :) | ejb999 wrote: | why would that be? I don't think I have ever seen a site that | disclosed they are using GA?
| | FWIW: I also prefer Plausible, and have all GA traffic blocked | in my hosts file | btdmaster wrote: | Since it collects personally identifiable information (at | least IP addresses, but it's not clear where it stops) this | requires special treatment under GDPR: | https://en.wikipedia.org/wiki/Google_Analytics#Privacy | mostlystatic wrote: | It's much more limited in what's covered, but when I had some | questions around VAT I found the website of the British and Irish | Legal Information Institute really helpful: | https://www.bailii.org/ | | It's noindex, so it would normally be super hard to find the | cases if you don't search on the BAILII site directly. | cryptnotic wrote: | Today I learned that 20 years ago I was a defendant in an | unlawful detainer (eviction) lawsuit regarding an apartment I | shared in college. I had moved out after graduation. Apparently | my roommate stopped paying the rent and the landlord sued both of | us. I was never served and didn't know about the case until now. | wolverine876 wrote: | Who made this site? How is it funded? They don't reveal | themselves afaict. Why should I trust it? | richardbarosky wrote: | The site is meant to be an index, and you should verify | information from the source. | flatiron wrote: | What wouldn't you trust? It's simply indexing public info. | bradknowles wrote: | You should be able to do an exact match search here. Trying to | use double quotes on my name turns up a boatload of hits, but | most of them appear to be cases where my first name is found | somewhere on the page, and somewhere else my last name is found | somewhere on the page. | | It should also be possible to limit the search by city, state, | and or region, as well as by timeframe. | | Not very useful. | magicjosh wrote: | Here's Steve Jobs' speeding ticket: | https://www.judyrecords.com/record/vde11sdzw25ac | sva_ wrote: | Was trying to find speeding tickets of John von Neumann, but in | vain. 
It would be nice if one could limit search by years. | hervature wrote: | Apparently importing a Jaguar through Canada went horribly | wrong for him: | https://www.judyrecords.com/record/0vctgni5684d | sva_ wrote: | _> Argued and Submitted June 3, 1981._ | | John von Neumann died in 1957. The name is a bit generic, | so many results show up. Hence I wished there was a way to | limit search to a range of years. | hervature wrote: | Good call, now I'm embarrassed. I should've known that. | Funny how the mind works. I knew he died in his 50's and | was involved in the Manhattan project but somehow was | content lumping him in with all the other scientists from | Operation Paperclip and using loose math that 1981 was | possible. | jonbraun wrote: | "One does not have to be a Richard Feynman to figure out that | 200 tons is 100% greater than 100 tons." | https://www.judyrecords.com/record/dhuql2nm6942 | richardbarosky wrote: | hmmm, middle initial checks out. though it's possible it's | another steve. | ChrisMarshallNY wrote: | TIL that a lot of sad MFers share my name... | | This tool is awesome, but, in knucklehead hands, could be fairly | awful. | airstrike wrote: | This seems pretty good at first glance but there's significant | room for improvement. Since this is HN, allow me to nitpick... | | - "630M" is a big number, sure, but I don't have a sense for what | % of total court cases it corresponds to. Is it closer to 10% or | 90%? And either way, which ones are included vs. excluded? What | was the criteria used? Accessibility, date, costs? | | - I get the artistic view behind the choice of typography but the | font is just too large. I find myself having to scroll to get | just as far as the 5th result. Information density is good in | search engines | | - The results consist of two pieces: the name of the court | (followed by "record", which is unnecessary) and a short snippet, | but not the actual name of the case... 
which is an interesting | choice given that the name of the case is stored in a database | field as evidenced by the fact that it is in the <title> tag of | any detail view | | - I also think the snippets are too short. Together with the | previous point, this site is basically forcing me to click on | each potential match to see if it is what I wanted or not | | - The URLs are... interesting. Searching for anything takes you | to "https://www.judyrecords.com/getSearchResults/?page=1" which | does not identify your search. Somehow this is using GET yet | storing the form input locally rather than in the URL... so | searching for "foo" in one tab, "bar" in a different tab, and | hitting refresh on your "foo" tab will then show "bar" results | there. Which is not only "Not Cool", but seems actually _harder_ | to accomplish than a straight up form using GET | | - And then the actual results have URLs like | "https://www.judyrecords.com/record/qxemfajbcae3". I'd be fine | with a slug, really, but in 2022 I expect URLs to be API-like | | - I can't search for specific cases, e.g. "paramount | communications, inc. v. qvc network, inc" returns a bunch of | results, none of which are the actual case I'm looking for which | is a hugely influential precedent | ghaff wrote: | I note that this isn't just court cases. I have a long ago | (paid) traffic ticket in there--well, not the ticket but a | record pointing to a no longer existing ticket. (Maybe that's | technically a court case though.) Something I wrote is also in | a footnote to a patent filing. | richardbarosky wrote: | Valid criticisms, thanks for pointing them out as areas of | improvement. Good question about the % of total cases though I | think there are some estimates on that. My guess would maybe be | 100M+ cases per year. | skilled wrote: | Page 1 of 78 total cases for: wikileaks | stjohnswarts wrote: | Not sure how good this is on a "regular citizen" level.
I tried | several drug/alcohol related incidents that I knew about and | nothing came up. | busymom0 wrote: | Mind sharing info on server, backend, costs etc? | richardbarosky wrote: | Replied to this comment here with some additional info: | https://news.ycombinator.com/item?id=30399881#unv_30400160 | nabla9 wrote: | 603 total cases for: emacs | | 260 total cases for: "mind control" | | 768 total cases for: "donald j. trump" | | State of Minnesota vs Steven Captain America Rogers | https://www.judyrecords.com/record/vfvd30smme78f | btdmaster wrote: | > mind control | | I love it! (Is witchcraft constitutionally protected?!) | codechad wrote: | This is amazing. Can you share any info on how you were able to | compile so much info from different sources? In my limited | experience of hunting for legal filings, it seemed like every | court had its own system, with nothing standardized or | programmatic. | | Thanks! | richardbarosky wrote: | The search uses elasticsearch 7 for full text search. It's been | extremely fast and worked very well. You're right court data is | scattered across many different systems and needs to be | aggregated, which is a difficult process. | tmikaeld wrote: | How much ram does that use up? What's the latency? Is it | sharded? Is it a cluster? So many questions | richardbarosky wrote: | There are 2 search boxes going. One for storing the search | index without source and another which stores the source, | which is only used for highlighting. Searches usually take | under 200ms and SRP and individual pages usually take less | than 20ms. The 2 ES nodes are not formally part of a single | cluster due to the index storage difference. Another box | uses a traditional LAMP setup. Feel free to send a message | on reddit if interested in more detail. | kingcharles wrote: | Are you using freelaw's code to scrape all the different | servers? Why are there no contact details on the site? 
I | don't understand the mystery and black ops nature of this | thing. It feels like there is some sort of conspiracy here | that I've yet to uncover! | richardbarosky wrote: | There are I think about 5 million opinions from that | project, yes. I wouldn't say it's blackops, feel free to | contact me on reddit. | [deleted] | codechad wrote: | agumonkey wrote: | oh these include patents, weird | lol768 wrote: | Yeah - is there no way to filter out patents? Bit frustrating. | hammock wrote: | This is unbelievable. It has speeding tickets. | trhway wrote: | That is great. Regular people's access to the information is a great | power equalizer. I had lost a small case - fine print and a lot | of undelivered promises - after 3 lawyers said I'd lose and won | it on appeal after finding in an online database (not available | anymore sadly) a similar precedent referring to the law exactly for | my situation. According to yelp and case search the company I had | this case with was regularly taking people for a ride, and the | people very grudgingly paid hundreds to several thousands of | dollars a pop mostly because of the fine print, and I became the | first with a winning case in that list. | richardbarosky wrote: | That's a great use case. Thank you for sharing! | throwaway-PII wrote: | The fact that this is free is mind boggling. Maybe four or five | years ago I had access to a commercial court search API which had | 850mn cases nationwide, and it cost a pretty penny. | toomuchtodo wrote: | Legal Scihub LexisNexis. | hbcondo714 wrote: | OP submitted this site in November 2020 with 400M cases[1]. Other | than the increase in cases, what else has changed? | | [1] https://news.ycombinator.com/item?id=25150702 | richardbarosky wrote: | Right, more cases primarily. The performance has been optimized | so the searches, search result pages, and individual pages load | significantly faster. Most searches load in under 200ms and | most pages including SRPs load in less than 20 ms.
Search | syntax improvements (see info page for details). The search is | still not very granular and field-specific, but definitely an | area of improvement. | dang wrote: | Not as a criticism but just FEI (For Everyone's Information), | reposts are ok on HN after a year or so. This is in the FAQ: | https://news.ycombinator.com/newsfaq.html. | kyboren wrote: | This is... not great. It's crucial that these records be open to | public inspection. But instant full-text search of the entire | dockets of 630M cases feels wrong, invasive, and dangerous to me. | | It's yet another instance of panopticon surveillance now being | too cheap to meter. I think our society needs to come to grips | with this new reality and figure out what to do about it. | | Or are we all just cool with this? | sockpuppet69 wrote: | EvanAnderson wrote: | Powerful corporate and government actors have massive | surveillance and data warehousing capabilities that aren't | going away. At the very least, putting those powers into the | hands of the public helps to level the playing field. | | Society will have to change to accommodate the digital | panopticon. I don't see the digital panopticon going away, | though. | wolverine876 wrote: | > putting those powers into the hands of the public helps to | level the playing field | | Agreed, but ... | | > Powerful corporate and government actors have massive | surveillance and data warehousing capabilities that aren't | going away. | | To nitpick: They aren't going away as long as we spread that | message. It's not easy, but we can make them go away. People | do accomplish things and change the world - just compare | today's world with 500 years ago; all the differences the | result of people changing things. Defeatism is trendy, and | who benefits? (The status quo.) | EvanAnderson wrote: | > To nitpick: They aren't going away as long as we spread | that message. ... Defeatism is trendy, and who benefits? 
| | It's not defeatism-- it's just being realistic. I don't | believe there's any useful method to make government actors | comply with the law. I have an admittedly US perspective, | but I'd cite the FBI under J. Edgar Hoover, the NSA and the | subsequent Church committee hearings, and Snowden's | disclosures as examples. The power afforded by mass | surveillance and data warehousing is too attractive not to | be abused. | [deleted] | codechad wrote: | There are public court records (criminal, civil), and there are | non-public court records (e.g. sealed - juvenile, divorce, | etc.) | | As far as I can tell, all of this data is of a public nature. | | While it may feel weird to type in someone's name and see their | history with regard to legal filings... that is the society we | live in: an open society. | | Aggregating a number of disconnected data sources for search I | think is absolutely a legitimate usage of the data. | lazide wrote: | FYI, I found divorce records for a couple of folks I know. So I | wouldn't assume those hard and fast rules apply consistently. | codechad wrote: | Fair enough - in my state they are limited to parties | involved and their counsel. | | The public can still see the filing and result (when the | divorce was granted), but the actual documents are | restricted so as not to air all of one's dirty laundry | unnecessarily. | [deleted] | drewmol wrote: | I have some records that are sealed, but show up in this | database. So there are records that were once 'public' but | are no more, but this database makes them public again. | mmastrac wrote: | Don't lawyers already have access to case law like this? I feel | like this is not a new thing, but giving access to everyone is | novel. | | I could be wrong on my facts. | lazide wrote: | Generally you've had to pay for an expensive service | (LexisNexis), or go to the courthouse yourself to pull the records. | Search was also a bit of a black art. | | So generally easy to hide in the noise.
Here you can just put | in a name, and off you go. | SkittyDog wrote: | Lexis has the best search capabilities, but there are | dozens of cheap clones now that start at $10/month to | search these same records. | EvanAnderson wrote: | The public has access to most local court data in my state | (Ohio, US) thru websites run by the various local courts. A | state-level database for government use is, as far as I know, | still not actually available (though it has been in planning | and some phase of execution for 10+ years). | wolverine876 wrote: | > Don't lawyers already have access to case law like this? | | Yes, through expensive services like Westlaw and Lexis. | anonu wrote: | I couldn't find my name. And I know it should be in here. So | I'm not that worried yet... | vanusa wrote: | There's no escape. It's just a matter of time. | vasco wrote: | These records have always been available to people with money | to spend on a lawyer with a subscription. So what you're | complaining about is that normal people can also access the | information now. | alangibson wrote: | Nice false equivalence. | | Lawyer: duty-bound professional, is an officer of the court, | can be publicly disbarred, very expensive degree that needs | to be paid off | | Some guy on the internet with an axe to grind: ??? | dgfitz wrote: | I believe in California you only need to pass the bar to | become a lawyer, no expensive degree required. | Spooky23 wrote: | I'm sure both of the attorneys who have done an | apprenticeship are happy about that. | SkittyDog wrote: | You're all missing the point... ANYbody with a | LexisNexis subscription, or a Bloomberg terminal, or one of | those background check sites, already has this exact | capability. It's not new. | | You don't need to be a lawyer to access any of it... I | think the other poster simply meant that lawyers | generally have Lexis subscriptions, already.
| | Also, the various court databases this site is searching | are ALREADY online and publicly available, and have been | for years. This is just providing a free, unified | interface with a fast search index. | ghaff wrote: | At some level I get the angst about typing someone's | name, especially if it's fairly unusual, and getting back | a whole lot of information about, in this case, mostly | legal-related stuff and in others past addresses, things | they've written etc. for free. (And, if you know | something about them you can probably sift the returns | somewhat effectively.) You may be able to find out a lot | about your date, your neighbor, etc. | | On the other hand, outside of casually checking out | someone, the reality is that this has long been available | for anyone willing to spend a few bucks to do so. | alangibson wrote: | > This is just providing a free, unified interface with a | fast search index. | | Yes, and that is a phase change difference. It's not a | trivial enhancement. | [deleted] | retrac wrote: | Quantity has a quality of its own. To use a similar example, | arrest and imprisonment records are public data in my | country. But you have to actually go to the courthouse and | fill out some paperwork and/or hire a lawyer to do it for | you. | | This has consequences. For example, in some US states it | takes a few seconds for an employer to find out a candidate | was once arrested while drunk, or has a conviction for a | minor offense from 15 years ago. And employers do that sort | of search routinely, because it's free and easy. Only someone | being targeted for a specific background check gets that | treatment here, because it's not so easy. | | Same argument applies to, for example, reading the previous | divorce case for someone you're dating. Only a real weirdo | would do that here, in part because it involves time and | money. If it's freely available online, I do think it would | be a lot more common.
| | I don't know whether it'd be better or worse to have such | information more accessible, but it can change things. | citizenkeen wrote: | > Only a real weirdo would do that here, in part because it | involves time and money. | | I think your parent's point is that money isn't an issue | for the rich. A billionaire doesn't care that it costs $150 | to find out, they don't care that it costs $1,000 to find | out. So suddenly information becomes a class issue. Either | it should be available to nobody or everybody, money | shouldn't factor into it. | SkittyDog wrote: | I think you may be misunderstanding what this is... All of | these documents were ALREADY public records, and were ALREADY | available online. Most US courts have been publishing these | records online, for a while now. | | And there are ALREADY other websites/search products that | provide a unified search interface... LexisNexis is probably | the biggest/oldest, and I believe Bloomberg also has this | feature... There are dozens (if not hundreds) of cheap public | record search websites that charge $10/month for it, too. | | If you're surprised by all this, you haven't been paying | attention... For a few decades now. | zomglings wrote: | I don't see any problem with this. These cases are in the | public record, why should the public not have the ability to | search them for free without requiring access to expensive | legal indices? | mgdlbp wrote: | Seems closer to a form of | https://en.wikipedia.org/wiki/Sousveillance | oh_sigh wrote: | Sure, why not? It's not like anything embarrassing or things | you want kept secret should be in court proceedings | | https://www.judyrecords.com/record/vvfe9mivbec8c | cperciva wrote: | TIL that I'm cited in a _lot_ of patents. | andrewguenther wrote: | https://patents.google.com is great for this | [deleted] | fosshogg wrote: | Thankfully none of the (many) speeding tickets I got in my youth | are showing up. | tomrod wrote: | Neat.
I'll add this to my sources on case law -- another one I've | come across is https://case.law/ | | Per my close friend, the value of these (or, why people subscribe | to LexisNexis) isn't solely the texts, but the cross referencing. | It would be really cool to see that get implemented (and no doubt | a non-trivial problem!). | | How do you source your case inputs, as it is bigger than PACER? | richardbarosky wrote: | CourtListener is a free source that does this very well for | high-level courts. (i.e., US Supreme Court, Federal Courts, | State Courts of Last Resort/State Supreme Courts). | | For that, you have to detect references of cases which is a | difficult problem itself, and CourtListener's search ranking | also takes into account the citation weight of certain cases. | This generally works well, but my understanding is that | sometimes a not-so-important case can end up having many | citations. Or if a case with many citations is overturned | completely or partially, these things complicate which cases | might be most relevant in search results too. | | The data source is provided for each case. In some cases, a | direct reference/link is provided. | supernova87a wrote: | I know there is some open source (?) effort to publish and give | access to court cases instead of having it behind a paid | subscription channeled through the federal court system. Does | anyone know how that's going? | | And also, are only the primary filings of the court and parties | available to be searched? What happens to depositions, evidence | records, etc. that are part of the case? Are those ever available | to the public? | richardbarosky wrote: | It sounds like you're referring to this: Open Courts Act of | 2021 | | Some commentary at these links: | | - https://free.law/pacer-facts | | - https://www.politico.com/magazine/story/2019/03/20/pacer- | cou... | | - https://abovethelaw.com/legal-innovation- | center/2021/03/11/t... 
| | - https://unicourt.com/blog/modernizing-pacer-realizing- | crimin... ___________________________________________________________________ (page generated 2022-02-19 23:00 UTC)