[HN Gopher] Make the "semantic web" web 3.0 again - with the hel...
___________________________________________________________________
Make the "semantic web" web 3.0 again - with the help of SQLite

Author : sekao
Score : 81 points
Date : 2022-01-11 20:47 UTC (2 hours ago)

(HTM) web link (ansiwave.net)
(TXT) w3m dump (ansiwave.net)

| sharperguy wrote:
| I wonder if combining this idea with some kind of
| microtransactional currency such as the Bitcoin Lightning Network
| or even a simple Chaumian e-cash system (1) would help to get
| around the issue of requiring clickbait, advertising and SEO with
| every single piece of data.
|
| Would be great if providers could offer data in raw form without
| the overhead of all the gunk that gets them paid.
|
| 1. https://en.wikipedia.org/wiki/Ecash
| netcan wrote:
| Whether or not it has legs, at least this is an interesting idea.
| echelon wrote:
| What a lot of folks don't realize is that the Semantic Web was
| poised to be a P2P and distributed web. Your forum post would be
| marked up in a schema that other client-side "forum software"
| could import and understand. You could sign your comments, share
| them, grow your network in a distributed fashion. For all kinds
| of applications. Save recipes in a catalog, aggregate contacts,
| you name it.
|
| Ontologies were centrally published (and had URLs when not -
| "URIs/URNs are cool"), so it was easy to understand data models.
| The entity name was the location was the definition. Ridiculously
| clever.
|
| Furthermore, HTML was headed back to its "markup" / "document"
| roots. It focused on meaning and information conveyance,
| where applications could be layered on top. Almost more like
| JSON, but universally accessible and non-proprietary, and with a
| built-in UI for structured traversal.
|
| Remember CSS Zen Garden? That was from a time when documents
| were treated as information, not thick web applications, and the
| CSS and JavaScript were an ethereal cloak.
| The Semantic Web folks concurrently worked on making it so that
| HTML wasn't just "a soup of tags for layout", so that it wasn't
| just browsers that would understand and present it. RSS was one
| such first step. People were starting to mark up a lot of other
| things. Authorship and consumption tools were starting to arise.
|
| The reason this grand utopia didn't happen was that this wave of
| innovation coincided with the rise of VC-fueled tech startups.
| Google, Facebook. The walled gardens. As more people got on the
| internet (it was previously just us nerds running Linux, IRC, and
| BitTorrent), focus shifted and concentrated into the platforms.
| Due to the ease of Facebook and the fact that your non-tech
| friends were there, people not only stopped publishing, but they
| stopped innovating in this space entirely. There are a few
| holdouts, but it's nothing like it once was. (No claims of "you
| can still do this" will bring back the palpable energy of that
| day.)
|
| Google later delivered HTML5, which "saved us" from XHTML's
| strictness. Unfortunately this also strongly deemphasized the
| semantic layer and made people think of HTML as more of a GUI /
| Application design language. If we'd exchanged schemas and
| semantic data instead, we could have written desktop apps and
| sharable browser extensions to parse the documents. Natively
| save, bookmark, index, and share. But now we have SPAs and React.
|
| It's also worth mentioning that semantic data would have made the
| search problem easier and more accessible. If you could trust the
| author (through signing), then you could quickly build a
| searchable database of facts and articles. There was benefit for
| Google in having this problem remain hard. Only they had the
| infrastructure and wherewithal to deal with the unstructured mess
| and web of spammers. And there's a lot of money in that moat.
|
| In abandoning the Semantic Web, we found a local optimum.
| It worked out great for a handful of billionaires and many, many
| shareholders and early engineers. It was indeed faster and easier
| to build for the more constrained sandboxiness of platforms, and
| it probably got more people online faster. But it's a far less
| robust system that falls well short of the vision we once had.
| NetOpWibby wrote:
| Wow, I had no idea, bookmarking your comment.
| hobofan wrote:
| > The entity name was the location was the definition.
|
| While that concept sounds cool in theory, in practice it was
| and is a disaster. Given the high degree of centralization and
| the lack of versioning mechanisms, you have to trust the
| publisher to not alter the semantics, and also hope
| that they stay online forever or your semantics vanish.
|
| When I first learned about the semantic web, I was very hyped
| on it, but that quickly subsided once I tried actually querying
| the ontologies and saw that most of them yielded a 404.
|
| I'm still very hopeful for semantic data (and happy to be able
| to work on a product leveraging it), but I think for an open
| semantic web there is a lot of work that needs to go into
| tooling to make it succeed.
| mftb wrote:
| I agree with pretty much everything you said, except the part
| about the "VC-fueled startups". Google and Facebook were once
| startups, they were just earlier and Google in particular was
| smart enough to see the future. As part of a multi-faceted
| effort (including, for instance, Chrome and Gmail), they saw the
| need to head off the Web 3.0 standards, delivering us instead
| the web we have today. I wish I could have seen things as
| clearly then.
|
| In the end though I'm not sure it ever would have been any
| different. People want it "now" and they want it "convenient".
| zozbot234 wrote:
| There's a standard XML serialization of HTML5 that supports all
| the features previously associated with XHTML. Additionally,
| RDF data can be exchanged as JSON via JSON-LD.
| There's no reason why a typical SPA app could not be built to
| query RDF-serving endpoints.
|
| "Marking up forum posts" is something that's getting quite a
| bit of traction nowadays via specifications like
| ActivityStreams (with its "push" extension ActivityPub now
| powering the 'Fediverse') and WebMention.
| recursivedoubts wrote:
| Humans, as of now (and as far as I'm aware, being outside the AI
| labs at the big tech companies and DARPA) have agency, and so are
| in a unique position to take advantage of the uniform interface
| of REST/the web in a flexible manner. I wrote an article about
| this on the intercooler.js blog, entitled "HATEOAS is for
| Humans":
|
| https://intercoolerjs.org/2016/05/08/hatoeas-is-for-humans.h...
|
| The idea that metadata can be provided and utilized in a similar
| manner doesn't strike me as realistic. If it is code consuming
| the metadata, the flexibility of the uniform interface is wasted.
| If it is a human consuming the metadata, they want something nice
| like HTML.
|
| For code, why not just a structured and standardized JSON API?
|
| This appears to be what we have settled on, and I don't see any
| big advantage in extending REST-ful web concepts on top of it.
| The machines just ignore all that metadata crap.
| netcan wrote:
| >> why not just a structured and standardized JSON API?
|
| So in this version of the idea... because structuring data
| requires work. Unstandardized data exists already. Some of it
| is already SQLite. A lot of the rest is in other SQL databases,
| and that might be a smaller bridge.
|
| Author claims (if I'm understanding correctly) that a static
| website could easily query SQLite databases over HTTP, and bam,
| web 3.0.
|
| Honestly, it's hard for me to think/discuss these ideas without
| examples, even if contrived. What kind of websites would be
| built this way? What data will they be querying?
|
| A web app that uses photos and address books on the user's
| phone? An alternative UI for news.yc?
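The "query SQLite over HTTP" idea discussed above rests on fetching only the byte ranges of the database file that a query touches. What follows is a minimal sketch of that first step, not the article's implementation: it reads the 100-byte SQLite file header, takes the page size from it, and computes the inclusive byte range a client would put in a `Range: bytes=start-end` header for a given page. A local file read stands in for the HTTP request, and the helper names (`parse_header`, `page_byte_range`, `read_range`, `build_demo_db`) and the `posts` table are hypothetical, chosen only for illustration.

```python
import os
import sqlite3
import struct
import tempfile
from typing import Dict, Tuple

# Magic string that opens every SQLite 3 database file.
SQLITE_MAGIC = b"SQLite format 3\x00"


def parse_header(header: bytes) -> Dict[str, int]:
    """Extract page size and page count from the 100-byte SQLite file header."""
    if len(header) < 100 or header[:16] != SQLITE_MAGIC:
        raise ValueError("not a SQLite 3 database header")
    # Bytes 16-17: page size, big-endian; the value 1 stands for 65536.
    (page_size,) = struct.unpack(">H", header[16:18])
    if page_size == 1:
        page_size = 65536
    # Bytes 28-31: database size in pages, big-endian.
    (page_count,) = struct.unpack(">I", header[28:32])
    return {"page_size": page_size, "page_count": page_count}


def page_byte_range(page_number: int, page_size: int) -> Tuple[int, int]:
    """Inclusive byte offsets of a 1-indexed page, as in 'Range: bytes=start-end'."""
    if page_number < 1:
        raise ValueError("SQLite pages are numbered from 1")
    start = (page_number - 1) * page_size
    return start, start + page_size - 1


def read_range(path: str, start: int, end: int) -> bytes:
    """Local stand-in (an assumption) for an HTTP GET with a Range header."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start + 1)


def build_demo_db(path: str) -> None:
    """Create a tiny database with a hypothetical `posts` table."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA page_size = 4096")  # must be set before the first write
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT INTO posts (body) VALUES (?)", [("hello",), ("world",)])
    conn.commit()
    conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db")
        build_demo_db(db_path)
        info = parse_header(read_range(db_path, 0, 99))
        first, last = page_byte_range(info["page_count"], info["page_size"])
        print(info, "last page would be requested as", f"bytes={first}-{last}")
```

A real client would also have to walk the B-tree pages to decide which page numbers a query needs; that part is omitted here, and, as noted elsewhere in the thread, join-heavy queries can require many such round trips.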
| luhn wrote:
| The author seems to assume that everybody is using SQLite, but
| SQLite for a production database is an extremely niche choice.
| Attempting to expose more popular options like PostgreSQL or
| MySQL as SQLite would be extremely difficult because SQLite only
| supports a subset of SQL, whereas PostgreSQL and MySQL each
| implement their own superset (for the most part) of SQL.
|
| But it doesn't matter. The API doesn't matter. Web 3.0 was never
| about APIs, it was about _data_. A standardized API is only
| useful if it outputs standardized data. Having a bunch of bespoke
| SQLite tables scattered across the web gets us no closer to the
| ideal of Web 3.0.
| sekao wrote:
| My point was not that people are using SQLite in prod
| everywhere; read that paragraph in more of a speculative voice,
| not as a statement of fact about the present. At any rate, I do
| think the range request technique makes SQLite more practical
| to use in database-driven apps that normally would've opted for
| a traditional db like Postgres (though there is more work to be
| done to make this technique fast when doing complex
| queries...lots of joins are no bueno right now).
| Closi wrote:
| SQLite is the most used database engine in the world, so I
| wouldn't call it niche. In fact, by some estimates, it is
| probably used more than all other database engines combined.
|
| The only difference is that it is usually run locally (compared
| to Postgres and your other examples), but something doesn't
| have to run remotely to be considered running in production :)
| luhn wrote:
| Yes, when I said "production database" I meant a database for
| a web application. My iPhone running SQLite doesn't relate to
| Web 3.0.
| Closi wrote:
| > Yes, when I said "production database" I meant a database
| for a web application
|
| I'm not sure that's what the author of the article means
| though, at least in my interpretation.
|
| When they say "everyone is already using it", I assume
| they mean literally everyone is using it on
| their phones and PCs every day, not that everyone is using
| it to develop production web applications (because very few
| people develop production web applications in the grand
| scheme of things!).
|
| I presume they mean that it is one of (if not the) most
| common databases in existence in the wild, and it's
| interesting that it has this property of being able to be
| remotely read with surprisingly little overhead (without
| the need to implement an entirely bespoke database to be
| read in this way).
| luhn wrote:
| I'm not sure how to parse what the author is saying
| besides you should be exposing your database directly,
| which is apparently SQLite.
|
| > In the process, it demonstrated a new kind of web app
| whose entire database was exposed and queryable from the
| outside.
|
| > The data needs to be exposed in its original form; any
| additional translation step will ensure that most people
| won't bother.
| netcan wrote:
| I think he does mean web apps/sites, at least in large
| part. He is talking about implementing web 3.0, after
| all. OTOH, I suppose there's no reason why web has to
| apply primarily to things that are currently webstuff.
| You both make good points.
| dfabulich wrote:
| I think you misunderstood the author.
|
| "The data needs to be exposed _in its original form_;
| any additional translation step will ensure that most
| people won't bother. The beauty of this technique is that
| you are _already_ using SQLite because it's such a
| powerful database; with no additional work, you can throw
| it on a static file server and others can easily query it
| over HTTP."
|
| The author believes (IMO wrongly) that there's lots of
| web app data that can be exposed via SQLite-over-HTTP
| without translating it into SQLite, because it's already
| in SQLite.
|
| The author is saying that since lots of web apps use
| SQLite for their production database, they can easily
| "throw" their SQLite DB onto the web. But, in that case,
| you're out of luck if you use Postgres, MySQL, Oracle, MS
| SQL Server, or any of the popular non-relational datastores
| like MongoDB, Redis, or Elasticsearch.
| root_axis wrote:
| SQLite is actually quite robust in a production web
| application environment. I have used it as a database in
| several production applications over the years, including
| one that serviced 600k MAUs without issue. If your
| application is very write heavy or you're FAANG scale it
| could present a problem, but IMO SQLite is probably the
| best bang for your buck solution for the workload of most
| websites and applications.
| Groxx wrote:
| tbh I wish it were used more. it's much cheaper to run
| and just as fast as MySQL (sometimes much faster) on your
| average WordPress blog or equivalent. you don't need to
| handle _thousands_ of concurrent writes, only maybe 5 max
| at peak... and queueing them for tens of milliseconds is
| totally fine. as long as you're not writing horrifically
| inefficient insert operations, you absolutely won't
| notice until you're under ridiculously high load for most
| sites.
| luhn wrote:
| Yeah, I've heard the argument before that SQLite is just
| fine as a production database. Not arguing against that,
| just saying that it's not a common choice.
| DangitBobby wrote:
| Well, having to write unique SQL per site is much better than
| having to write unique scrapers per site.
| contravariant wrote:
| Wouldn't you want to use a subset of SQL in an API? Why use a
| unique superset that differs per webpage?
| bokchoi wrote:
| I never got on the semantic web train, but a translation layer
| does allow you to make underlying schema changes.
|
| I poked around the ANSIWAVE BBS and it looks fun!
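The "unique SQL per site" point above assumes a client can at least find out what tables a site's database contains. A minimal sketch of that discovery step, assuming the client already holds a connection to the published database: it lists user tables from `sqlite_master` and reads column names via `PRAGMA table_info`. The function name `describe_schema` and the `posts` and `users` tables are hypothetical and only serve as illustration.

```python
import sqlite3
from typing import Dict, List


def describe_schema(conn: sqlite3.Connection) -> Dict[str, List[str]]:
    """Map each user-defined table name to the list of its column names."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    # Skip SQLite's internal bookkeeping tables such as sqlite_sequence.
    tables = [name for (name,) in rows if not name.startswith("sqlite_")]
    schema: Dict[str, List[str]] = {}
    for table in tables:
        # PRAGMA arguments cannot be bound as parameters, so quote the identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        # Each row is (cid, name, type, notnull, dflt_value, pk).
        schema[table] = [column[1] for column in columns]
    return schema


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
    demo.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    for table_name, column_names in describe_schema(demo).items():
        print(table_name, column_names)
    demo.close()
```

This only tells a client what the columns are called, not what they mean, which is the gap the standardized-data objection in the thread is pointing at.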
| 0xbadcafebee wrote:
| Among the 30-odd technologies that make up the Semantic Web[1]
| (it never died, it's just a collection of tech, lots of
| organizations use it daily) are graph databases[2]. Graph
| databases are necessary to implement semantic web databases.
|
| SQLite is not a graph database. Even if you used SQLite to
| _implement_ a graph database, it would not solve any significant
| problems of the semantic web, such as access to data, taxonomies,
| ontologies, lexicons, tagging, user interfaces to semantic data
| management, etc.
|
| It's a really odd suggestion that you would just copy around a
| database or leave it on the internet for people to copy from. For
| the BBS mentioned here, that might actually be _illegal_, as it
| might contain PII, and on other sites possibly PHI. Many
| countries now have laws that require user data to remain
| in-country. Besides the challenges of just organizing data
| semantically, there still needs to be work done on data security
| controls to prevent leaking sensitive information.
|
| The funny thing is, that isn't even hard to do with the semantic
| web. You classify the data that needs protecting and build
| functions and queries to match. You can tie that data to a unique
| ID so that people can "own" their data wherever it goes, and sign
| it with a user's digital certificate which can also expire.
|
| But all of that (afaik) doesn't exist yet. Everyone is more
| concerned with blockchains and SQL, either because the fancy new
| tech is sexier, or the old boring tech doesn't require any work
| to implement. The Semantic Web never caught on because it's
| really fucking hard to get right. No companies are investing in
| making it easier. Maybe in 20 years somebody will get bored
| enough over a holiday to make a simple website creation tool that
| implicitly creates semantic web sites that are easy to reason
| about. It'll probably be a WordPress plugin.
|
| [1] https://en.wikipedia.org/wiki/Semantic_Web
| [2] https://graphdb.ontotext.com/documentation/enterprise/introd...
| nescioquid wrote:
| > The Semantic Web never caught on because it's really fucking
| hard to get right. No companies are investing in making it
| easier.
|
| I really appreciate this point. I had the opportunity to work
| on an exploratory project with an experienced ontologist (yes,
| you really need one of those, I think). The tools were
| fascinating (reasoners quickly became necessary) but I had the
| feeling that many of these tools were at a comparatively early
| stage of maturity.
|
| Trying to explain to people how the system would work was a
| challenge as it required a primer on theory and application --
| we glazed many eyes. The CTO wanted to know if we could use
| blockchain somehow. Another group addressed a slice of the
| problem with technologies already in use and that decided the
| matter.
| NetOpWibby wrote:
| Thanks for the links!
| Karrot_Kream wrote:
| The semantic web failed to become widely popular because:
|
| 1. Graph databases on top of triple stores are a lot less
| scalable than relational databases or key-value stores, and
| this is how semantic data is meant to be stored/queried.
|
| 2. Data is valuable. Handing out data for free in a
| machine-consumable way is both expensive (machines can request
| data much more quickly than a human) and a recipe for copycats.
| The incentives just aren't there.
|
| TBL's Solid project is about trying to separate semantic data
| providers from the presentation layer and opening up the
| possibility of payment from these data providers to try to
| improve the incentives around semantic data sharing.
| zozbot234 wrote:
| > Graph databases are necessary to implement semantic web
| databases.
|
| This just isn't true, on multiple levels.
| RDF is an interoperability standard that does not per se depend
| on a 'graph-like' data model - you can very much expose plain old
| relational data via RDF, and this is quite intended.
| Additionally, modern general-purpose RDBMSs support
| graph-focused data models quite well, despite being built on
| 'relational' principles - there's no need for special tech when
| working with general-purpose graph models, unless you're doing
| some sort of heavy-duty network analytics.
| 0xbadcafebee wrote:
| You're talking about extending a database design created 50
| years ago to work with models and methods that involve
| significantly different operations and concepts. Let the
| RDBMS die so we can make something that is much more powerful
| and requires less fidgeting and squinting to work the way we
| want.
|
| RDBMS were a niche research project for a decade before they
| started to catch on in business apps. They've stayed around
| forever because they're just functional enough to be
| dangerous. But we already hit the upper limits of both
| reliability and performance years ago (remember NoSQL?) and
| we just keep bolting on features because nobody wants to
| leave them. The old designs and implementations are holding
| us back.
| smarx007 wrote:
| RDF is a labeled multigraph data model with URI-based
| predicates as edge labels, where each triple represents an
| edge. You are right that relational data can be exposed in
| RDF, just like CSV can be loaded into a graph DB.
| sekao wrote:
| > Graph databases are necessary to implement semantic web
| databases.
|
| The online docs (and TBL himself) rarely mention graph
| databases, but obviously the idea is tied tightly to RDF.
| Separating it from that implementation detail is part of the
| point, though. Getting people to represent their data via an
| additional format was never going to work.
|
| > For the BBS mentioned here, that might actually be illegal,
| as it might contain PII
|
| Can't imagine the purpose you had in even making this point. In
| theory, any arbitrary database exposed publicly could be
| illegal to replicate due to copyright, PII laws, etc. But that
| has nothing at all to do with a technical discussion of a
| technique for exposing data. What a bizarre point to make.
|
| As an aside, I'm glad you removed the "Uh........." from the
| beginning of your post. We're all making an effort to reduce
| the typical HN snark in the comments, and there's always room
| for improvement :D
| xmly wrote:
| I do not understand this conclusion: "Data on the web will only
| be "semantic" if that is the default, and with this technique it
| will be."
|
| Why would it be semantic?
| pure_simplicity wrote:
| They're saying: if it's not strongly incentivized, then it
| won't happen.
|
| They don't specify in the conclusion what that incentive
| structure looks like. Saying that it has to be the default is
| very general.
| firechickenbird wrote:
| Isn't this Web 1.0 instead? You are only reading data, yeah ok
| with SQL, but you still can't modify it. And also there are
| already very good standards like RDF, OWL 2, SPARQL, which are
| more expressive than SQL for consuming the info
| jsight wrote:
| Is there really that much safely web-exposable data in SQLite for
| this to make sense? I'm not really seeing how this is obviously
| better than the metadata ideas that preceded it.
| rossdavidh wrote:
| Some: weather, ratings, topography, dictionaries and
| encyclopedias, sports scores, market prices, some other stuff.
| All public knowledge, but not necessarily publicly available
| (easily) in raw form.
| lostmsu wrote:
| But doesn't it have to be immutable for the proposal to work?
| vorpalhex wrote:
| No, as long as the data model is stable you can add new
| rows.
|
| You might want some kind of versioning for messy column
| changes, particularly removals.
| fleddr wrote:
| The semantic web is not a technical problem, it's an incentive
| problem.
|
| RSS can be considered a primitive separation of data and UI, yet
| was killed everywhere. When you hand over your data to the world,
| you lose all control of it. Monetization becomes impossible and
| you leave the door wide open for any competitor to destroy you.
|
| That pretty much limits the idea to the "common goods" like
| Wikipedia and perhaps the academic world.
|
| Even something as silly as a semantic recipe for cooking is
| controversial. Somebody built a recipe scraping app and got a
| massive backlash from food bloggers. Their ad-infested 7000 word
| lectures intermixed with a recipe is their business model.
|
| Unfortunately, we have very little common good data that is free
| from personal or commercial interests. You can think of a million
| formats and databases but it won't take off without the right
| incentives.
| onion2k wrote:
| _Even something as silly as a semantic recipe for cooking is
| controversial. Somebody built a recipe scraping app and got a
| massive backlash from food bloggers. Their ad-infested 7000
| word lectures intermixed with a recipe is their business model._
|
| Taking someone else's content and republishing it without
| permission isn't cool, even if you wrap it in a nice machine
| readable format.
| Micoloth wrote:
| I'm more and more seeing that this is true. Still, it is sad.
|
| The question isn't even, what can one do, because obviously
| nobody can change how incentives work in a given society.
|
| The question is: is there a timeline in which the right
| incentives (to share data) start being enforced? How would that
| play out?
___________________________________________________________________
(page generated 2022-01-11 23:00 UTC)