[HN Gopher] Make the "semantic web" web 3.0 again - with the hel...
       ___________________________________________________________________
        
       Make the "semantic web" web 3.0 again - with the help of SQLite
        
       Author : sekao
       Score  : 81 points
       Date   : 2022-01-11 20:47 UTC (2 hours ago)
        
 (HTM) web link (ansiwave.net)
 (TXT) w3m dump (ansiwave.net)
        
       | sharperguy wrote:
       | I wonder if combining this idea with some kind of
       | microtransactional currency such as the bitcoin Lightning Network
       | or even a simple Chaumian e-cash system (1) would help to get
       | around the issue of requiring clickbait, advertising and SEO with
       | every single piece of data.
       | 
       | Would be great if providers could offer data in raw form without
       | the overhead of all the gunk that gets them paid.
       | 
       | 1. https://en.wikipedia.org/wiki/Ecash
        
       | netcan wrote:
       | Whether or not it has legs, at least this is an interesting idea.
        
       | echelon wrote:
       | What a lot of folks don't realize is that the Semantic Web was
       | poised to be a P2P and distributed web. Your forum post would be
       | marked up in a schema that other client-side "forum software"
       | could import and understand. You could sign your comments, share
       | them, grow your network in a distributed fashion. For all kinds
       | of applications. Save recipes in a catalog, aggregate contacts,
       | you name it.
       | 
       | Ontologies were centrally published (and had URLs when not -
       | "URIs/URNs are cool"), so it was easy to understand data models.
       | The entity name was the location was the definition. Ridiculously
       | clever.
       | 
       | Furthermore, HTML was headed back to its "markup" / "document"
       | roots. It focused around meaning and information conveyance,
       | where applications could be layered on top. Almost more like
       | JSON, but universally accessible and non-proprietary, and with a
       | built in UI for structured traversal.
       | 
       | Remember CSS Zen Garden? That was from a time where documents
       | were treated as information, not thick web applications, and the
       | CSS and Javascript were an ethereal cloak. The Semantic Web folks
       | concurrently worked on making it so that HTML wasn't just "a soup
       | of tags for layout", so that it wasn't just browsers that would
       | understand and present it. RSS was one such first step. People
       | were starting to mark up a lot of other things. Authorship and
       | consumption tools were starting to arise.
       | 
       | The reason this grand utopia didn't happen was that this wave of
       | innovation coincided with the rise of VC-fueled tech startups.
       | Google, Facebook. The walled gardens. As more people got on the
       | internet (it was previously just us nerds running Linux, IRC, and
       | Bittorrent), focus shifted and concentrated into the platforms.
       | Due to the ease of Facebook and the fact that your non-tech
       | friends were there, people not only stopped publishing, but they
       | stopped innovating in this space entirely. There are a few
       | holdouts, but it's nothing like it once was. (No claims of "you
       | can still do this" will bring back the palpable energy of that
       | day.)
       | 
       | Google later delivered HTML5, which "saved us" from XHTML's
       | strictness. Unfortunately this also strongly deemphasized the
       | semantic layer and made people think of HTML as more of a GUI /
       | Application design language. If we'd exchanged schemas and
       | semantic data instead, we could have written desktop apps and
       | sharable browser extensions to parse the documents. Natively
       | save, bookmark, index, and share. But now we have SPAs and React.
       | 
       | It's also worth mentioning that semantic data would have made the
       | search problem easier and more accessible. If you could trust the
       | author (through signing), then you could quickly build a
       | searchable database of facts and articles. There was benefit for
       | Google in having this problem remain hard. Only they had the
       | infrastructure and wherewithal to deal with the unstructured mess
       | and web of spammers. And there's a lot of money in that moat.
       | 
       | In abandoning the Semantic Web, we found a local optimum. It
       | worked out great for a handful of billionaires and many, many
       | shareholders and early engineers. It was indeed faster and easier
       | to build for the more constrained sandboxiness of platforms, and
       | it probably got more people online faster. But it's a far less
       | robust system that falls well short of the vision we once had.
        
         | NetOpWibby wrote:
         | Wow, I had no idea, bookmarking your comment.
        
         | hobofan wrote:
         | > The entity name was the location was the definition.
         | 
         | While that concept sounds cool in theory, in practice it was
         | and is a disaster. Combined with the high degree of
         | centralization and the lack of versioning mechanisms, you have
         | to trust the publisher not to alter the semantics, and also
         | hope that they stay online forever or your semantics vanish.
         | 
         | When I first learned about the semantic web, I was very hyped
         | on it, but that quickly subsided once I tried actually querying
         | the ontologies and found that most of them yield a 404.
         | 
         | I'm still very hopeful for semantic data (and happy to be able
         | to work on a product leveraging it), but I think for an open
         | semantic web there is a lot of work that needs to go into
         | tooling to make it succeed.
        
         | mftb wrote:
         | I agree with pretty much everything you said, except the part
         | about the "VC-fueled startups". Google and fb were once
         | startups, they were just earlier and Google in particular was
         | smart enough to see the future. As part of a multi-faceted
         | effort (including for instance, Chrome and gmail), they saw the
         | need to head off the Web 3.0 standards, delivering us instead
         | the web we have today. I wish I could have seen things as
         | clearly then.
         | 
         | In the end though I'm not sure it ever would have been any
         | different. People want it "now" and they want it "convenient".
        
         | zozbot234 wrote:
         | There's a standard XML serialization of HTML5 that supports all
         | the features previously associated with XHTML. Additionally,
         | RDF data can be exchanged as JSON via JSON-LD. There's no
         | reason why a typical SPA app could not be built to query RDF-
         | serving endpoints.
         | 
         | "Marking up forum posts" is something that's getting quite a
         | bit of traction nowadays via specifications like
         | ActivityStreams (with its "push" extension ActivityPub now
         | powering the 'Fediverse') and WebMention.
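To make the JSON-LD point concrete, here is a minimal sketch of a forum comment marked up with the ActivityStreams 2.0 vocabulary. The `forum.example` URLs and the `to_json_ld` / `summarize` helpers are invented for illustration; only the `@context` URL and the property names come from the ActivityStreams vocabulary.

```python
import json

# Hypothetical forum comment expressed as JSON-LD with ActivityStreams 2.0 terms.
# Every forum.example URL below is a made-up placeholder.
note = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Note",
    "id": "https://forum.example/posts/42",
    "attributedTo": "https://forum.example/users/alice",
    "inReplyTo": "https://forum.example/posts/41",
    "published": "2022-01-11T20:47:00Z",
    "content": "RDF data can be exchanged as JSON via JSON-LD.",
}


def to_json_ld(doc: dict) -> str:
    """Serialize the document; any plain JSON parser can read the result."""
    return json.dumps(doc, indent=2, sort_keys=True)


def summarize(serialized: str) -> str:
    """What a consuming client might do: parse and pull out the typed fields."""
    doc = json.loads(serialized)
    return f'{doc["type"]} by {doc["attributedTo"]}'


if __name__ == "__main__":
    print(summarize(to_json_ld(note)))
```

A client that knows nothing about this particular forum can still tell that the object is a `Note`, who wrote it, and what it replies to, because those keys are defined by the shared context rather than by the site.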
        
       | recursivedoubts wrote:
       | Humans, as of now (and as far as I'm aware, being outside the AI
       | labs at the big tech companies and DARPA) have agency, and so are
       | in a unique position to take advantage of the uniform interface
       | of REST/the web in a flexible manner. I wrote an article about
       | this on the intercooler.js blog, entitled "HATEOAS is for
       | Humans":
       | 
       | https://intercoolerjs.org/2016/05/08/hatoeas-is-for-humans.h...
       | 
       | The idea that metadata can be provided and utilized in a similar
       | manner doesn't strike me as realistic. If it is code consuming
       | the metadata, the flexibility of the uniform interface is wasted.
       | If it is a human consuming the metadata, they want something nice
       | like HTML.
       | 
       | For code, why not just a structured and standardized JSON API?
       | 
       | This appears to be what we have settled on, and I don't see any
       | big advantage extending REST-ful web concepts on top of it. The
       | machines just ignore all that meta-data crap.
        
         | netcan wrote:
         | >> why not just a structured and standardized JSON API?
         | 
         | So in this version of the idea... because structuring data
         | requires work. Unstandardized data exists already. Some of it
         | is already in SQLite. A lot of the rest is in other SQL
         | databases, and that might be a smaller bridge.
         | 
         | Author claims (if I'm understanding correctly) that a static
         | website could easily query SQLite files over HTTP, and bam, web
         | 3.0.
         | 
         | Honestly, it's hard for me to think/discuss these ideas without
         | examples, even if contrived. What kind of websites would be
         | built this way? What data will they be querying?
         | 
         | A web app that uses photos and address books on the user's
         | phone? An alternative UI for news.yc?
        
       | luhn wrote:
       | The author seems to assume that everybody is using SQLite, but
       | SQLite for a production database is an extremely niche choice.
       | Attempting to expose more popular options like PostgreSQL or
       | MySQL as SQLite would be extremely difficult because SQLite only
       | supports a subset of SQL, whereas PostgreSQL and MySQL each
       | implement their own superset (for the most part) of SQL.
       | 
       | But it doesn't matter. The API doesn't matter. Web 3.0 was never
       | about APIs, it was about _data_. A standardized API is only
       | useful if it outputs standardized data. Having a bunch of bespoke
       | SQLite tables scattered across the web gets us no closer to the
       | ideal of Web 3.0.
        
         | sekao wrote:
         | My point was not that people are using SQLite in prod
         | everywhere; read that paragraph in more of a speculative voice,
         | not a statement of fact about the present. At any rate, I do
         | think the range request technique makes SQLite more practical
         | to use in database-driven apps that normally would've opted for
         | a traditional db like postgres (though there is more work to be
         | done to make this technique fast when doing complex
         | queries...lots of joins are no bueno right now).
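For readers who have not seen the range request technique, here is a rough sketch of the core idea, not the article's actual implementation. A SQLite file is a sequence of fixed-size pages, so a client can read the 100-byte header, learn the page size, and then ask for only the pages it needs. `fetch_range` is a hypothetical stand-in that reads a local file; a real client would send an HTTP `Range: bytes=start-end` header instead. `make_demo_db` exists only to have something to read.

```python
import os
import sqlite3
import struct
import tempfile
from typing import Tuple


def fetch_range(path: str, start: int, end: int) -> bytes:
    """Local stand-in for an HTTP request with 'Range: bytes=start-end' (inclusive)."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start + 1)


def read_page_size(path: str) -> int:
    """Fetch the 100-byte SQLite header and return the database page size."""
    header = fetch_range(path, 0, 99)
    if header[:16] != b"SQLite format 3\x00":
        raise ValueError("not a SQLite 3 database")
    (size,) = struct.unpack(">H", header[16:18])  # big-endian 16-bit field
    return 65536 if size == 1 else size  # the stored value 1 encodes 65536


def page_byte_range(page_number: int, page_size: int) -> Tuple[int, int]:
    """Inclusive byte range covering a 1-indexed database page."""
    start = (page_number - 1) * page_size
    return start, start + page_size - 1


def make_demo_db() -> str:
    """Create a tiny throwaway database file to read from."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
    con.executemany("INSERT INTO posts (body) VALUES (?)", [("hello",), ("world",)])
    con.commit()
    con.close()
    return path


if __name__ == "__main__":
    db = make_demo_db()
    size = read_page_size(db)
    print(size, page_byte_range(2, size))
```

A join-heavy query touches many pages scattered across the file, and each page the client does not already hold costs another round trip, which is one plausible reason complex queries are slow with this approach.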
        
         | Closi wrote:
         | SQLite is the most used database engine in the world, so I
         | wouldn't call it niche. In fact, by some estimates, it is
         | probably used more than all other database engines combined.
         | 
         | The only difference is that it is usually run locally (compared
         | to Postgres and your other examples), but something doesn't
         | have to run remotely to be considered running in production :)
        
           | luhn wrote:
           | Yes, when I said "production database" I meant a database for
           | a web application. My iPhone running SQLite doesn't relate to
           | Web 3.0.
        
             | Closi wrote:
             | > Yes, when I said "production database" I meant a database
             | for a web application
             | 
             | I'm not sure that's what the author of the article means
             | though, at least in my interpretation.
             | 
             | When they say "everyone is already using it", I assume
             | they mean that literally everyone is using it on
             | their phones and PCs every day, not that everyone is using
             | it to develop production web applications (because very few
             | people develop production web applications in the grand
             | scheme of things!).
             | 
             | I presume they mean that it is one of (if not the) most
             | common databases in existence in the wild, and it's
             | interesting that it has this property of being able to be
             | remotely read with surprisingly little overhead (without
             | the need to implement an entirely bespoke database to be
             | read in this way).
        
               | luhn wrote:
               | I'm not sure how to parse what the author is saying
               | besides you should be exposing your database directly,
               | which is apparently SQLite.
               | 
               | > In the process, it demonstrated a new kind of web app
               | whose entire database was exposed and queryable from the
               | outside.
               | 
               | > The data needs to be exposed in its original form; any
               | additional translation step will ensure that most people
               | won't bother.
        
               | netcan wrote:
               | I think he does mean web apps/sites, at least in large
               | part. He is talking about implementing web 3.0, after
               | all. OTOH, I suppose there's no reason why web has to
               | apply primarily to things that are currently webstuff.
               | You both make good points.
        
               | dfabulich wrote:
               | I think you misunderstood the author.
               | 
               | "The data needs to be exposed _in its original form_;
               | any additional translation step will ensure that most
               | people won't bother. The beauty of this technique is that
               | you are _already_ using SQLite because it's such a
               | powerful database; with no additional work, you can throw
               | it on a static file server and others can easily query it
               | over HTTP."
               | 
               | The author believes (IMO wrongly) that there's lots of
               | web app data that can be exposed via SQLite-over-HTTP
               | without translating it into SQLite, because it's already
               | in SQLite.
               | 
               | The author is saying that since lots of web apps use
               | SQLite for their production database, they can easily
               | "throw" their SQLite DB onto the web. But, in that case,
               | you're out of luck if you use Postgres, MySQL, Oracle, MS
               | SQL Server, or any of the popular key-value datastores
               | like Mongo, Redis, or Elasticsearch.
        
             | root_axis wrote:
             | sqlite is actually quite robust in a production web
             | application environment, I have used it as a database in
             | several production applications over the years including
             | one that serviced 600k MAUs without issue. if your
             | application is very write heavy or you're FAANG scale it
             | could present a problem, but IMO sqlite is probably the
             | best bang for your buck solution for the workload of most
             | websites and applications.
        
               | Groxx wrote:
               | tbh I wish it were used more. it's much cheaper to run
               | and just as fast as mysql (sometimes much faster) on your
               | average wordpress blog or equivalent. you don't need to
               | handle _thousands_ of concurrent writes, only maybe 5 max
               | at peak... and queueing them for tens of milliseconds is
               | totally fine. as long as you're not writing horrifically
               | inefficient insert operations, you absolutely won't
               | notice until you're under ridiculously high load for most
               | sites.
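The "queue writes for tens of milliseconds" behaviour can be sketched with Python's standard `sqlite3` module, whose `timeout` argument tells a connection how long to keep retrying when the database is locked. Everything else here (the `comments` table, the 50 ms hold, the helper names) is invented for the demo, and the timings are illustrative rather than a benchmark.

```python
import os
import sqlite3
import tempfile
import threading
import time


def open_db(path: str, timeout_s: float) -> sqlite3.Connection:
    # isolation_level=None puts the connection in autocommit mode, so the
    # demo controls transactions explicitly with BEGIN/COMMIT.
    return sqlite3.connect(path, timeout=timeout_s, isolation_level=None)


def demo():
    path = os.path.join(tempfile.mkdtemp(), "blog.db")
    setup = open_db(path, 1.0)
    setup.execute("CREATE TABLE comments (body TEXT)")
    setup.close()

    # First writer takes the write lock and holds it for a moment.
    holder = open_db(path, 1.0)
    holder.execute("BEGIN IMMEDIATE")
    holder.execute("INSERT INTO comments VALUES ('first')")

    result = {}

    def second_writer():
        con = open_db(path, 2.0)  # retry for up to 2 s instead of failing at once
        start = time.monotonic()
        con.execute("INSERT INTO comments VALUES ('second')")
        result["waited"] = time.monotonic() - start
        con.close()

    worker = threading.Thread(target=second_writer)
    worker.start()
    time.sleep(0.05)           # hold the lock for roughly 50 ms
    holder.execute("COMMIT")   # release; the second writer can now proceed
    worker.join()
    holder.close()

    check = open_db(path, 1.0)
    count = check.execute("SELECT COUNT(*) FROM comments").fetchone()[0]
    check.close()
    return count, result["waited"]


if __name__ == "__main__":
    print(demo())
```

The second insert is delayed rather than rejected, which is the trade-off described above: writes serialize, and a short wait is usually invisible at blog-scale traffic.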
        
               | luhn wrote:
               | Yeah, I've heard the argument before that SQLite is just
               | fine as a production database. Not arguing against that,
               | just saying that it's not a common choice.
        
         | DangitBobby wrote:
         | Well, having to write unique SQL per site is much better than
         | having to write unique scrapers per site.
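One reason per-site SQL may be less painful than per-site scraping is that the database describes itself. A minimal sketch, assuming you already have a connection to the exposed database (the `posts` and `users` tables in the demo are made up): `sqlite_master` and `PRAGMA table_info` are standard SQLite, while `describe` is a hypothetical helper.

```python
import sqlite3
from typing import Dict, List


def describe(con: sqlite3.Connection) -> Dict[str, List[str]]:
    """Map each user table to its column names, read from the database itself."""
    tables = [
        row[0]
        for row in con.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    # table_info rows are (cid, name, type, notnull, dflt_value, pk).
    # Table names come from the database, so interpolating them is acceptable
    # here; do not do this with untrusted input.
    return {
        t: [col[1] for col in con.execute(f'PRAGMA table_info("{t}")')]
        for t in tables
    }


def make_demo() -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT, body TEXT)")
    con.execute("CREATE TABLE users (name TEXT, joined TEXT)")
    return con


if __name__ == "__main__":
    print(describe(make_demo()))
```

This only tells you the shape of the data, not what the columns mean, so the consumer still has to write site-specific queries.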
        
         | contravariant wrote:
         | Wouldn't you want to use a subset of SQL in an API? Why use a
         | unique superset that differs per webpage?
        
       | bokchoi wrote:
       | I never got on the semantic web train, but a translation layer
       | does allow you to make underlying schema changes.
       | 
       | I poked around the ANSIWAVE BBS and it looks fun!
        
       | 0xbadcafebee wrote:
       | Among the 30-odd technologies that make up the Semantic Web[1]
       | (it never died, it's just a collection of tech, lots of
       | organizations use it daily) are graph databases[2]. Graph
       | databases are necessary to implement semantic web databases.
       | 
       | SQLite is not a graph database. Even if you used SQLite to
       | _implement_ a graph database, it would not solve any significant
       | problems of the semantic web, such as access to data, taxonomies,
       | ontologies, lexicons, tagging, user interfaces to semantic data
       | management, etc.
       | 
       | It's a really odd suggestion that you would just copy around a
       | database or leave it on the internet for people to copy from. For
       | the BBS mentioned here, that might actually be _illegal_, as it
       | might contain PII, and on other sites possibly PHI. Many
       | countries now have laws that require user data to remain in-
       | country. Besides the challenges of just organizing data
       | semantically, there still needs to be work done on data security
       | controls to prevent leaking sensitive information.
       | 
       | The funny thing is, that isn't even hard to do with the semantic
       | web. You classify the data that needs protecting and build
       | functions and queries to match. You can tie that data to a unique
       | ID so that people can "own" their data wherever it goes, and sign
       | it with a user's digital certificate which can also expire.
       | 
       | But all of that (afaik) doesn't exist yet. Everyone is more
       | concerned with blockchains and SQL, either because the fancy new
       | tech is sexier, or the old boring tech doesn't require any work
       | to implement. The Semantic Web never caught on because it's
       | really fucking hard to get right. No companies are investing in
       | making it easier. Maybe in 20 years somebody will get bored
       | enough over a holiday to make a simple website creation tool that
       | implicitly creates semantic web sites that are easy to reason
       | about. It'll probably be a WordPress plugin.
       | 
       | [1] https://en.wikipedia.org/wiki/Semantic_Web
       | [2] https://graphdb.ontotext.com/documentation/enterprise/introd...
        
         | nescioquid wrote:
         | > The Semantic Web never caught on because it's really fucking
         | hard to get right. No companies are investing in making it
         | easier.
         | 
         | I really appreciate this point. I had the opportunity to work
         | on an exploratory project with an experienced ontologist (yes,
         | you really need one of those, I think). The tools were
         | fascinating (reasoners quickly became necessary) but I had the
         | feeling that many of these tools were at a comparatively early
         | stage of maturity.
         | 
         | Trying to explain to people how the system would work was a
         | challenge as it required a primer on theory and application --
         | we glazed many eyes. The CTO wanted to know if we could use
         | blockchain somehow. Another group addressed a slice of the
         | problem with technologies already in use and that decided the
         | matter.
        
         | NetOpWibby wrote:
         | Thanks for the links!
        
         | Karrot_Kream wrote:
         | The semantic web failed to become widely popular because:
         | 
         | 1. Graph databases on top of triple stores are a lot less
         | scalable than relational databases or key-value stores, and
         | this is how semantic data is meant to be stored/queried.
         | 
         | 2. Data is valuable. Handing out data for free in a machine-
         | consumable way is both expensive (machines can request data
         | much more quickly than a human) and a recipe for copycats. The
         | incentives just aren't there.
         | 
         | TBL's Solid project is about trying to separate semantic data
         | providers from the presentation layer and opening up the
         | possibility of payment from these data providers to try to
         | improve the incentives around semantic data sharing.
        
         | zozbot234 wrote:
         | > Graph databases are necessary to implement semantic web
         | databases.
         | 
         | This just isn't true, on multiple levels. RDF is an
         | interoperability standard that does not per se depend on a
         | 'graph-like' data model - you can very much expose plain old
         | relational data via RDF, and this is quite intended.
         | Additionally, modern general-purpose RDBMS's support graph-
         | focused data models quite well, despite being built on
         | 'relational' principles - there's no need for special tech when
         | working with general-purpose graph models, unless you're doing
         | some sort of heavy-duty network analytics.
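As a rough illustration of exposing plain relational data as RDF, the sketch below turns each row into a subject and each column into a predicate, emitting N-Triples lines. This is loosely in the spirit of a direct relational-to-RDF mapping, not an implementation of any particular standard; the `forum.example` namespace, the `posts` table, and the simplified `escape_literal` are all assumptions made for the example.

```python
import sqlite3
from typing import List

BASE = "https://forum.example/"  # hypothetical namespace for generated IRIs


def escape_literal(text: str) -> str:
    """Simplified N-Triples string escaping (backslash, quote, newlines only)."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(con: sqlite3.Connection, table: str, key: str) -> List[str]:
    """One triple per non-key, non-NULL cell: (row IRI, column IRI, literal)."""
    # The table name is interpolated directly; only pass trusted names.
    cur = con.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    lines = []
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"<{BASE}{table}/{record[key]}>"
        for col, value in record.items():
            if col == key or value is None:
                continue
            predicate = f"<{BASE}{table}#{col}>"
            lines.append(f'{subject} {predicate} "{escape_literal(str(value))}" .')
    return lines


def make_demo() -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT, body TEXT)")
    con.execute("INSERT INTO posts VALUES (1, 'alice', 'hi \"there\"')")
    return con


if __name__ == "__main__":
    for line in rows_to_ntriples(make_demo(), "posts", "id"):
        print(line)
```

Nothing here needs a graph database: the storage stays relational and only the exchange format is RDF.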
        
           | 0xbadcafebee wrote:
           | You're talking about extending a database design created 50
           | years ago to work with models and methods that involve
           | significantly different operations and concepts. Let the
           | RDBMS die so we can make something that is much more powerful
           | and requires less fidgeting and squinting to work the way we
           | want.
           | 
           | RDBMS were a niche research project for a decade before they
           | started to catch on in business apps. They've stayed around
           | forever because they're just functional enough to be
           | dangerous. But we've already hit the upper limits of both
           | reliability and performance years ago (remember NoSQL?) and
           | we just keep bolting on features because nobody wants to
           | leave them. The old designs and implementations are holding
           | us back.
        
           | smarx007 wrote:
           | RDF is a labeled multigraph data model with URI-based
           | predicates as edge labels, where each triple represents an
           | edge. You are right that relational data can be exposed in
           | RDF, just like CSV can be loaded into a graph DB.
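The "labeled multigraph" description can be shown in a few lines: each triple is one edge from subject to object, labeled by its predicate, and two nodes may be joined by more than one edge. The `ex:` identifiers and helper names below are placeholders, not real vocabulary terms.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def build_graph(triples: Iterable[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Adjacency list: subject -> list of (predicate label, object) edges."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def edge_labels(graph, subject: str, obj: str) -> List[str]:
    """All predicate labels on edges from subject to obj."""
    return [p for p, o in graph.get(subject, []) if o == obj]


# Placeholder data: alice both follows and replied to bob, giving two
# distinct edges between the same pair of nodes.
TRIPLES: List[Triple] = [
    ("ex:alice", "ex:follows", "ex:bob"),
    ("ex:alice", "ex:repliedTo", "ex:bob"),
    ("ex:bob", "ex:follows", "ex:carol"),
]

if __name__ == "__main__":
    g = build_graph(TRIPLES)
    print(edge_labels(g, "ex:alice", "ex:bob"))
```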
        
         | sekao wrote:
         | > Graph databases are necessary to implement semantic web
         | databases.
         | 
         | The online docs (and TBL himself) rarely mention graph
         | databases, but obviously the idea is tied tightly to RDF.
         | Separating it from that implementation detail is part of the
         | point, though. Getting people to represent their data via an
         | additional format was never going to work.
         | 
         | > For the BBS mentioned here, that might actually be illegal,
         | as it might contain PII
         | 
         | Can't imagine the purpose you had in even making this point. In
         | theory, any arbitrary database exposed publicly could be
         | illegal to replicate due to copyright, PII laws, etc. But that
         | has nothing at all to do with a technical discussion of a
         | technique for exposing data. What a bizarre point to make.
         | 
         | As an aside, I'm glad you removed the "Uh........." from the
         | beginning of your post. We're all making an effort to reduce
         | the typical HN snark in the comments, and there's always room
         | for improvement :D
        
       | xmly wrote:
       | I do not understand this conclusion: "Data on the web will only
       | be "semantic" if that is the default, and with this technique it
       | will be."
       | 
       | Why would it be semantic?
        
         | pure_simplicity wrote:
         | They're saying: if it's not strongly incentivized, then it
         | won't happen.
         | 
         | They don't specify in the conclusion what that incentive
         | structure looks like. Saying that it has to be the default is
         | very general.
        
       | firechickenbird wrote:
       | Isn't this Web 1.0 instead? You are only reading data, yeah ok
       | with SQL, but you still can't modify it. And also there are
       | already very good standards like RDF, OWL 2, and SPARQL, which
       | are more expressive than SQL for consuming the info.
        
       | jsight wrote:
       | Is there really that much data in SQLite that can safely be
       | exposed on the web for this to make sense? I'm not really seeing
       | how this is obviously better than the metadata ideas that
       | preceded it.
        
         | rossdavidh wrote:
         | Some: weather, ratings, topography, dictionaries and
         | encyclopedias, sports scores, market prices, some other stuff.
         | All public knowledge, but not necessarily publicly available
         | (easily) in raw form.
        
           | lostmsu wrote:
           | But doesn't it have to be immutable for the proposal to work?
        
             | vorpalhex wrote:
             | No, as long as the data model is stable you can add new
             | rows.
             | 
             | You might want some kind of versioning for messy column
             | changes, particularly removals.
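One lightweight way to do that kind of versioning, sketched under the assumption that publisher and consumers agree on an integer schema version: SQLite stores a free-form integer in the file header, readable and writable via `PRAGMA user_version`. The `scores` table, the version numbers, and the function names are hypothetical.

```python
import sqlite3

SCHEMA_VERSION = 2  # hypothetical: bump only when columns are renamed or removed


def publish(con: sqlite3.Connection) -> None:
    """Publisher side: create the table and stamp the schema version."""
    con.execute("CREATE TABLE scores (team TEXT, points INTEGER)")
    con.execute(f"PRAGMA user_version = {SCHEMA_VERSION}")


def read_scores(con: sqlite3.Connection, supported=(2,)):
    """Consumer side: refuse to query a schema version it was not written for."""
    version = con.execute("PRAGMA user_version").fetchone()[0]
    if version not in supported:
        raise RuntimeError(f"unsupported schema version {version}")
    return con.execute("SELECT team, points FROM scores ORDER BY team").fetchall()


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")
    publish(con)
    con.execute("INSERT INTO scores VALUES ('ajax', 3)")
    print(read_scores(con))
```

Adding rows leaves the version untouched, so existing consumers keep working; only a breaking column change forces them to notice.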
        
       | fleddr wrote:
       | The semantic web is not a technical problem, it's an incentive
       | problem.
       | 
       | RSS can be considered a primitive separation of data and UI, yet
       | was killed everywhere. When you hand over your data to the world,
       | you lose all control of it. Monetization becomes impossible and
       | you leave the door wide open for any competitor to destroy you.
       | 
       | That pretty much limits the idea to the "common goods" like
       | Wikipedia and perhaps the academic world.
       | 
       | Even something as silly as a semantic recipe for cooking is
       | controversial. Somebody built a recipe scraping app and got a
       | massive backlash from food bloggers. Their ad-infested 7000 word
       | lectures intermixed with a recipe is their business model.
       | 
       | Unfortunately, we have very little common good data, that is free
       | from personal or commercial interests. You can think of a million
       | formats and databases but it won't take off without the right
       | incentives.
        
         | onion2k wrote:
         | _Even something as silly as a semantic recipe for cooking is
         | controversial. Somebody built a recipe scraping app and got a
         | massive backlash from food bloggers. Their ad-infested 7000
         | word lectures intermixed with a recipe is their business
         | model._
         | 
         | Taking someone else's content and republishing it without
         | permission isn't cool, even if you wrap it in a nice machine
         | readable format.
        
         | Micoloth wrote:
         | I'm more and more seeing that this is true. Still, it is sad.
         | 
         | The question isn't even, what can one do, because obviously
         | nobody can change how incentives work in a given society.
         | 
         | The question is: is there a timeline in which the right
         | incentives (to share data) start being enforced? How would that
         | play out?
        
       ___________________________________________________________________
       (page generated 2022-01-11 23:00 UTC)