[HN Gopher] EU Open Web Search project kicked off ___________________________________________________________________ EU Open Web Search project kicked off Author : ZacnyLos Score : 126 points Date : 2022-09-20 17:50 UTC (5 hours ago) (HTM) web link (openwebsearch.eu) (TXT) w3m dump (openwebsearch.eu) | Animats wrote: | Early version: https://www.chatnoir.eu/ | andrewmcwatters wrote: | I suspect search engines are an outdated concept for at least the | largest of sites, who will generally, but not always, have better | ways to directly search their own content. | | The remainder of the search problem seems to just be collecting | relevant trafficked sites for listing in results. Today Google et | al seem to be doing this BY HAND. And it's not even obfuscated. | | Recently, for the first time in my life, the wizard behind the | curtain seems to have been exposed. I feel strongly that one | could probably start a small index that catered to a fairly large | audience. | | And honestly, for other queries, just tell the user to search | that site directly. I think you could even market it to users as | not a technical limitation, but behavior that should be | considered fuddy-duddy. | | Like, really, you're going to search me? You know they have their | own search right? | | Even Yellow Pages faded into obscurity eventually. | ur-whale wrote: | > the largest of sites, who will generally, but not always, | have better ways to directly search their own content. | | I have the exact opposite experience. | | To wit: searching HN via the algolia link at the bottom is way | worse than searching on Google with a site:ycombinator.com | restrict. | | Same thing for YouTube, where the search engine is tuned for | maximizing watch time and strictly not to return what you're | looking for. 
| notright wrote: | the EU loves taxing productive companies and wasting said money | in stillborn projects that nevertheless promise a kind of bright | socialist federalist Europe in their bureaucratic minds | Comevius wrote: | At least we have some of the most livable countries on Earth to | show for it. I take taxes over any trickle-down economics, and | don't let me stop you looking up the definition of socialist, | because you are using it wrong. | | Besides, it's an 8.5 million EUR project, it's literally nothing, | it's payroll for a few people. The money is being invested into | people who then spend most of it, so it's a triple investment. | arjenpdevries wrote: | Isn't it lovely?! | notright wrote: | I am fine as long as they pay for these self-centered utopias | with their own money | hrbf wrote: | I've already caught their crawler ignoring robots.txt directives | on one of my sites, aggressively indexing explicitly excluded | information. | arjenpdevries wrote: | That cannot be true, as the project has yet to start. But | anyone can start a crawler, so you may have encountered other | people's software. We wouldn't be so unknowledgeable as to ignore | robots.txt ;-) | lizardactivist wrote: | Out of curiosity, what's the URL for your website, and from | what IP or host do their crawlers connect? | logicalmonster wrote: | What does "based on European values and jurisdiction" refer to? | I'd love to be pleasantly surprised, but this sounds like it's | ripe for centralized censorship. | notright wrote: | InTheArena wrote: | Given the history of the 20th century, this kind of comment | promoting European values and jurisdiction seems..... dicey. | Companies' ethical records, as shitty as they are, have nothing | on the mass destruction, genocide and stupidity of governments. | ur-whale wrote: | Looks like it's Northern EU only. | | No research institutes from {France, Italy, Spain, Greece, | Portugal, etc ...} involved.
| arjenpdevries wrote: | Slovenia, Czech Republic. But yes, I think there was a | competing proposal from Italy/Spain. Not enough budget for two | projects in this area, unfortunately, as they were good too. | marginalia_nu wrote: | I'm a bit skeptical EU-funding a bunch of professors is the way a | search engine will be built. | | The primary goal for academics is to publish new findings, while | what you need to build a search engine is rock solid CS and | information retrieval basics. Academically, it's not very | exciting. Most of it was hashed out in the 1980s or earlier. | hkt wrote: | ..correct me if I'm wrong, but Google was started by a couple | of postdoctoral researchers, no? | DannyBee wrote: | Who deliberately did not stay in academia to do it. More to | the point, a successful team building a product like a search | engine requires roles that academia doesn't really have. | | Who is doing product management? | | Who is doing product marketing? | | etc | | This is all applied engineering at this point, not R&D. How | does it at all fit into academia's strong suit? | mkl95 wrote: | > 14 European research and computing centers | | > 7 countries. | | > 25+ people. | | There are literally dozens of them! | | https://openwebsearch.eu/partners/ | marginalia_nu wrote: | I don't think the number of people or even the size of the | budget is wrong. A small team can be incredibly powerful and | productive if you have the right people. In fact, I think far | more often search engines fail from trying to start too big | than too small. | | The problem is that you need people who actually know how to | architect complex software systems much more than you need | revolutionary new algorithms. For that, professors are the | wrong people. A professor on the team, sure, that might be | helpful. Not half a Manhattan project's worth. | mkl95 wrote: | It happens all the time in Europe. 
Collaboration between | public and private companies is pretty much a pipe dream in | the EU. Some company that actually works on building search | technology would achieve way more than a bunch of | professors. | | I disagree on the budget though. It is basically pocket | change. | marginalia_nu wrote: | Arguably the biggest unsolved problem in search is | how to make a profit (or even break even). This can be | approached in two ways: You can either try to find some | way of making search more profitable, or you can find a | way to make search cheaper. I think the latter is a lot | more plausible than the former. | | A shoestring budget keeps the costs down by design and by | necessity. A large budget virtually ensures the search | engine becomes so expensive to operate it will never | break even. | [deleted] | jjulius wrote: | >I'm a bit skeptical EU-funding a bunch of professors is the | way a search engine will be built. | | Heh, so, funny story... | | >A second grant--the DARPA-NSF grant most closely associated | with Google's origin--was part of a coordinated effort to build | a massive digital library using the internet as its backbone. | Both grants funded research by two graduate students who were | making rapid advances in web-page ranking, as well as tracking | (and making sense of) user queries: future Google cofounders | Sergey Brin and Larry Page. | | >The research by Brin and Page under these grants became the | heart of Google: people using search functions to find | precisely what they wanted inside a very large data set. | | https://qz.com/1145669/googles-true-origin-partly-lies-in-ci... | imhoguy wrote: | "unbiased...based on European values" - will it fly? | topspin wrote: | European values are inherently unbiased. What's the problem? | o.O | tricky777 wrote: | Seems like a very interesting idea. So many times I wanted some | kind of advanced google-query-language. (I know about allinurl and | such, but that's not enough.
Google is tuned for the average user, | which is good for Google, but not for any non-average query) | dataking wrote: | I don't see any mention of Quaero, the EU search engine that was | supposed to compete with Google [0, 1]. How is this time | different? | | [0] https://en.wikipedia.org/wiki/Quaero | | [1] https://www.dw.com/en/germany-pulls-away-from-quaero- | search-... | arjenpdevries wrote: | For starters: the objective is to create the index, not the | engine; that's quite a different ambition. | | We are very aware of the Quaero/Theseus history :-) | marginalia_nu wrote: | What is the difference? | freediver wrote: | Supposedly the project is about just building the | platform/infrastructure (which is what the index is) upon | which search engines can be built. | | These search engines will then have the freedom to define | their own search product experience, business model, even | ranking of results. | jonas21 wrote: | So something even more vaguely defined and detached from | real use cases than last time? Great. | freediver wrote: | The above actually defines the scope very well. There is | a lot more to be built upon it, but it is not what the | project is trying to solve. | notright wrote: | notright wrote: | This was the past legislature project. The new legislature | brings CHANGE. They are not the same.. | thepangolino wrote: | dang wrote: | Url changed from https://www.zylstra.org/blog/2022/09/eu-open- | web-search-proj..., which points to | https://djoerdhiemstra.com/2022/open-web-search-project-kick..., | which points to this. | lucideer wrote: | Which now shows: | | > _Resource Limit Is Reached_ | | > _The website is temporarily unable to service your request as | it exceeded resource limit. Please try again later._ | | Original URL might be more resilient... | dang wrote: | Hmm. I can access the page without that message. In any case | the Internet Archive seems to have it: | | https://web.archive.org/web/20220920183027/https://openwebse. | ..
| Proven wrote: | rrwo wrote: | It will be interesting to see what the index contains, and how it | is structured. | | What made Google such a game changer was that they based their | index not just on the contents, but on how pages linked to each | other. | arjenpdevries wrote: | That's the marketing story. I think it's because they didn't | clutter their homepage like AltaVista did. | boyter wrote: | I have written this before but I'll put it here again. What I | would like to see is a federated search engine, based on | ActivityPub, that works like Mastodon. Don't like the results from | one source? Just remove them from your sources, or lower their | ranking. Similar to YaCy, but you can work with the protocol to | connect or build whatever type of index you want using whatever | technology you like, and communicate over an existing standard. | Want to build the world's best index of Pokemon sites? Then go do | it. Want to build a search engine using Idris or ATS? Sure! I did | note the professors are on Mastodon so perhaps this may actually | happen. | | One of these days I'll actually implement the above assuming | nobody else does. I figured if I can at least get the basics done | and a reference implementation that's easy to run it could prove | the concept. If anyone is interested in this, do email me (address | in my bio). | | What I worry about for this project is that it becomes another | island which prohibits remixing of results, like Google and Bing, | and its own index and ranking algorithms become gamed. | | I wish the creators the best of luck though. I am also hoping for | some more blogs and papers about the internals of the engine. So | little information is published in the space that anything is | welcome, especially if it's deeply technical. | fabrice_d wrote: | At least one of the partners | (https://openwebsearch.eu/partners/radboud-university/) does | research on "federated search systems", so there's hope!
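The federated model boyter describes above — independent indexes you can query, merge, demote, or drop entirely — can be sketched in a few lines. This is a hypothetical illustration with in-process stub "nodes" standing in for remote ActivityPub-style peers; the node names, documents, and weights are all invented:

```python
# Minimal sketch of federated meta-search: each "node" returns scored
# results for a query, and the client merges them, applying a per-node
# weight so users can demote (or remove) sources they distrust.

def pokemon_node(query):
    # A hypothetical niche index, as in the "best index of Pokemon
    # sites" example above.
    docs = {"https://pokedex.example": "pokemon stats and movesets"}
    return [(url, 1.0) for url, text in docs.items() if query in text]

def general_node(query):
    # A hypothetical general-purpose index with spammier content.
    docs = {
        "https://spam.example": "pokemon pokemon pokemon buy now",
        "https://wiki.example": "pokemon franchise history",
    }
    return [(url, 0.5) for url, text in docs.items() if query in text]

def federated_search(query, nodes):
    # nodes: list of (search_fn, weight); weight 0 removes a source.
    merged = []
    for fn, weight in nodes:
        if weight <= 0:
            continue
        merged.extend((url, score * weight) for url, score in fn(query))
    return [url for url, _ in sorted(merged, key=lambda r: -r[1])]

# Demote the general node after seeing too much spam from it:
results = federated_search("pokemon", [(pokemon_node, 1.0), (general_node, 0.3)])
print(results[0])  # -> https://pokedex.example
```

The point of the sketch is that ranking policy lives with the user, not the index: changing a weight reorders results without touching any node.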
| asim wrote: | One of the things I wonder here is if it would be easier to | just start by crawling known RSS feeds and then exposing a JSON | API for the data and making the whole thing open source. Then | keeping a public list of indexes and who crawls what. | Eventually moving into crawling other sources, but first | primarily addressing the majority of useful content that's | easily parseable. | TacticalCoder wrote: | > Don't like the results from one source? Just remove them from | your sources, or lower their ranking. | | That's basically Usenet killfiles and, yes, I think they're | totally due for a comeback in one form or another. Usenet may | have had its issues towards the end (although it still exists), | but killfiles weren't one of its problems. The simplest ones just | let you discard sources you didn't want to read anymore, while | the more advanced could assign weights/rankings based on | various factors (keywords / usernames / whether or not you | participated in a discussion / etc.). | arjenpdevries wrote: | We like federated search, we like decentralized search, and | even P2P search; we are trying to find a good mix, and decided | to get started rather than wait! Exciting times. | marginalia_nu wrote: | What are the benefits from this? | | I'm not trying to be dismissive; it's just my feeling from | working on search.marginalia.nu that nearly every aspect | of search benefits from locality. Not only is the full crawl-set | instrumental in determining both domain rankings and | term-level relevance signals such as anchor tag | keywords; the way an inverted index is typically set up | is also extremely disk-cache friendly, where the access pattern | for checking the first document warms up the cache for the other | queries, and that discount obviously only exists when it's | the same cache.
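marginalia_nu's locality point can be made concrete with a toy inverted index (a hypothetical sketch, not marginalia.nu's actual layout): each term's posting list is a sorted, contiguous array, so answering a multi-term query is a sequential intersection over adjacent memory, which is exactly the access pattern that keeps a disk cache warm:

```python
# Toy inverted index: each term maps to a sorted, contiguous list of
# document IDs. Multi-term queries intersect the postings; the
# sequential scan over contiguous lists is what makes this structure
# disk-cache friendly, as described in the comment above.
from bisect import insort

class InvertedIndex:
    def __init__(self):
        self.postings = {}  # term -> sorted list of doc IDs

    def add(self, doc_id, text):
        for term in set(text.lower().split()):
            insort(self.postings.setdefault(term, []), doc_id)

    def search(self, query):
        # Intersect posting lists, shortest first to minimise work.
        lists = sorted((self.postings.get(t, []) for t in query.lower().split()),
                       key=len)
        if not lists:
            return []
        result = lists[0]
        for plist in lists[1:]:
            s = set(plist)
            result = [d for d in result if d in s]
        return result

idx = InvertedIndex()
idx.add(1, "open web search index")
idx.add(2, "federated web search")
idx.add(3, "open data")
print(idx.search("open search"))  # -> [1]
```

In a federated setting each node holds its own postings, so the cache warmed by one sub-query cannot help another node, which is the "same cache" discount the comment refers to.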
| hkt wrote: | I would _love_ to be able to run a node that mirrors part or | all of an index like this, and to let people query it - a bit | like https://torrents-csv.ml/#/ | | Good luck! I'll be watching your progress and cheering you | all on! | cookiengineer wrote: | Isn't searx what you're describing? I was running an instance | for a while, and it's basically a meta search engine that has | support for all kinds of providers. | | There are also some web extensions available so that you can | fill it with more data. | | [1] https://searx.github.io/searx/ | boyter wrote: | Searx is half of it: it calls out to other searches but | does not provide its own index, as far as I can see. It also | does not remix the results. | vindarel wrote: | I'd say it rather looks like Seeks, unfortunately defunct: | https://en.wikipedia.org/wiki/Seeks | | > a decentralized p2p websearch and collaborative tool. | | > It relies on a distributed collaborative filter[6] to let | users personalize and share their preferred results on a | search. | googlryas wrote: | What benefit does federation bring here? Unless it is very | simple to set up, most communities are non-technical and | probably won't be able to set up their own crawler. I would | think just a search engine that lets you customize the ranking | algorithm, and maybe hook into whatever ontology they've | developed and rank it accordingly, would be sufficient. | melony wrote: | What's the point of a federated search engine? At the end of | the day most nodes will end up implementing the same | regulations/censorship with development driven primarily by a | few. It's like Ethereum vs Ethereum Classic all over again. If | the EU or the developers' respective governments demand a | censorship or forgetting feature to be implemented, it's not | like the federated nature would matter. An open source search | index is useful, a search engine that can be easily self hosted | is also useful.
But building a search engine as a federated | system is a gimmick with no significant value. | | Do you see any major Mastodon nodes interfacing with Truth | Social or Gab? I certainly don't. If federation barely works | for a social media app, I fail to see how it would even matter | for a search engine. | ur-whale wrote: | Search is way more than just indexing. | | I'd really like to see them match the 20+ years of search quality | fine-tuning that Google built into their search engine. | | Not that Google is as good as it used to be, but still, catching up | with them is way more complicated than just building a big crawl | + index piece of infrastructure. | | And all of that on a government-funded shoestring budget. | | Mmmh. | | Good luck to them, but I'm not holding my breath. | bslqn wrote: | merb wrote: | so it began, that sern starts to gather market share. | | -- | | I doubt this will take off. I mean they invested more in funding | and marketing instead of starting to build something. They | should've started with code (AGPLv3, of course) and invited more | and more people. At the moment this is more buzzword-bingo | bullshit than anything else. It's basically always the same | problem: instead of focusing on the product, they focus more on | the message. | s-xyz wrote: | Correct me if I am wrong, but so the purpose is to create an | index database, upon which custom search engines can be | attached? I.e., the EU will crawl all pages on the web? | murphyslab wrote: | The index is just the first step according to news articles: | | > Once the index has been created, the next step is to develop | search applications. | | > The team at TU Graz will be particularly active here in the | CoDiS Lab and will work on the conception and user-centric | aspects of the search applications. This includes, for example, | research into new search paradigms that enable searchers to | have a say in how the search takes place.
The idea is that | there are different search algorithms or that you can influence | the behavior of the search algorithms. For example, you could | search specifically for scientific documents or for documents | with arguments, include search terms that have already been | used, or include documents from the intranet in the search. | | https://www.krone.at/2791083 | rgrieselhuber wrote: | The real game-changer in search would be if companies would agree | to publish indexes of their own sites in an open standard to a | place that everyone could access. This would undercut the | monopoly power that large search engines have and allow everyone | to focus on innovating the best way to search that content vs. | having to spend so much time and money to crawl and index it. | _Algernon_ wrote: | People would abuse that for SEO purposes within seconds. | rgrieselhuber wrote: | The market need would then be shifted to the best search | interfaces instead of who has the most money to build the | biggest index. A much better focus, IMO. | TheFerridge wrote: | I believe that is precisely what the project is aiming to do, | and to turn it into a public resource. | arjenpdevries wrote: | We will explore that idea in the project; I also think it may | help (but it is vulnerable to Web index spam by adversarial | parties). | rgrieselhuber wrote: | That is indeed the biggest problem, but maybe something that | can be more effectively dealt with downstream by the content | rankers and potentially even the user base / custom search | algorithm builders. Brave's Goggles project is a good early | prototype of this concept. | freediver wrote: | A standard for this already exists [1] but it does not solve the | problems of | | 1. Implementation (sites do not need to have a sitemap; or | those that have it, may not have an accurate one) | | 2. 
Discoverability (finding sites in the first place; you'll | need a centralised directory of all sites, or resort back to | crawling, in which case sitemaps are not needed) | | 3. Ranking (biggest problem in creating a search engine) | | [1] https://www.sitemaps.org/protocol.html | rgrieselhuber wrote: | The sitemaps standard (if this is the basis) would need to be | expanded to support additional metadata / structured data to | support this idea. | | 1. This would be up to sites, to your point; the major question | would be the best way to create incentives. | | 2. This is solvable via a number of approaches, but the | search engines themselves would be mostly responsible for | finding the right approach for their business. I know how I | would do it. | | 3. Indeed, which would be the main point of this | decentralization, to let search engines focus on their | hardest problem. | | Edit: would Kagi not benefit from not having to worry about | crawling / indexing sites? | [deleted] | freediver wrote: | > would Kagi not benefit from not having to worry about | crawling / indexing sites? | | It would, but sitemaps do not provide that function, as we | discussed above. However, if EU Open Web Search succeeded, | that is something we could probably use to some extent. | wizofaus wrote: | I suspect you underestimate how much of the power of search | engines is being able to interpret search queries and figure | out what a user is really looking for. Even if there were a | public, standardised, up-to-date, high-performance full-text | index of the entire web freely available, I'm willing to bet | Google search would be a useful value-add in its ability to | answer natural language queries. | rgrieselhuber wrote: | I run an SEO platform SaaS, so I'm familiar. :) | spookthesunset wrote: | I'm pretty sure we tried that way back in the day with <meta | name="keywords" content="spam spam spam spam">. People would | stuff that with every word in the English language.
Older | search engines that used those keywords returned some pretty | awful results. You simply can't trust sites, who have a strong | incentive to get to the top of SEO rankings, to not lie. In | fact, given at least one of your competitors will stuff their | keywords to get to the top you'll have to do it too. It would | become an arms race for who can stuff the most garbage into | their indexes to "win". It just doesn't work. | | All search engines that attempt to be useful will have to | filter out the junk. You just have to trust that the search | engine you are using isn't withholding results from you that it | considers "bad" (eg: "misinformation" (i.e. stuff somebody | disagrees with)). | | And to me, that is the crux of the debate really. Nobody wants | spam for search results--everybody agrees with that and there | is no real debate about filtering that crap out. The argument | really is should a very large company that has a huge market | share get to decide what constitutes "fact" and what is | "misinformation". Based on 2.5 years of experience so far, what | was once deemed "misinformation" has a sneaky way of becoming | "factual information". Labeling and hiding "misinformation" | because it goes against some narrative pushed by incredibly | powerful entities is very scary and there was a hell of a lot | of exactly that going on during this covid crap. | | I used to fall on the side of "private companies can do | whatever they want" but now I'm not so sure. Companies like FB, | Twitter or Google play a huge role in shaping politics and | society. I'm no longer convinced it is okay to let them play | the role of "fact checker" or anything like that. Filtering | spam is one thing, but hiding "misinformation" is entirely | different. | rgrieselhuber wrote: | Your last point is also the one (aside from the economics) I | am the most interested in. 
| | I think we live in a world now where we are so used to a few | tech giants mediating everything for us that we can't even | imagine other solutions to this problem, but it's also how we | got to this point in the first place. | closedloop129 wrote: | >You simply can't trust sites, who have a strong incentive to | get to the top of SEO rankings | | Why is it not enough to punish sites that abuse the keywords? | spookthesunset wrote: | Who is the one who punishes the abusers? How can you scale | the solution to deal with billions of pages? | bobajeff wrote: | One problem with that is now you have to trust the websites to | give an accurate index of their content. | jeffbee wrote: | Anyone who thinks this will work has never tried to index a | site. A huge amount of effort is spent trying to figure out | if the site is serving different content to users vs | crawlers, or if the site is coded to appear visually | different to humans vs machines. If you ask sites to index | themselves you will get lies only. | rgrieselhuber wrote: | I index sites all the time and I think it could work. There | will be other problems, of course, but we already are | partly there with XML sitemaps. Relying on the large search | engines to enforce "honesty" from websites puts them into a | mediator role that has a number of negative effects both | for search in general and, increasingly, society at large. | kittiepryde wrote: | Relying on sites to be honest about themselves is even | less likely. There are monetary incentives for many of | them not to do that. Many sites host dishonest and | clickbait content with extreme levels of SEO already. The | cost of dishonesty decreases if you can directly modify | the index. | rgrieselhuber wrote: | I think that is primarily a symptom of the fact that we | have a bottleneck on search interface providers. 
If it | were easier / cheaper for new search engines / rankers to | exist in the market, they could fairly easily filter out | unscrupulous domains. | wumpus wrote: | I've run a web-scale search engine and I don't think it | will work. | rgrieselhuber wrote: | Indeed | boyter wrote: | I'd rather see them publish a federated search of their own | content. | rgrieselhuber wrote: | Your comment prompted me to check out Searchcode; it looks very | interesting. How would the federated search model work in | this example? Instead of you having to index the various code | repositories, they would index themselves and make their | search of those indexes available via a federated API? | rrwo wrote: | There are already sitemaps, and pages use structured data like | HTML5/ARIA roles, RDF or JSON+LD to provide some semantic | annotations. | | I'd rather that web robots use this information to build useful | indexes than have to worry about generating yet another feed | in the hopes that it helps people find my content in a search | engine. | | Besides, a web robot can determine how much other sites link to | my content and help determine its overall ranking in results. | Adding another type of index file to my site will do nothing to | determine how it relates to other sites. | rgrieselhuber wrote: | The structured data on sites, unfortunately, still requires a | crawler to index that content, which serves as a barrier for | search engine startups. At a minimum, adding some metadata | content to XML sitemaps would go a long way to solving some | of this problem (title, meta description, content summary, | even structured data in the sitemaps). | Eduard wrote: | What's the problem of using any of the many free webcrawler | (libraries) available to crawl a website (even if solely | based on the pages advertised by sitemap.xml / robots.txt- | announced sitemaps), then extract structured data from | these pages? | | I don't see this as a barrier unique to startups. 
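The crawl entry point Eduard describes — discover a site's sitemap via robots.txt, then read the advertised page URLs out of the sitemap XML — can be sketched with the Python standard library alone. The robots.txt and sitemap bodies below are made-up fixtures, not a real site:

```python
# Sketch of sitemap-driven crawling: parse robots.txt for crawl rules
# and the advertised sitemap, then extract the page URLs from it.
import urllib.robotparser
import xml.etree.ElementTree as ET

# Made-up fixtures standing in for https://example.org's actual files.
robots_txt = """\
User-agent: *
Disallow: /private/
Sitemap: https://example.org/sitemap.xml
"""

sitemap_xml = """\
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/</loc></url>
  <url><loc>https://example.org/about</loc></url>
</urlset>
"""

# Honour crawl rules and discover the advertised sitemap.
rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())
print(rp.site_maps())  # -> ['https://example.org/sitemap.xml']
print(rp.can_fetch("*", "https://example.org/private/x"))  # -> False

# Pull page URLs out of the sitemap (namespace-aware; bytes input,
# since the XML carries an encoding declaration).
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
root = ET.fromstring(sitemap_xml.encode("utf-8"))
urls = [loc.text for loc in root.findall("sm:url/sm:loc", ns)]
print(urls)  # -> ['https://example.org/', 'https://example.org/about']
```

As rgrieselhuber notes above, the hard part is not this per-site mechanics but the cost of doing it at web scale.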
| rgrieselhuber wrote: | It's easy to do for small sets of sites, but try doing | this at web-scale and you quickly run into a large | financial barrier. It's not about technical feasibility | as much as it is cost. | DOsinga wrote: | With a budget of 8.5M EUR/USD. Alphabet spends 200B per year. If | 40% of that is spent on search, their budget is 10 thousand times | larger. | lucideer wrote: | It's definitely a comparative underdog regardless, but if you | think Alphabet spends anywhere near 40% on search you're out of | your mind. I'd be shocked if their spend is double-digits. I'd | be unsurprised if it's <1%. | o_m wrote: | I doubt 40% is spent on search. Seeing how bad Google has | gotten, it seems more likely there is just a skeleton crew | keeping the lights on | mkl95 wrote: | I would be shocked if Alphabet spent >5% on search. But even 1% | would dwarf this project. | antics9 wrote: | We need to develop a social aspect to search where results are | also moderated and curated by humans in some way. | topspin wrote: | And when that curation produces results you find abhorrent? | What then? Because I guarantee it would; a metaphysical | certitude. | Extropy_ wrote: | On first glance, I see the word "unbiased" immediately followed | by "based on European values". Now, I'm no expert, but to me, | that seems pretty biased. | radiojasper wrote: | biased on European values | nathan_phoenix wrote: | This is just a short reply to a blog which mentions that the | project started... | | The actual website of the project (with some concrete info) can | be found here: https://openwebsearch.eu/ | dang wrote: | Changed now. Thanks! | [deleted] | jacooper wrote: | > A new EU project OpenWebSearch.eu ... [in which] ... the key | idea is to separate index construction from the search engines | themselves, where the most expensive step to create index shards | can be carried out on large clusters while the search engine | itself can be operated locally. 
...[including] an Open-Web-Search | Engine Hub, [where anyone can] share their specifications of | search engines and pre-computed, regularly updated search | indices. ... that would enable a new future of human-centric | search without privacy concerns. | | So.. Who's going to create the index? Indexing the web is | expensive, and it's offset by the ads the indexer runs on their | search website, as with Google, Bing, Brave and others. | amelius wrote: | I wonder how privacy will be ensured when your query hits the | map-reduce infrastructure running on these clusters. | | Regarding privacy, the bar is significantly higher than what | Google has to deal with. This will come at some cost in quality | and/or speed. | caust1c wrote: | Every individual website has an incentive to create indices of | their own content, and hosting providers could provide it as a | service. Not hard to envision. Search engines could download | these indices periodically to build the meta-search. | wizofaus wrote: | Also not hard to envision websites being incentivised to lie | in their indexes. | moffkalast wrote: | Someone who's snagging an EU grant, that's who. | ur-whale wrote: | > Someone who's snagging an EU grant, that's who. | | Bullseye. | beardedman wrote: | Oh cool, but do you mean the "EU Open Web Search Data Collection | Program"? ___________________________________________________________________ (page generated 2022-09-20 23:00 UTC)