[HN Gopher] Show HN: Search Engine for Blogs
       ___________________________________________________________________
        
       Show HN: Search Engine for Blogs
        
       Hey HN,  Blog discovery is a problem [0] due to the decentralized
       nature of online writing. Everyone writes on their own site or
       platform, and there's no central place that brings everything
       together. Google results prioritize large media publications over
       blogs, so we need something else.  Blog Surf is an attempt to
       organize all of the great online writing done by individuals. I
       launched this project last year as a directory of personal blogs
       [1], but have now rebuilt it from scratch into a full-text search
       engine for blog posts.  You can search for blog posts, and filter
       by publish date and reading time. Blogs are manually reviewed
       before being added.  Posts are sorted by MarketRank [2], which is a
       measure of popularity across various online communities. Most
       projects that have attempted to organize blogs lack any way to
       measure the quality of a post, reducing their utility. With
       MarketRank, you can expect the top results for any query to be
       something you'd want to read.  The mental model for searching Blog
       Surf is "I want to see the best essays on X"  There's also a
       directory so you can browse blogs by category, if you want a
       throwback to the Yahoo days.  If you're a blogger yourself, you can
       check out the rankings page to see how your blog compares to
       others.  If you want to play around with things, we have a search
       API, and the full post dataset is also available for download.  [0]
       https://news.ycombinator.com/item?id=28591880  [1]
       https://news.ycombinator.com/item?id=26506126  [2]
       https://dkb.io/post/market-rank
        
       Author : dbrereton
       Score  : 165 points
       Date   : 2022-03-29 15:50 UTC (7 hours ago)
        
 (HTM) web link (blogsurf.io)
 (TXT) w3m dump (blogsurf.io)
        
       | diogenesjunior wrote:
       | this is very cool, thank you
        
       | pete_nic wrote:
       | This is great. As a frequent Google user, I breathed a sigh of
       | relief seeing ad free search results from individuals.
       | 
       | Curious - how do you know whether a site is a blog versus
       | something else?
        
         | stanislavb wrote:
          | He's shared in the blog post - manual review of all blogs
        
       | drBonkers wrote:
       | Many search-engine posts recently. When will someone make the
       | Search Engine for Search Engines?
        
         | dbrereton wrote:
         | It's called a metasearch engine [0]. There was a project
         | launched 2 years ago called Runnaroo [1] that kind of did this,
         | but they aren't online anymore.
         | 
         | I think most of us in this space are willing to collaborate so
         | something interesting could happen.
         | 
         | [0] https://en.wikipedia.org/wiki/Metasearch_engine
         | 
         | [1] https://news.ycombinator.com/item?id=23771131
        
         | marginalia_nu wrote:
         | I do think some form of collaboration between small search
         | engines would be very beneficial. I've been thinking about how
         | to make that happen. So far I've added a public API to my
         | search engine, and published some data.
         | 
         | Not sure what is a good way of creating a space for
         | collaboration...
        
       | ChrisArchitect wrote:
       | What year is it?
       | 
       | Don't hate blogs and happy for resurgence, but repeating an
       | uphill battle with indexing like it's 2007.
       | 
       | Also, random interesting posts on front page are like from 2009,
       | 2011, 2015...... What? That's the freshest more relevant content?
        
         | dbrereton wrote:
         | Right now the "random interesting posts" are a random selection
         | from the top 1000 of all time.
         | 
         | However, if you want fresher content, you can use the date
         | range selector and set it to "Past Week" for the best posts of
         | the week.
        
         | kingryan wrote:
         | yeah, reminds me of working at technorati in 2005
        
         | Minor49er wrote:
         | > Also, random interesting posts on front page are like from
         | 2009, 2011, 2015...... What? That's the freshest more relevant
         | content?
         | 
         | Why would you expect a section titled "Random Interesting
         | Posts" to have the freshest more relevant(?) content?
        
         | marginalia_nu wrote:
         | Dunno, I typically find fresh and interesting mutually
         | exclusive.
        
       | superasn wrote:
       | This is definitely very cool as I've been looking for something
       | like this since technorati (which was originally a blog search
       | engine).
       | 
       | Would love to hear details about how you created the database,
       | the infrastructure, etc if it's not a trade secret. Kudos on the
       | launch!
        
         | dbrereton wrote:
         | > This is definitely very cool as I've been looking for
         | something like this since technorati (which was originally a
         | blog search engine).
         | 
         | Technorati was one of the inspirations here so that's great to
         | hear.
         | 
         | > Would love to hear details about how you created the
         | database, the infrastructure, etc if it's not a trade secret.
         | Kudos on the launch!
         | 
         | Sure, it's actually fairly simple! The search backend itself is
         | running on Typesense [0], which was very quick and easy to
         | setup.
         | 
         | Due to the way ranking is calculated, I can actually avoid
         | doing any real web crawling (though, I may add that in soon to
         | help increase the index size). Ranking is based on submission
         | to online communities, so all I really need is those
         | submissions.
         | 
         | Using the Reddit, HN and Twitter APIs, I search for any
         | submissions related to any blogs in the database, then those
         | submissions give me the post URLs.
         | 
         | Once I have the post URLs, I just need to request those
         | specific URLs to get the post data.
         | 
         | Then there's scripts for things like content extraction,
         | inflation calculation, currency conversion etc.
         | 
         | All of those scripts are in python.
         | 
         | The frontend is a simple React app built with Next. All pages
         | are statically generated.
         | 
         | Let me know if there's any more questions!
         | 
         | [0] https://typesense.org/
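
The step described above (community submissions in, post URLs out) might look roughly like the sketch below. It assumes the submissions have already been fetched from the community APIs as plain dicts. The field names (`url`, `score`), the function name, and the choice to sum raw scores are all hypothetical placeholders, not Blog Surf's code or the MarketRank formula.

```python
from collections import defaultdict
from urllib.parse import urlparse


def post_urls_from_submissions(submissions, blog_domains):
    """Group community submissions by post URL, keeping only known blogs.

    `submissions` is an iterable of dicts with hypothetical keys
    "url" and "score"; `blog_domains` is the set of reviewed blog hosts.
    Returns {post_url: total_score}.
    """
    totals = defaultdict(int)
    for sub in submissions:
        host = urlparse(sub["url"]).netloc.lower()
        # Treat "www.example.com" and "example.com" as the same blog.
        if host.startswith("www."):
            host = host[4:]
        if host in blog_domains:
            # Summing raw scores is a placeholder, not MarketRank.
            totals[sub["url"]] += sub["score"]
    return dict(totals)
```

The keys of the returned dict are the post URLs that would then be requested individually for content extraction.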
        
           | qchris wrote:
           | Any plans on open-sourcing the code? I'm not sure if your
           | intention is to build a business using it (or, if you were,
           | using AGPLv3 might help prevent third-parties from unfairly
           | competing with you), but I'm sure a number of people would be
           | interested in trying to run this on their own hardware,
           | building their own personal index, hacking on it to add
           | features they find interesting for themselves, or otherwise
           | just learning something by taking a look under the hood (I'm
           | probably in this category myself).
        
             | dbrereton wrote:
             | This is not a business, and I would like to open source it,
             | but it would probably be better for everyone if I wait
             | until I clean up my garbage code, which will take some
             | time.
        
       | pomokhtari wrote:
       | Google has been giving me a very hard time for a while. It's time
       | for SEO to die. We need stuff like this.
        
       | NAR8789 wrote:
       | I dig this! Since you're full-text indexing blogs with an eye
       | towards content discovery, can I tempt you towards building out a
       | reverse link index function to enable me to browse by thread?
       | 
       | My use case: often times blogs will respond to other blogs,
       | linking to the original post in the process. The nature of
       | linking means it's very easy to follow threads backwards in time,
       | but given the original post it's often hard to discover the
       | responses and ongoing conversation to follow things downstream.
       | I'd like that downstream browsing to be easier.
       | 
       | My hope would be that such a tool could unlock higher-quality
       | discourse. As a reader, this would let me hijack my natural
       | tendency to follow comment threads, and redirect that attention
       | towards slower-paced, more nuanced, more focused writing.
       | 
       | Edit: hmmm... though looking further, maybe this goes against
       | your MarketRank philosophy.
        
         | dbrereton wrote:
         | I totally agree with you on this. Being able to follow the
         | links easily in either direction would make it easier to fall
         | into interesting rabbit holes.
         | 
         | > Edit: hmmm... though looking further, maybe this goes against
         | your MarketRank philosophy.
         | 
         | It doesn't at all, but curious as to why you'd think that.
         | We're not talking about using backlinks to rank pages after
         | all, just as a discovery tool, which I think is great.
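
A reverse link index of the kind discussed here can be sketched as a simple inversion of each post's outbound links. This assumes the outbound links have already been extracted per post; the function name and input shape are hypothetical.

```python
from collections import defaultdict


def build_reverse_link_index(outbound_links):
    """Invert {post_url: [linked_url, ...]} into {linked_url: [post_url, ...]}."""
    backlinks = defaultdict(set)
    for source, targets in outbound_links.items():
        for target in targets:
            if target != source:  # ignore self-links
                backlinks[target].add(source)
    # Sort the sources so the output is stable and readable.
    return {target: sorted(sources) for target, sources in backlinks.items()}
```

Looking up a post URL in the result gives the posts that responded to it, which is the "downstream" direction that ordinary links do not expose.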
        
       | mr90210 wrote:
       | Great idea mate. Keep up the good work.
        
       | valdect wrote:
       | That's an awesome implementation and works really neat. I have
       | been thinking to add this capability to https://refined.blog/ .
       | also if you need tagged blog sites you can use our bloglist. also
       | i previously posted in hn so there are some good blogs in here (
       | https://news.ycombinator.com/item?id=27973836 )
        
         | dbrereton wrote:
         | Thanks! And that's awesome, I could definitely use more blogs,
         | will check it out.
        
       | recuter wrote:
       | Would you mind sharing some stats on the index? Are you
       | populating it with manual curation?
        
         | dgivney wrote:
         | There are currently 900 blogs in the index. Every blog is
         | manually reviewed, so this number will grow slowly and steadily
         | over time, until I maybe automate things.
         | 
         | https://blogsurf.io/about
        
       | u2077 wrote:
       | Cool Idea, I love search engines with content made by real
       | people. I'm not sure how many of these you have, but you might be
       | able to pull some more blogs from
       | https://bloggingfordevs.com/trends/ or https://blogdb.org/blogs
        
         | dbrereton wrote:
         | Some good sources, will check them out!
        
       | shortformblog wrote:
       | Dmitri showed this to me a couple of weeks ago, and I was super-
       | impressed, enough so that even though he sent me a note about it
       | at the end of the night, I stayed up to respond to him. This
       | makes me feel like the spirit of Technorati has a chance of
       | making a comeback someday.
        
       | Minor49er wrote:
       | The tags are pretty limited and have some duplicate entries.
       | Submissions also require a tag for them to be posted, so if they
       | don't fit what's already there, a wrong one will have to be
       | supplied.
        
         | dbrereton wrote:
         | Yeah tagging is definitely a weak point as the current focus is
         | on search. I just removed the tag requirement.
         | 
         | If you're submitting a blog and we don't already have a tag
         | that fits, you can add it to the "Notes" section.
        
       | applgo443 wrote:
       | Interesting.
       | 
        | How do you figure out which are blogs and which aren't?
        
       | codazoda wrote:
       | I love the idea, but I couldn't find any results that didn't look
       | mostly random. I thought I'd look for posts about shooting or
       | editing video. Couldn't find anything close, no matter how I
       | ordered my query.
        
         | dbrereton wrote:
         | Ah yeah, there's probably not any blogs that talk about video
         | editing. The index is fairly small right now and does better
         | for tech/business/politics queries at the moment. Will work on
         | increasing the index.
        
       | stringlytyped wrote:
        | You might want to consider using OpenSearch [1] to make it easier to
       | add Blog Surf to browsers as a search engine that can be accessed
       | from the location bar. I added it manually in Firefox but it
       | would have been handy to just be able to right-click the search
       | field and choose "Add a Keyword for this Search".
       | 
       | [1] https://developer.mozilla.org/en-US/docs/Web/OpenSearch
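
For illustration, a minimal OpenSearch description document could be generated as in the sketch below. The search URL template is a hypothetical placeholder on example.com, since the thread does not give Blog Surf's query URL; only the element names and the `{searchTerms}` placeholder come from the OpenSearch 1.1 format.

```python
import xml.etree.ElementTree as ET

OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"


def build_opensearch_description(short_name, description, search_url_template):
    """Return an OpenSearch description document as an XML string."""
    root = ET.Element("OpenSearchDescription", {"xmlns": OPENSEARCH_NS})
    ET.SubElement(root, "ShortName").text = short_name
    ET.SubElement(root, "Description").text = description
    # The browser substitutes the user's query for {searchTerms}.
    ET.SubElement(
        root, "Url", {"type": "text/html", "template": search_url_template}
    )
    return ET.tostring(root, encoding="unicode")


xml_doc = build_opensearch_description(
    "Blog Surf",
    "Search blog posts",
    "https://example.com/search?q={searchTerms}",  # hypothetical template
)
```

The resulting file would be served by the site and advertised from its pages with a `<link rel="search" type="application/opensearchdescription+xml">` element, which is what lets browsers offer to add the engine.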
        
       | rmason wrote:
       | Potentially quite useful. But I ran into one snag. I searched on
       | my friend and frequent blogger Ben Nadel. But at the top all the
       | posts were about Angular.
       | 
       | What I wanted were all his posts that weren't about Angular. So I
       | tried adding -angular which works in Google. It pulled up one
       | non-angular post and all the rest were the original ones that are
       | there when you load the page. Add that one feature and I will
       | probably use it a lot.
        
         | dbrereton wrote:
         | Currently don't support any query operators, but yeah it would
         | be very useful. Will add that functionality soon.
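
The `-term` exclusion operator requested above could be sketched as follows. This is a toy filter over in-memory posts using whitespace tokenization, written only to show the idea; the function names and the `text` field are hypothetical, and it says nothing about how the site's search backend would handle operators.

```python
def parse_query(query):
    """Split a query into required terms and "-term" exclusions."""
    include, exclude = [], []
    for token in query.lower().split():
        if token.startswith("-") and len(token) > 1:
            exclude.append(token[1:])
        else:
            include.append(token)
    return include, exclude


def filter_posts(posts, query):
    """Keep posts whose text has every included term and no excluded term."""
    include, exclude = parse_query(query)
    results = []
    for post in posts:
        words = set(post["text"].lower().split())
        has_all = all(term in words for term in include)
        has_excluded = any(term in words for term in exclude)
        if has_all and not has_excluded:
            results.append(post)
    return results
```

With this, a query such as `nadel -angular` keeps posts mentioning "nadel" and drops any that also mention "angular".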
        
       | rambambram wrote:
       | Really like it, good job! Nice color scheme, font choice, and
       | elegant layout.
       | 
       | One little thing though: changing a search phrase or word and
       | doing a new search, I notice the results do change, but there's
       | no way to know if it really happened. Changing a Google search,
       | the whole page flashes empty, that way I see/sense there's
       | something new. In your case, a change is subtle, very subtle, too
       | subtle. In one instance I had to look carefully to see the change
       | in results.
       | 
       | Maybe adding a "you searched for X" is good enough, but I guess
       | you can come up with a better way.
        
       | akselmo wrote:
       | I love the flaming comic sans Directory header, lol
        
       | SecurityLagoon wrote:
       | I love this. I am always on the lookout for material written by
       | individuals; but, it's surprisingly hard on the modern web.
       | 
       | Tbh I'll probably use the random bit more than search but
       | definitely going to keep checking back to pad my RSS feeds with
       | interesting content.
        
         | dbrereton wrote:
         | > Tbh I'll probably use the random bit more than search
         | 
         | That's interesting to hear, and fits well with the goals of the
         | site. I want it to be more of a "discovery engine" than a
         | "search engine". Search is one path to discovery, random posts
         | are another, there are probably more.
         | 
         | One thing I'm thinking of adding is the ability to easily see
         | the blog posts that any given post links to. If you see an
         | interesting post, you could pull up everything that may be
         | related.
         | 
         | > definitely going to keep checking back to pad my RSS feeds
         | with interesting content.
         | 
         | Sadly not every blog has RSS, and many RSS feeds are
         | incomplete. Another thing I would like to build is auto-
         | generated RSS feeds for all blogs, which would also make it
         | easy for people to programmatically parse any blog and do
         | interesting things.
        
       | COil wrote:
       | Excellent idea, I was thinking of creating something similar. My
       | new homepage, for sure.
        
       | derekzhouzhen wrote:
       | I agree with everything you said, except the popularity ranking.
       | The value of a content shall be in the content itself; popularity
       | is only a flawed measurement. Worse, popularity has very strong
       | positive feedback that contributes to the great polarization of
       | opinions.
        
         | dbrereton wrote:
         | > The value of a content shall be in the content itself;
         | popularity is only a flawed measurement.
         | 
         | Popularity is certainly a flawed measurement, but it's hard to
         | come up with a scalable way to determine quality that isn't
         | flawed in some way.
         | 
         | Instead of being flawed by encouraging people to get tons of
         | backlinks, this is flawed by encouraging people to do stuff
         | that gets lots of upvotes.
         | 
         | Very open to more ideas on how to measure quality.
        
           | derekzhouzhen wrote:
           | "a scalable way to determine quality" is the billion dollar
           | question so I have no idea. I'd just use plain old text index
           | with no algorithmic ranking.
        
       | ropeladder wrote:
       | This is a great idea! It's very tech centric, though (at least
       | judging from your directory).
        
       | azhenley wrote:
       | This is awesome! I see some blog posts of mine already on here
       | but using an outdated URL.
       | 
       | Are there any plans to check for redirects and update the URL or
       | to recrawl?
        
       | polote wrote:
       | This is cool and the quality of content is great too especially
       | to get the most known blogs of a topic (and I feel like the
       | quality of content is better than all the blogs search engine I
       | have seen).
       | 
       | But I don't feel like manual curation by one person is easily
       | compatible with search engine. To me the content of your website
       | is more suited to a weekly newsletter or something like that.
       | Because after trying a few search "getting a job in vc", "best
       | computer chair", "learning erlang" I'm not confident this answer
       | better results than Google.
       | 
       | You've got a content size problem as you are manually curating,
       | and this will lead to people not use your search as a default,
       | and probably not use it as a search engine, but instead as a
       | discovery system.
       | 
       | You can also try to get more blogs on your search engine, and
        | create a community around it, if you want more, you can
       | follow this newsletter [1] and you will get probably 5 new blogs
       | per day.
       | 
       | Congratz on the job, this is very cool
       | 
       | [1] https://hnblogs.substack.com
        
         | dbrereton wrote:
         | > But I don't feel like manual curation by one person is easily
         | compatible with search engine.
         | 
         | I see the manual curation as more of a temporary measure in the
         | beginning. There are various ways blog detection can be
         | automated and scaled, but manual curation for now gives me a
         | better understanding of the data, and ensures I don't run into
         | random edge cases.
         | 
         | > Because after trying a few search "getting a job in vc",
         | "best computer chair", "learning erlang" I'm not confident this
         | answer better results than Google.
         | 
         | Right now it's more useful for very broad queries like
         | "inflation" or "covid". The index is pretty small at the
         | moment, but the more posts that get added to the index, the
         | more specific queries we'll be able to find good results for.
         | 
         | > You've got a content size problem as you are manually
         | curating, and this will lead to people not use your search as a
         | default, and probably not use it as a search engine, but
         | instead as a discovery system.
         | 
         | That's actually what I want! This is not a search engine to
         | replace Google, it's a discovery tool for blog posts.
         | 
         | Thanks for all the feedback here, and will definitely check out
         | the newsletter.
        
       | taubek wrote:
       | This is great. What is the policy on accepting blogs to be
       | indexed?
        
       ___________________________________________________________________
       (page generated 2022-03-29 23:00 UTC)