[HN Gopher] Improving GitHub Code Search
       ___________________________________________________________________
        
       Improving GitHub Code Search
        
       Author : todsacerdoti
       Score  : 324 points
       Date   : 2021-12-08 17:01 UTC (5 hours ago)
        
 (HTM) web link (github.blog)
 (TXT) w3m dump (github.blog)
        
       | [deleted]
        
       | zxienin wrote:
       | I have 10 different git + github instances across my org. (~50k
       | strong workforce, pre github repos, m&a etc). Does this cs offer
       | aggregated searches across all those distributed repos?
        
         | dstaheli wrote:
         | Hi zxienin. I'm a GitHub product manager. May I assume the
         | GitHub instances you're describing are GitHub Enterprise Server
         | instances? We plan to bring advanced code search features to
         | all GitHub plans including Enterprise Server once we've
         | stabilized the UX and feature set. But it sounds like your
         | situation goes beyond that, where the search needs to include
         | code from Git repositories outside of GitHub Enterprise Server.
         | That makes good sense, and we'll definitely consider it. If you
         | want to keep in touch about it, please feel free to post in our
         | feedback forum: https://github.com/github/feedback/discussions/
         | categories/co.... Thank you!
        
           | zxienin wrote:
           | I shall, thnx.
           | 
           | ps: yes, enterprise server instances
        
       | jpgvm wrote:
       | Given the shoutouts to Burntsushi and Lemire this is almost
       | certainly a bitmap trigram index based engine similar to
       | https://github.com/google/zoekt
       | 
       | The index is likely based on Roaring bitmaps, presumably
       | https://github.com/RoaringBitmap/roaring-rs in this case.
       | 
       | Nice architecture, exactly how I would have done it also.
        
         | rurban wrote:
         | Nope, I would have used an existing search solution, like
         | xapian. It does so much more, and much faster.
         | 
         | You need to support a proper query syntax, with tags, rankings,
         | stopwords, stemming. Then you need to have a proper db backend
         | (reverse indices). Trigrams don't help for regex. Then a
         | templated representation. Google codesearch would do only the
         | 2nd of 3. ElasticSearch is commercial, and only java.
         | 
         | Doing that from scratch is a bit silly.
        
           | tanoku wrote:
           | Oh OK, you have clearly spent more time thinking about this
           | problem than the team of engineers at GitHub who've been
           | researching code search at scale for more than four years. I
           | bet they feel real silly right now knowing they could have
           | shipped this search engine in a couple weeks taping together
           | off-the-shelf libraries if they only had your talent for
           | software architecture.
        
           | preseinger wrote:
           | Code search typically does not need many (most?) full-text
           | search features like TF-IDF, stopwords, stemming, tagging,
           | etc. It's a categorically different domain.
        
           | yencabulator wrote:
           | > Trigrams don't help for regex.
           | 
           | https://swtch.com/~rsc/regexp/regexp4.html
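
A minimal sketch of the idea behind the linked article: use a trigram index to narrow the set of files, then run the real regular expression only on the survivors. This is an illustration, not GitHub's or zoekt's implementation. The names `build_index`, `candidates`, `search` and the `DOCS` sample are hypothetical, and the literal that every match must contain is passed in by hand rather than derived from the regex.

```python
import re
from collections import defaultdict

# Hypothetical sample corpus: file name -> file contents.
DOCS = {
    "a.c": "char *p = strstr(hay, needle);",
    "b.c": "size_t n = strlen(s);",
    "c.py": "print('strstr is a C function')",
}

def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}

def build_index(docs):
    """Map each trigram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in trigrams(text):
            index[gram].add(doc_id)
    return index

def candidates(index, docs, required_literal):
    """Documents that contain every trigram of the required literal."""
    grams = trigrams(required_literal)
    if not grams:
        # Literal shorter than 3 characters: the index cannot narrow anything.
        return set(docs)
    return set.intersection(*(index.get(g, set()) for g in grams))

def search(index, docs, pattern, required_literal):
    """Run the regex only on documents that pass the trigram filter."""
    regex = re.compile(pattern)
    return sorted(
        doc_id
        for doc_id in candidates(index, docs, required_literal)
        if regex.search(docs[doc_id])
    )

INDEX = build_index(DOCS)
```

With this sample, `search(INDEX, DOCS, r"strstr\(\w+", "strstr(")` only has to run the regex on `a.c`, because the other two files lack the trigram `tr(`. The filter can let false positives through (a file may contain all the trigrams without containing the literal), which is why the regex is still run at the end.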
        
         | v1g1l4nt3 wrote:
         | Related: https://srcgr.ph/zoekt-memory-optimizations
        
       | kevinsundar wrote:
       | Security researchers are gonna love this :)
       | 
       | Time to go secrets and url hunting.
        
         | jayflux wrote:
         | You could already do this, grep.app for example has existed for
         | a while. This is just bringing those features in-house.
        
       | latenightcoding wrote:
       | I always thought Github's bad search functionality was a business
       | decision. It was so bad for so long. Even if basic improvements
       | are significantly harder at their scale, I just can't comprehend
       | how Microsoft left something so potentially useful be so bad for
       | so long.
        
         | [deleted]
        
         | v1g1l4nt3 wrote:
         | Yeah, I'd bet on https://about.sourcegraph.com. Fully focused
         | on code search and are still light years ahead.
        
       | ElectronShak wrote:
       | Reminds me of https://grep.app, Search across a half million git
       | repos [1]
       | 
       | 1. https://news.ycombinator.com/item?id=22396824
        
         | tuananh wrote:
         | I still don't understand why it can be so far, haha
        
       | einpoklum wrote:
       | They haven't implemented wildcard search for... well, ever:
       | 
       | https://github.com/isaacs/github/issues/402
       | 
       | I don't even care it's very fast. Just make it work. I just hope
       | this isn't snake oil. Weird that they claim regex support but no
       | wildcard support.
        
       | heipei wrote:
       | Curious if this is something completely bespoke or simply a beefy
       | ElasticSearch cluster which uses the (relatively) new "wildcard"
       | field for enabling regex search on select fields. The search
       | syntax certainly maps 1:1 to the ElasticSearch Query String
       | syntax, including phrase search, boolean operations, grouping,
       | regex search, etc.
        
         | 100k wrote:
         | (I worked on this and the prior version of code search that
         | uses Elasticsearch.)
         | 
         | It is a custom search engine, built from the ground up for
         | code. We'll be sharing more details about it on the GitHub blog
         | soon.
        
       | anarazel wrote:
       | Some logic to exclude duplicate results would be useful. I often
       | search to see how many external users there are of some API in
       | postgres. But there's hundreds of separate repos with similar
       | contents showing up in the search results...
        
       | adamnemecek wrote:
       | I use github search a lot and this would be an insane
       | productivity boost. I signed up for the waitlist. Does anyone
       | working at Github want to bump me in the queue? This is my
       | profile https://github.com/adamnemecek/
        
         | adamnemecek wrote:
         | I just got access to it. I'm not sure if someone here helped
         | but if yes, then thank you very much.
        
         | v1g1l4nt3 wrote:
         | You can skip the wait and use https://sourcegraph.com/search
         | instead.
        
       | jshier wrote:
       | Got into the preview, can finally search for actual code! One
       | thing I'd like to see, though, is the ability to mark directories
       | to be ignored in the search results. No one needs to search the
       | raw HTML of my generated documentation, yet it shows up in every
       | search for project symbols. And since HTML is considered
       | "source", I can't filter it out unless I select a particular
       | language.
       | 
       | Also the search text field is a bit messed up in Safari when the
       | text gets longer than the field.
        
         | colin353 wrote:
         | GitHub Code Search developer here - try creating a custom scope
         | to filter out that stuff! Click on the scopes dropdown and
         | scroll to the bottom. You can filter out HTML by using a query
         | like:
         | 
         | NOT language:html
        
           | jshier wrote:
           | Ah, I was trying language:!html.
           | 
           | Would still be great to ignore my docs directory.
        
           | esprehn wrote:
           | It would be great if this used the same filter format as
           | sourcegraph and other internal code search tools. ex.
           | -file:.html is enough to filter away files ending in html in
           | the main search box.
           | 
           | Having to use dropdowns and multiple input fields is more
           | cumbersome than the filter language of repo:, file:, lang:
           | etc.
        
       | adamnemecek wrote:
       | I hope they add deduplication. I can't count the number of times
       | when I get 100 pages of results where 95 pages are from the same
       | included library.
        
         | 100k wrote:
         | (I worked on this.)
         | 
         | This is on our radar! We de-duplicate exact matches now, but
         | we'd like to do the same for near-similar documents.
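
A minimal sketch of the two kinds of de-duplication mentioned above, under stated assumptions: exact duplicates are detected by hashing file contents, and "near-similar" is approximated with Jaccard similarity over character shingles. Neither is confirmed to be what GitHub does; `dedupe_exact`, `jaccard`, and the shingle size `k` are hypothetical names and choices made for illustration.

```python
import hashlib

def content_key(text):
    """Hash of the file contents; byte-identical files share a key."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def dedupe_exact(results):
    """Keep the first path for each distinct file content, preserving order.

    `results` is a list of (path, text) pairs in ranked order.
    """
    seen = set()
    unique = []
    for path, text in results:
        key = content_key(text)
        if key not in seen:
            seen.add(key)
            unique.append(path)
    return unique

def shingles(text, k):
    """Set of k-character substrings; a short text yields itself as one shingle."""
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}

def jaccard(a, b, k=5):
    """Similarity in [0, 1]: shared shingles divided by all shingles."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb)
```

Exact de-duplication collapses vendored copies of the same file regardless of which repository they live in. Deciding which copy counts as the "original" is a separate ranking question that this sketch does not address; it simply keeps whichever result was ranked first.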
        
           | elliottcarlson wrote:
           | De-duping exact matches is a game changer -- search has been
           | miserable to use because of the dupes for so long. I can live
           | with near-similar documents. Very excited to test this out.
        
             | colin353 wrote:
             | Another GitHub Code Search developer here - to add more to
             | this, we rank all the search results, and try to bring the
             | most relevant results to the top. Ideally, if you have 10
             | pages of results, you shouldn't have to leave page 1 to
             | find what you're looking for :D
        
           | sumtechguy wrote:
           | That would be a tough problem. As de-dup you probably want to
           | show/point towards the 'original' tree. But which one is the
           | source? Or even worse someone abandons a project but someone
           | else forked it and kept going should it show that one
           | instead? Or should it show the one it was forked from
           | depending on the version number. Which one is the 'true' repo
           | now? Most certainly an interesting problem.
        
       | francislavoie wrote:
       | It really looks like they took a lot of inspiration from
       | https://sourcegraph.com/search with this. Not a bad thing at all.
       | I hope SourceGraph doesn't get obsoleted by this though, they're
       | great people.
        
         | junon wrote:
         | I remember seeing this years ago and thought it was a bit
         | subpar but it appears they've made strides since then. I might
         | start using this again.
        
         | lancemurdock wrote:
         | had a pretty awful interview experience there a while back.
         | Can't say I experienced great people
        
           | anandchowdhary wrote:
           | I interviewed for Sourcegraph and it was one of the best.
           | Super transparent process, open source handbook, fun coding
           | tasks -- really nothing to complain about. Would be curious
           | to know what made you have such a different experience.
        
           | sqs wrote:
           | Sourcegraph CEO here. I'm really sorry about that. We work
           | really hard on making our interviews good for everyone,
           | including documenting it publicly at
           | https://handbook.sourcegraph.com/talent/interview_process.
           | Could you please email me at sqs@sourcegraph.com so I could
           | find out what happened?
        
           | mholt wrote:
           | I'm surprised... I absolutely _loved_ my interview with
           | Sourcegraph. I kind of wish every tech company interviewed
           | like they do.
        
           | gavinray wrote:
           | I've met two of their devs randomly in different Discord
           | servers. Both were great people (Noah, Olaf) and are very
           | active in OSS communities. Perhaps not coincidentally, both
           | worked on Language Server related stuff.
           | 
           | Olafur is responsible for a lot of Scala tooling and some
           | pretty neat original ideas.
           | 
           | Sourcegraph also came up with LSIF, which is a useful format
           | for building tooling for language servers:
           | 
           | https://lsif.dev
           | 
           | If you want to build this sort of stuff, the work Sourcegraph
           | has done with LSIF + SemanticDB is probably your easiest bet.
           | 
           | N=2 isn't great, but there's my experiences if we're tossing
           | them out there.
        
         | sqs wrote:
         | Sourcegraph CEO here. Imitation is the sincerest form of
         | flattery. We are very transparent, have a ton of users, and are
         | open-core, so it's easy to get inspiration from us. :) We want
         | way more devs to be using code search since it's so valuable
         | 10x+/day, and if this helps, then we are very happy for that.
         | Devs get to choose the code search tool they use, so the best
         | tool will win (you wouldn't use Bing if your boss made
         | you...likewise, code search isn't like team chat or team docs).
        
       | trinovantes wrote:
       | Still waiting for the ability to search in other branches. It's a
       | pain when some codebases have stable releases on the next/dev
       | branch but keep their main branch to the previous release.
        
         | namrog84 wrote:
         | Absolutely. I get they don't want to index every branch but at
         | least set some heuristics like if it has a certain amount of
         | activity or something per repo. Or even allow repo to opt into
         | 1 to 2 other branches besides main. Especially for bigger
         | projects
         | 
         | That'd cover 95% of repos I've seen.
        
       | jkelleyrtp wrote:
       | Seems to me that Rust's killer app is burntsushi's mind and
       | ripgrep. :-)
        
       | samueldr wrote:
       | Only thing missing is indexing of branches and forks.
       | 
       | My main use case for GitHub search is identifying provenance of
       | misc. changes in vendor source code tarballs for e.g. Android
       | kernel releases. It's hard, but sometimes possible to rehydrate
       | most of the existing commits through cherry-picks and careful
       | rebases.
       | 
       | The biggest problem with the lack of indexing branches and forks
       | is that sometimes vendors make releases through branches, or
       | that sometimes repos of interest are forks of e.g.
       | `torvalds/linux`.
       | 
       | Hopefully we can see those being indexed in the future.
       | 
       | I'm also curious: has the plan to drop "less active" repos from
       | the index gone through? Has anything changed?
        
         | alufers wrote:
         | > I'm also curious: has the plan to drop "less active" repos
         | from the index gone through? Has anything changed?
         | 
         | Whaaat? I hope it doesn't go through. I use GitHub code search
         | for clues when reverse engineering cheap Chinese IoT crap.
         | Usually I can find some headers / SDKs accidentally uploaded
         | and set to public by a random Chinese guy. Those repos usually
         | have one commit and zero traffic, but they contain invaluable
         | information about proprietary MCUs.
        
         | ihnorton wrote:
         | I would personally like to see less indexing of duplicate
         | files! There are many things I've searched for which return
         | 100s of results from independent checkin-uploads of big
         | libraries like the Android SDK. It would be great if results
         | were filtered by file similarity regardless of git history (if
         | that is in fact the issue).
        
       | beached_whale wrote:
       | Got an opportunity to try it a few minutes ago and it's awesome
       | so far. I was able to look for my code in repos I don't own, e.g.
       | `not org:user foo::bar`
        
       | beltsazar wrote:
       | Does anyone know (or guess) what kind of index they use to
       | provide regex searches? I'm really curious.
        
         | 100k wrote:
         | We'll be sharing more details soon on the GitHub blog.
        
       | Falell wrote:
       | > Search for an exact string, with support for substring matches
       | and special characters, or use regular expressions (enclosed in /
       | separators).
       | 
       | Finally!
       | 
       | Search-for-literal is so important when you have technical users
       | working on non-prose text.
       | 
       | They say this is going in a dedicated search page 'to start
       | with', if "<literally any text>" doesn't work in the top bar
       | eventually this is still going to be miserable.
        
         | colin353 wrote:
         | I'm from the team that developed this at GitHub - if you are in
         | the technology preview, then you can jump into cs.github.com
         | from searches done at the top bar.
        
           | gavinray wrote:
           | Thank you
           | 
           | I use Github's UI for exploring and searching codebases more
           | often than my own environment, since I do a lot of curious
           | browsing.
           | 
           | No offense, but the search is so bad for anything worse than
           | a single word, that I've developed a sort of intuition for
           | how to phrase things -- and then still spend a lot of time
           | crawling pages of results haha.
           | 
           | This was sorely needed
        
             | colin353 wrote:
             | Couldn't agree more - that's why we built it! Please give
             | the new search a shot, I think you'll like it :D
        
           | mholt wrote:
           | What's your take on developing a new code search instead of
           | partnering with an existing global code graph like
           | Sourcegraph? What are the advantages of GitHub Code Search
           | over Sourcegraph?
        
             | edwinyzh wrote:
             | Well, in the past I've tried Sourcegraph several times, but
             | it never gave me an experience that matched the long-defunct
             | Google Code Search. I hope the new GitHub code search does
             | that.
        
             | zxienin wrote:
             | +1 pretty much what was on my mind, seeing this. does this
             | compete or complement sourcegraph?
        
       | bsagdiyev wrote:
       | Now can they fix doing a language search for "Visual Basic"? If
       | you filter a user's repos or stars on that language it just shows
       | all their repos or stars. Code search for language "Visual Basic"
       | returns all repositories and does not limit by language like it
       | should.
        
       | remram wrote:
       | Meanwhile on GitLab, you can't even search in issue comments
       | (only the title/description from the author).
        
         | john_cogs wrote:
         | GitLab team member here.
         | 
         | Comment (and code) search is available for projects in all
         | GitLab tiers:
         | https://docs.gitlab.com/ee/user/search/#basic-search
         | 
         | Premium and Ultimate users have access to Advanced Search:
         | https://docs.gitlab.com/ee/user/search/advanced_search.html
        
           | remram wrote:
           | There is a way to search for comments using the "global
           | search", but no way to search for text over issues and their
           | comments. In particular, no way to search from the issue tab,
           | no way to search over comments only in issues (or only in
           | merge requests), no way to combine a text search with
           | label/milestone/status filters, etc.
           | 
           | So it's a workaround, but a bad one.
           | 
           | Here's the ticket (2015):
           | https://gitlab.com/gitlab-org/gitlab/-/issues/13891. The fact
           | that it has so many duplicates in your own project's issue
           | tracker is a good indicator of how bad your issue search is.
        
             | boleary-gl wrote:
             | GitLab team member here.
             | 
             | > no way to combine a text search with
             | label/milestone/status filters, etc.
             | 
             | You can combine text search with field search (like
             | label/milestone/status). Here's an example:
             | https://gitlab.com/gitlab-org/gitlab/-/issues?search=Visuali...
        
       | cosentiyes wrote:
       | The addition of exact match search is so exciting that I haven't
       | internalized any of the other new features. I've abandoned an
       | ungodly number of semi-common-word searches after getting 30
       | pages of results in a monorepo
        
         | philsnow wrote:
         | I didn't even see this in the feature list before doing the
         | signup. One of the signup questions is "how do you usually
         | search?" or so, I wrote in the blank "I want to search for
         | symbols, not substrings, so if I'm searching for `bar` I don't
         | want `foo_bar` to show up as a match". I usually do this with
         | word boundaries in regexes, but I pretty much have to have the
         | repo downloaded, so it's useless for searching on github.com
         | this way.
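
A minimal sketch of the word-boundary technique described above, as it would look when run locally over lines of a downloaded repo. It uses Python's standard `re` module; the helper name `find_symbol` is hypothetical. Because `_` counts as a word character, `\bbar\b` does not match inside `foo_bar`.

```python
import re

def find_symbol(symbol, lines):
    """Return the lines where `symbol` appears as a whole word.

    re.escape keeps characters in the symbol from being read as regex
    syntax; \\b requires a word/non-word transition on each side.
    """
    pattern = re.compile(r"\b" + re.escape(symbol) + r"\b")
    return [line for line in lines if pattern.search(line)]
```

For example, given the lines `foo_bar = 1`, `bar = 2`, `call(bar)` and `barista()`, searching for `bar` keeps only the second and third. This assumes the symbol starts and ends with a word character; for something like `$x`, `\b` would not behave as a symbol boundary.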
        
       | jrochkind1 wrote:
       | I love how the Microsoft acquisition continues to result in
       | _increased_ investment in github with Microsoft's resources, and
       | real vision; not always how an acquisition goes.
        
         | adamnemecek wrote:
         | Microsoft has always been a dev tool company.
        
           | einpoklum wrote:
           | You wouldn't know it looking at MS Visual Studio though.
        
           | NmAmDa wrote:
           | I doubt that before WSL this would be something. I mean
           | developing on Windows was always a lot more difficult than Linux
           | or MacOS.
        
             | adamnemecek wrote:
             | It depends on what you were developing.
        
             | johannes1234321 wrote:
             | Win32 API isn't nice, but Microsoft was always relatively
             | good with documentation etc. and don't forget all the
             | developer support within Excel, VBA, Visual Basic etc. Bill
             | Gates early on understood the premise of building a
             | platform and not breaking it. Even if that meant win32 API
             | became ugly over time. Old windows programs still work on
             | newer releases.
        
         | swyx wrote:
         | and a departure of all the key execs
        
           | jrochkind1 wrote:
           | what about it? That's not even a sentence.
        
       | mintplant wrote:
       | If anyone from GitHub is listening, being able to exclude test
       | code with a few clicks would be an absolute game-changer. By far
       | the biggest source of noise in my GH code search results, and I
       | use the tool (and similar tools like Searchfox) super super
       | heavily. Either way, stoked to try this out.
        
         | halayli wrote:
         | Exactly this. It will also reduce unnecessary requests on their
         | servers.
        
         | Koffiepoeder wrote:
         | And inversely, searching specifically for test code can also be
         | useful. For example if searching for an implementation example.
        
         | 100k wrote:
         | Thanks for the feedback! We downrank test files with a
         | heuristic, though we'll definitely be looking to make this more
         | sophisticated. You can also exclude results using a regular
         | expression, like `foo NOT path:/_test\\.go$/`.
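
A minimal sketch of what a path-based exclusion like the one above does, applied to a plain list of paths in Python. This only illustrates the filtering idea and says nothing about how GitHub evaluates `path:` qualifiers; `exclude_tests` and `TEST_PATH` are hypothetical names, and the pattern is written with a single backslash, which is how Python's `re` spells a literal dot.

```python
import re

# Matches paths that end in "_test.go" (Go's test file convention).
TEST_PATH = re.compile(r"_test\.go$")

def exclude_tests(paths):
    """Drop every path that looks like a Go test file."""
    return [p for p in paths if not TEST_PATH.search(p)]
```

The `$` anchor matters: without it, a path such as `docs/_test.gopher` would also be dropped, since it contains `_test.go` in the middle.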
        
           | dcreager wrote:
           | And also note that if you often need to add this kind of
           | qualifier to many searches, you can create a "custom scope"
           | that includes it for you transparently.
        
       | leaded_syrinx wrote:
       | This is great, specified search on GitHub has previously been
       | very hit or miss. Generally I use the search feature for learning
       | / trying to see if something I'm trying to do already exists. I
       | personally think vsCode has the best code search implementation,
       | in terms of "exact", "partial" and "regex" matching. The UI is
       | clear, non-technical team members can navigate their way around
       | it and it's relatively fast assuming you don't have too many
       | extraneous plugins installed.
        
       | yashap wrote:
       | Wow, HUGE feature, congrats to the team working on it! GH code
       | search is a feature with such massive potential utility, but the
       | old implementation was so weak it was basically useless. Looking
       | forward to this, will use it constantly if it's good.
        
       | AtNightWeCode wrote:
       | Of all the tools I use on a daily basis Github is probably the
       | worst. I mean the "Find a repository..." input field on the start
       | page can not even filter out named repositories I have access to
       | in all my organizations. It works for some repos but not all.
       | 
       | Search improvements? It is impossible to create a worse search
       | experience than Github. Just clone and use git grep instead in
       | most cases.
       | 
       | Edit: ...and the 425% price increase for SSO..
        
         | oubliette wrote:
         | Try constraining your search in Google/DDG with:
         | site:github.com query
        
         | post-it wrote:
         | Could be worse, could be Reddit search.
         | 
         | (Granted, this is largely due to a culture of titles like
         | "Check out this thing" that provide zero searchable metadata +
         | no tag system.)
        
         | v1g1l4nt3 wrote:
         | No need to clone if you just use
         | https://sourcegraph.com/search.
        
       | svnpenn wrote:
       | Has the "Last indexed" been fixed?
       | 
       | Whenever I search for code, it will say something like "Last
       | indexed on Apr 2", but if you go to the actual file, the date
       | will say 5 years ago or something. So currently the "Last
       | indexed" listed date is completely useless, and you have to
       | basically click through to every result.
        
         | 100k wrote:
         | (I worked on that system and the new one.)
         | 
         | Yes, sadly, that is literally when the file was _indexed_. So
         | it's not particularly useful. It's a difficult problem to
         | solve, but I'll bring up your feedback to the team.
        
       | Petesta wrote:
       | Glad to see GitHub's search has improved. I hope GitHub finally
       | improves the search functionality on gists. You can't search your
       | own gists by name.
        
       | savanpatel wrote:
       | Why does it matter to mention in the demo video that they built it
       | in Rust? It should not matter to customers.
        
       | dvirsky wrote:
       | Are there any open source powerful code search engines out there?
       | As a Googler the internal code search we have here is one of the
       | most incredible things I've ever seen, it's so fast and powerful
       | I'm amazed by it daily. Is there anything near that quality out
       | there?
        
         | dqv wrote:
         | Not a Googler, so I can't say. There was Mozilla DXR but it has
         | been abandoned.
        
           | jcranmer wrote:
           | DXR has largely been replaced with mozsearch
           | (https://github.com/mozsearch/mozsearch), and a quick glance
           | through the really early history does show that it adopted a
           | fair amount of stuff from DXR. The downside is that it's not
           | as easy to set up a local mozsearch instance as old-school
           | DXR was.
        
         | jcranmer wrote:
         | I helped write DXR for indexing Mozilla's source code based on
         | an instrumented compiler run; this has eventually been
         | developed into mozsearch
         | (https://github.com/mozsearch/mozsearch), whose indexing for
         | mozilla-central is visible here: https://searchfox.org.
        
           | dqv wrote:
           | I thought it was abandoned! This is great to hear it just
           | moved. Is there anyone at Mozilla that can update the old DXR
           | repo [0] to direct people to MozSearch?
           | 
           | [0]: https://github.com/mozilla/dxr
        
           | jwin742 wrote:
           | I work on a very large c++ monolith at work and DXR has been
           | a real game changer for helping me just figure out how so
           | much of the codebase works. Thanks!!
        
         | Falell wrote:
         | My job uses https://oracle.github.io/opengrok/ and I'm
         | generally happy with it. It has some problems with special
         | character searches at times but generally does what I want.
         | It's certainly better than code search in our on-prem github
         | instance.
        
           | slaymaker1907 wrote:
           | Yeah, opengrok is great. It is very fast and usually returns
           | good results.
        
         | ibraheemdev wrote:
         | https://grep.app/ is a great alternative to github's current
         | search engine.
        
         | throwamon wrote:
         | Would you by any chance be allowed to record a demo screencast?
        
           | dti wrote:
           | You can try it yourself, e.g., the instance the Android team
           | uses: https://cs.android.com/
        
             | dvirsky wrote:
             | Oh, I didn't know this existed. The syntax seems to be on
             | par with the internal one, I couldn't find any info on
             | what's driving it.
        
               | dti wrote:
               | Also don't know how search works there, but the cross-
               | reference functionality is powered by an open-source
               | Kythe project: https://kythe.io/
        
         | toomuchtodo wrote:
         | If you don't mind me asking, any insight into why it hasn't
         | been open sourced?
        
           | dvirsky wrote:
           | There is some older version that's open source, I haven't
           | tried it and I don't know how much of today's code search is
           | based on it.
           | 
           | https://github.com/google/codesearch
        
         | profquail wrote:
         | Hoogle is pretty neat -- you can search by type signature and
         | it'll find matching APIs from hackage packages:
         | https://hoogle.haskell.org/
         | 
         | Source: https://github.com/ndmitchell/hoogle
        
         | beliu wrote:
         | We built Sourcegraph taking inspiration from Google Code Search
         | (https://about.sourcegraph.com/blog/ex-googler-guide-dev-
         | tool...) to bring the power of code search--and precise code
         | intelligence that just works--to every dev. Try it out here:
         | https://sourcegraph.com. A super common thing we see is people
         | leaving Google, missing code search, and then bringing
         | Sourcegraph into their new org. We'd love to hear your
         | feedback!
        
           | beliu wrote:
           | Sourcegraph is open-core, with a dual licensing approach. You
           | can run the open-source version here:
           | https://github.com/sourcegraph/sourcegraph#sourcegraph-oss,
           | and we have an enterprise offering for companies that want to
           | adopt for their teams. Similar to GitLab, both our enterprise
           | and OSS code is publicly available.
        
           | Arnavion wrote:
           | The best thing about the Sourcegraph instance hosted on
           | sourcegraph.com is that you can edit the URL in your browser
           | from https://github.com/foo/bar to
           | https://sourcegraph.com/github.com/foo/bar to be dropped down
           | into a Sourcegraph search for that GH repo. I've been using
           | it for a long time because of this convenience.
           | 
           | (Though it would be even better if the two options for case-
           | sensitivity and regex search were enabled by default instead
           | of needing me to toggle them on every time.)
        
             | billcaplan wrote:
             | You should be able to do that over in your User Settings
             | (Click your picture in the top right and then Settings.)
             | Adding these two things should change that default for you:
             | "search.defaultCaseSensitive": true,
             | "search.defaultPatternType": "regexp",
             | 
             | Also see:
             | https://docs.sourcegraph.com/admin/config/settings#search-
             | de...
        
               | Arnavion wrote:
               | I don't have a user account (nor do I want to make one).
        
           | axiosgunnar wrote:
           | Are you worried this new GitHub Code Search might steal all
           | your users?
        
       | murat124 wrote:
       | Not sure if it's good enough to replace https://grep.app/
        
         | 100k wrote:
         | (I worked on this.)
         | 
         | Give it a shot and let us know what you think! Where can we
         | improve it?
        
           | beltsazar wrote:
           | What kind of indexes do you use to provide regex searches?
        
           | johndough wrote:
           | Today I wanted to search for "strstr[a-z]+?_r" but got the
           | error message "This is a partial result set. The search was
           | stopped early because it would take too long to check every
           | file for this regular expression.". However, I got results
           | for the less restrictive regex "strstr.+?_r" which is weird
           | since I'd expect that it would be easier to return results
           | for more restrictive regular expressions. Not sure if there
           | is a perfect solution for this, but in many cases, you could
           | probably search for the less restrictive version and filter
           | the results with the more restrictive one after that.
           | 
           | Also it would be great if more repositories were indexed. How
           | do things work behind the scenes? Maybe it is possible to
             | build a more memory-efficient index just for exact string
             | searches, which probably make up most searches.
           | 
           | Anyway, this website is amazing and I use it quite often.
           | Thank you a lot for working on this!
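
The workaround described in the comment above (run the less restrictive regex first, then filter those results with the more restrictive one) can be sketched roughly as follows. This is only an illustration in Python over an in-memory list of lines; the function name `two_stage_search` and the sample lines are made up for the example, and nothing here reflects how grep.app actually works behind the scenes.

```python
import re


def two_stage_search(lines, broad_pattern, strict_pattern):
    """Keep lines that match the broad pattern, then the strict one."""
    broad = re.compile(broad_pattern)
    strict = re.compile(strict_pattern)
    # Stage 1: the less restrictive pattern produces a candidate set.
    candidates = [line for line in lines if broad.search(line)]
    # Stage 2: the more restrictive pattern only has to scan the candidates.
    return [line for line in candidates if strict.search(line)]


# Hypothetical sample input, not taken from any real repository.
sample = [
    "char *p = strstr_r(haystack, needle);",
    "char *q = strstrcase_r(haystack, needle);",
    "char *r = strstr2_r(haystack, needle);",
    "char *s = strstr(haystack, needle);",
]

matches = two_stage_search(sample, r"strstr.+?_r", r"strstr[a-z]+?_r")
for line in matches:
    print(line)
```

With a remote search service, stage 1 would be the query sent to the service and stage 2 would run locally on whatever it returns, so a partial result set from stage 1 still limits what stage 2 can find.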
        
             | 100k wrote:
             | Thanks for the feedback, we're working on some changes to
             | improve regular expression performance.
             | 
             | We're also working hard to increase the number of
             | repositories indexed. :)
        
         | jayflux wrote:
         | I think that app triggered the inspiration to do this. So I
         | would think what they deliver will be similar or have some
         | feature parity.
        
       | deft wrote:
       | I always thought the search was purposely bad and overly limited
       | to prevent scraping for credentials.
        
       | tyingq wrote:
       | Ah, great. GitHub throwing out special characters in searches was
       | infuriating for languages with sigils and patterns, like $somevar
       | or %sql% and so on.
        
       | oezi wrote:
       | Any idea how to get further ahead on the waitlist for co-pilot?
        
         | [deleted]
        
       | jimsimmons wrote:
       | Slight tangent: The video has a guy describing the tool and he
       | includes the fact that it's written in rust when introducing it.
       | I've always found this sort of name dropping in rust
       | projects/devs baffling. Is there anything that I'm expected to
       | infer from it? Is it that its backend is memory safe? I can't
       | think of anything else. Now it may very well be very memory safe
       | but why include that highly specific detail when talking about a
       | very high level thing that is the UX of search. What if it was
       | written in Haskell or C#? Would it still be brought up? It's
       | almost as if being written in rust is a feature in itself these
       | days. As a technical guy I can't help but take the person less
       | seriously, especially when it's as unwarranted as this.
        
         | qaq wrote:
         | It's obviously personal preference but as a technical guy I am
         | always curious what lang. a project is using.
        
         | nindalf wrote:
         | He's talking about text search and the post thanks @BurntSushi.
         | That means they're using the fastest text search tool out there
         | - ripgrep. I won't mention what it's written in, because that
         | clearly upsets you.
         | 
         | Benchmark - ripgrep is faster than {grep, ag, git grep, ucg,
         | pt, sift} (2016) - https://blog.burntsushi.net/ripgrep/
        
         | t3rabytes wrote:
         | Go had this issue for a while, too, it's finally started to
         | calm down as Go hits a mainstream that is (imo) much farther
         | than Rust is currently. I think much is just people trying to
         | add validity to Rust for large-scale production workloads, in
         | the same way that Kubernetes was "a compute scheduler written
         | in Go" or Terraform was "infrastructure as code written in Go"
         | (maybe those are bad examples, but I know I've seen the "X
         | written in Go" thing going on).
        
           | gscho wrote:
           | This is exactly how I see it as well. Rust used to be an
           | obscure language with a compiler written in OCaml. If
           | something was written in D or Zig, it's noteworthy so you
           | mention it. I think Rust has come into the mainstream enough
           | that we can drop the "written in Rust" line imo.
        
         | eyelidlessness wrote:
         | I think depending on where the audience is coming from--for
         | example people who primarily work in scripting/interpreted
         | languages--Rust can also be a positive signal for performance.
        
         | colin353 wrote:
         | Hey! That was me in the video.
         | 
         | Not ashamed to be a Rust evangelist! The reason I mentioned
         | Rust is because we spent a lot of time making the experience
         | really fast - which is super important for a product like this.
         | I really think getting the performance we have would have been
         | enormously more difficult in any other language.
        
           | Dowwie wrote:
           | Fellow Rustacean here. Is the search engine secret sauce or
           | something that could perhaps be open sourced? I'd like better
           | tooling for searching private code bases. Also, would you
           | consider writing about optimization techniques you used?
        
             | colin353 wrote:
             | We are looking into open sourcing some libraries that we've
             | developed for search. And we're going to write a blog post
             | with way more technical details soon!
        
         | aaaaaaaaaaab wrote:
         | It really is like the joke about vegans. So tiring.
        
         | isaacimagine wrote:
         | I agree with you, but I just wanted to point out the following:
         | 
         | In general, Rust, C, and C++ are going to be faster than
         | languages like Ruby*. He brought up Rust while discussing the
         | performance of the new tool. Although performance is more
         | complex than language choice, etc., saying it's written in Rust
         | gives the viewer an approximate lower bound as to how fast the
         | tool should be.
         | 
         | *: (GH started as a Ruby shop, so I wouldn't be surprised if
         | that's what the original tool was written in).
        
       | ju-st wrote:
       | Is there any good reason why the search doesn't find file names?
       | Or does it now with the new search?
        
         | colin353 wrote:
         | The new search does find filenames! :D
        
       | nerdkid93 wrote:
       | I wonder if it is the followup to this conversation from last
       | year when https://grep.app was released:
       | https://news.ycombinator.com/item?id=22397728
        
         | judge2020 wrote:
         | Probably not, they've been looking at/working on improved Code
         | Search since 2019: https://youtu.be/9EoNqyxtSRM?t=1726
        
       | l0b0 wrote:
       | Now, can we please get GitHub issues back into third party search
       | engines? Now, whenever I search for something I _know_ is in an
       | issue I only ever get results from those crappy GitHub scraper
       | sites. This is happening on both Google and DuckDuckGo.
        
         | valtism wrote:
         | I don't think GitHub has any control over this without changing
         | their content license.
        
       | patrickdevivo wrote:
       | this looks awesome! two things I've always wanted and haven't
       | found satisfying solutions for in code search (in an editor)
       | 
       | 1) an ability to easily express higher level concepts in a search
       | that's aware of code semantics ("match only function names",
       | "find call sites of a method") etc. Maybe this is possible with
       | existing tools (probably is?) but I tend to get lazy about
       | learning DSLs - would love to see this in a UI if it's possible
       | 
       | 2) ability to save searches I do frequently - after a certain
       | level of complexity in a query (I've added ignore rules, I
       | crafted the right regex, etc), I want to be able to save the
       | "context" of a search so that I can easily return to it later
        
         | colin353 wrote:
         | GitHub Code Search developer here:
         | 
         | > would love to see this in a UI if it's possible
         | 
         | We do have code navigation via the UI, so in a way it's
         | possible!
         | 
         | > ability to save searches I do frequently
         | 
         | Absolutely! This is possible using "custom scopes". If you're
         | in the technology preview, click on the scope dropdown, scroll
         | to the bottom, and choose "custom scopes". You can make a
         | custom scope to search a set of repositories, a particular
         | language, within a directory, or any combination with boolean
         | operators!
        
         | pianoben wrote:
         | I've been in the preview for a bit.
         | 
         | 1) This doesn't seem to exist in quite that way, but you can
         | prefix a literal with "def:" and the engine will return only
         | definitions of that thing (so far as it can tell). It's not
         | quite what you (or I!) want, but close.
         | 
         | 2) This exists and is called "scopes". On the landing page, to
         | the left of the search bar, click the grey pill that says "All
         | repos". At the bottom there is a "custom scopes" option.
        
           | colin353 wrote:
           | Might also be worth checking out the syntax guide:
           | https://cs.github.com/about/syntax#symbol
        
         | jjwiseman wrote:
         | It's local-only search, but you reminded me that this is
         | possible with MacOS Spotlight. I wrote an indexer (for Common
         | Lisp) that let you search for function definitions, etc.
         | 
         | http://lemonodor.com/archives/001232.html
         | 
         | For example, if you're looking for a search-and-replace
         | function you know you wrote or had somewhere on your machine,
         | you could do
         | 
         |     mdfind "org_lisp_defuns == '*search*replace*'"
         | 
         | (Or just use the regular Spotlight UI.)
        
       | beached_whale wrote:
       | I just want to say about time. A lot of the time when using
       | libraries with inadequate documentation, being able to find
       | usages of a method or class gives really good insight into the
       | library. But the current code search's stemming removes all the
       | context needed to find that and then gives alternate spellings
       | too.
        
       | questiondev wrote:
       | i was actually really surprised that this did not exist when i
       | went to search github for the first time. you would think that an
       | open source giant would have this ability but i guess there is a
       | ton of computational load to achieve search in general. i'll
       | probably get downvoted for bringing up a whacky idea, but imagine
       | having some type of referencing system that is done through multi
       | node p2p, so searching certain systems using shared resources. i
       | guess the major problem would be if devs would actually spare
       | some of their personal computational resources to help the
       | community find things and not rely on special interest groups. i
       | get it, i am old school as well. i started out on pascal and
       | BASIC. but still think using creative solutions is fun. but you
       | know, napster was cool back in the day prior to their lawsuits.
       | and p2p was starting to pick up speed
        
         | throwamon wrote:
         | There was a recent post on search engines where I believe a P2P
         | solution was mentioned (but maybe it was on some related post
         | within a few days of this one):
         | https://news.ycombinator.com/item?id=29417061
        
       | ryanseys wrote:
       | I'd love some shorter keywords here for searching so this was
       | quickly composable into something useful.
       | 
       | E.g.
       | 
       | p: or f: instead of path: for filenames
       | 
       | l: instead of language:
       | 
       | -f: to exclude specific filenames (makes it easy to filter out
       | tests)
       | 
       | You get the idea.
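
One way to picture the shorthand suggested above is a small client-side step that expands the short keys into the full qualifiers before the query is sent. The sketch below is hypothetical: the alias table, the `expand_aliases` name, and the choice to keep a leading `-` as the exclusion marker are all assumptions made for illustration, and only `path:` and `language:` come from the comment itself.

```python
import re

# Hypothetical alias table; only path: and language: appear in the thread.
ALIASES = {"p": "path", "f": "path", "l": "language"}

# An optional leading "-" (exclusion), a lowercase key, a colon, then the value.
_TOKEN = re.compile(r"^(-?)([a-z]+):(.*)$")


def expand_aliases(query):
    """Expand short qualifier keys in a whitespace-separated query."""
    expanded_tokens = []
    for token in query.split():
        match = _TOKEN.match(token)
        if match and match.group(2) in ALIASES:
            negation, key, value = match.groups()
            token = f"{negation}{ALIASES[key]}:{value}"
        expanded_tokens.append(token)
    return " ".join(expanded_tokens)


expanded = expand_aliases("l:rust -f:tests parse_query")
print(expanded)
```

Splitting on whitespace keeps the sketch short but would mangle quoted phrases, so a real implementation would need a proper tokenizer.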
        
       | oever wrote:
       | Code search on GitHub is only available to people that log in
       | with Microsoft. Clicking on 'Code' redirects to the login page.
       | 
       | It is not a friendly site. Open source projects would do better
       | to use an open source code forge like <https://sr.ht/>.
        
         | v1g1l4nt3 wrote:
         | Meanwhile... https://sourcegraph.com/search
        
       | W0lf wrote:
       | Great. I'm using grep.app[1] usually as for me the GitHub search
       | is mostly useless. Your mileage may vary though. That being said
       | there are many other great search interfaces that I am using
       | often when I'm trying to find solutions to common problems or
       | specific design patterns. Chromium search[2] comes to mind,
       | Mozilla's Firefox[3], Android[4] or of course Google[5]
       | 
       | [1] https://grep.app/
       | 
       | [2] https://cs.chromium.org/
       | 
       | [3] https://dxr.mozilla.org/mozilla-central/source/
       | 
       | [4] https://cs.android.com/
       | 
       | [5] https://cs.opensource.google/
        
         | v1g1l4nt3 wrote:
         | Sourcegraph[6]
         | 
         | [6] https://about.sourcegraph.com
        
       | majso wrote:
       | This is great! As a project manager I use GitHub search every
       | day when I am searching for specific methods or parts of the
       | code in order to find logical issues or bugs.
        
       ___________________________________________________________________
       (page generated 2021-12-08 23:00 UTC)