[HN Gopher] Improving GitHub Code Search ___________________________________________________________________ Improving GitHub Code Search Author : todsacerdoti Score : 324 points Date : 2021-12-08 17:01 UTC (5 hours ago) (HTM) web link (github.blog) (TXT) w3m dump (github.blog) | [deleted] | zxienin wrote: | I have 10 different git + github instances across my org. (~50k | strong workforce, pre github repos, m&a etc). Does this cs offer | aggregated searches across all those distributed repos? | dstaheli wrote: | Hi zxienin. I'm a GitHub product manager. May I assume the | GitHub instances you're describing are GitHub Enterprise Server | instances? We plan to bring advanced code search features to | all GitHub plans including Enterprise Server once we've | stabilized the UX and feature set. But it sounds like your | situation goes beyond that, where the search needs to include | code from Git repositories outside of GitHub Enterprise Server. | That makes good sense, and we'll definitely consider it. If you | want to keep in touch about it, please feel free to post in our | feedback forum: https://github.com/github/feedback/discussions/ | categories/co.... Thank you! | zxienin wrote: | I shall, thnx. | | ps: yes, enterprise server instances | jpgvm wrote: | Given the shoutouts to Burntsushi and Lemire this is almost | certainly a bitmap trigram index based engine similar to | https://github.com/google/zoekt | | The index is likely based on Roaring bitmaps, presumably | https://github.com/RoaringBitmap/roaring-rs in this case. | | Nice architecture, exactly how I would have done it also. | rurban wrote: | Nope, I would have used an existing search solution, like | xapian. It does so much more, and much faster. | | You need to support a proper query syntax, with tags, rankings, | stopwords, stemming. Then you need to have a proper db backend | (reverse indices). Trigrams dont help for regex. Then a | templated representation. 
Google codesearch would do only the | 2nd of 3. ElasticSearch is commercial, and only java. | | Doing that from scratch is a bit silly. | tanoku wrote: | Oh OK, you have clearly spent more time thinking about this | problem than the team of engineers at GitHub who've been | researching code search at scale for more than four years. I | bet they feel real silly right now knowing they could have | shipped this search engine in a couple weeks taping together | off-the-shelf libraries if they only had your talent for | software architecture. | preseinger wrote: | Code search typically does not need many (most?) full-text | search features like TF-IDF, stopwords, stemming, tagging, | etc. It's a categorically different domain. | yencabulator wrote: | > Trigrams dont help for regex. | | https://swtch.com/~rsc/regexp/regexp4.html | v1g1l4nt3 wrote: | Related: https://srcgr.ph/zoekt-memory-optimizations | kevinsundar wrote: | Security researchers are gonna love this :) | | Time to go secrets and url hunting. | jayflux wrote: | You could already do this, grep.app for example has existed for | a while. This is just bringing those features in-house. | latenightcoding wrote: | I always thought Github's bad search functionality was a business | decision. It was so bad for so long. Even if basic improvements | are significantly harder at their scale, I just can't comprehend | how Microsoft left something so potentially useful be so bad for | so long. | [deleted] | v1g1l4nt3 wrote: | Yeah, I'd bet on https://about.sourcegraph.com. Fully focused | on code search and are still light years ahead. | ElectronShak wrote: | Reminds me of https://grep.app, Search across a half million git | repos [1] | | 1. https://news.ycombinator.com/item?id=22396824 | tuananh wrote: | i still dont understand why it can be so far, haha | einpoklum wrote: | They haven't implemented wildcard search for... well, ever: | | https://github.com/isaacs/github/issues/402 | | I don't even care if it's very fast. 
Just make it work. I just hope | this isn't snake oil. Weird that they claim regex support but no | wildcard support. | heipei wrote: | Curious if this is something completely bespoke or simply a beefy | ElasticSearch cluster which uses the (relatively) new "wildcard" | field for enabling regex search on select fields. The search | syntax certainly maps 1:1 to the ElasticSearch Query String | syntax, including phrase search, boolean operations, grouping, | regex search, etc. | 100k wrote: | (I worked on this and the prior version of code search that | uses Elasticsearch.) | | It is a custom search engine, built from the ground up for | code. We'll be sharing more details about it on the GitHub blog | soon. | anarazel wrote: | Some logic to exclude duplicate results would be useful. I often | search to see how many external users there are of some API in | postgres. But there's hundreds of separate repos with similar | contents showing up in the search results... | adamnemecek wrote: | I use github search a lot and this would be an insane | productivity boost. I signed up for the waitlist. Does anyone | working at Github want to bump me in the queue? This is my | profile https://github.com/adamnemecek/ | adamnemecek wrote: | I just got access to it. I'm not sure if someone here helped | but if yes, then thank you very much. | v1g1l4nt3 wrote: | You can skip the wait and use https://sourcegraph.com/search | instead. | jshier wrote: | Got into the preview, can finally search for actual code! One | thing I'd like to see, though, is the ability to mark directories | to be ignored in the search results. No one needs to search the | raw HTML of my generated documentation, yet it shows up in every | search for project symbols. And since HTML is considered | "source", I can't filter it out unless I select a particular | language. | | Also the search text field is bit messed up in Safari when the | text gets longer than the field. 
| colin353 wrote: | GitHub Code Search developer here - try creating a custom scope | to filter out that stuff! Click on the scopes dropdown and | scroll to the bottom. You can filter out HTML by using a query | like: | | NOT language:html | jshier wrote: | Ah, I was trying language:!html. | | Would still be great to ignore my docs directory. | esprehn wrote: | It would be great if this used the same filter format as | sourcegraph and other internal code search tools. ex. | -file:.html is enough to filter away files ending in html in | the main search box. | | Having to use dropdowns and multiple input fields is more | cumbersome than the filter language of repo:, file:, lang: | etc. | adamnemecek wrote: | I hope they add deduplication. I can't count the number of times | when I get 100 pages of results where 95 pages are from the same | included library. | 100k wrote: | (I worked on this.) | | This is on our radar! We de-duplicate exact matches now, but | we'd like to do the same for near-similar documents. | elliottcarlson wrote: | De-duping exact matches is a game changer -- search has been | miserable to use because of the dupes for so long. I can live | with near-similar documents. Very excited to test this out. | colin353 wrote: | Another GitHub Code Search developer here - to add more to | this, we rank all the search results, and try to bring the | most relevant results to the top. Ideally, if you have 10 | pages of results, you shouldn't have to leave page 1 to | find what you're looking for :D | sumtechguy wrote: | That would be a tough problem. As de-dup you probably want to | show/point towards the 'original' tree. But which one is the | source? Or even worse someone abandons a project but someone | else forked it and kept going should it show that one | instead? Or should it show the one it was forked from | depending on the version number. Which one is the 'true' repo | now? Most certainly an interesting problem. 
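The exact-match de-duplication mentioned in this subthread can be illustrated with a small sketch. This is only one plausible approach (hashing file contents and keeping the first hit per hash), not a description of how GitHub does it; the function name `dedupe_exact` and the `(repo, path, content)` result shape are hypothetical.

```python
import hashlib


def dedupe_exact(results):
    """Keep only the first search result for each distinct file content.

    `results` is an iterable of (repo, path, content) tuples, assumed to be
    already ordered by rank. This tuple shape is made up for illustration.
    """
    seen = set()      # content digests we have already emitted
    unique = []
    for repo, path, content in results:
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # byte-identical copy of an earlier result
        seen.add(digest)
        unique.append((repo, path, content))
    return unique
```

Note that this only catches byte-identical copies. It does not address the harder questions raised above: near-similar documents, or which repository should count as the "original".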
| francislavoie wrote: | It really looks like they took a lot of inspiration from | https://sourcegraph.com/search with this. Not a bad thing at all. | I hope SourceGraph doesn't get obsoleted by this though, they're | great people. | junon wrote: | I remember seeing this years ago and thought it was a bit | subpar but it appears they've made strides since then. I might | start using this again. | lancemurdock wrote: | had a pretty awful interview experience there a while back. | Can't say I experienced great people | anandchowdhary wrote: | I interviewed for Sourcegraph and it was one of the best. | Super transparent process, open source handbook, fun coding | tasks -- really nothing to complain about. Would be curious | to know what made you have such a different experience. | sqs wrote: | Sourcegraph CEO here. I'm really sorry about that. We work | really hard on making our interviews good for everyone, | including documenting it publicly at | https://handbook.sourcegraph.com/talent/interview_process. | Could you please email me at sqs@sourcegraph.com so I could | find out what happened? | mholt wrote: | I'm surprised... I absolutely _loved_ my interview with | Sourcegraph. I kind of wish every tech company interviewed | like they do. | gavinray wrote: | I've met two of their devs randomly in different Discord | servers. Both were great people (Noah, Olaf) and are very | active in OSS communities. Perhaps not coincidentally, both | worked on Language Server related stuff. | | Olafur is responsible for a lot of Scala tooling and some | pretty neat original ideas. | | Sourcegraph also came up with LSIF, which is a useful format | for building tooling for language servers: | | https://lsif.dev | | If you want to build this sort of stuff, the work Sourcegraph | has done with LSIF + SemanticDB is probably your easiest bet. | | N=2 isn't great, but there's my experiences if we're tossing | them out there. | sqs wrote: | Sourcegraph CEO here. 
Imitation is the sincerest form of | flattery. We are very transparent, have a ton of users, and are | open-core, so it's easy to get inspiration from us. :) We want | way more devs to be using code search since it's so valuable | 10x+/day, and if this helps, then we are very happy for that. | Devs get to choose the code search tool they use, so the best | tool will win (you wouldn't use Bing if your boss made | you...likewise, code search isn't like team chat or team docs). | trinovantes wrote: | Still waiting for the ability to search in other branches. It's a | pain when some codebases have stable releases on the next/dev | branch but keep their main branch to the previous release. | namrog84 wrote: | Absolutely. I get they don't want to index every branch but at | least set some heuristics like it it has a certain amount of | activity or something per repo. Or even allow repo to opt into | 1 to 2 other branches besides main. Especially for bigger | projects | | That'd cover 95% of repo I've seen. | jkelleyrtp wrote: | Seems to be that Rust's killer app is burntsushi's mind and | ripgrep. :-) | samueldr wrote: | Only thing missing is indexing of branches and forks. | | My main use case for GitHub search is identifying provenance of | misc. changes in vendor source code tarballs for e.g. Android | kernel releases. It's hard, but sometimes possible to rehydrate | most of the existing commits through cherry-picks and careful | rebases. | | The biggest problem with the lack of indexing branches and forks | is that sometimes vendors makes releases through branches, or | that sometimes repos of interests are forks of e.g. | `torvalds/linux`. | | Hopefully we can see those being indexed in the future. | | I'm also curious: has the plan to drop "less active" repos from | the index gone through? Has anything changed? | alufers wrote: | > I'm also curious: has the plan to drop "less active" repos | from the index gone through? Has anything changed? | | Whaaat? 
I hope it doesn't go through. I use GitHub code search | for clues when reverse engineering cheap Chinese IoT crap. | Usually I can find some headers / SDKs accidentally uploaded | and set to public by a random Chinese guy. Those repos usually | have one commit and zero traffic, but they contain invaluable | information about proprietary MCUs. | ihnorton wrote: | I would personally like to see less indexing of duplicate | files! There are many things I've searched for which return | 100s of results from independent checkin-uploads of big | libraries like the Android SDK. It would be great if results | were filtered by file similarity regardless of git history (if | that is in fact the issue). | beached_whale wrote: | Got an opportunity to try it a few minutes ago and it's awesome | so far. I was able to look for my code in repos I don't own, e.g | `not org:user foo::bar` | beltsazar wrote: | Does anyone know (or guess) what kind of index they use to | provide regex searches? I'm really curious. | 100k wrote: | We'll be sharing more details soon on the GitHub blog. | Falell wrote: | > Search for an exact string, with support for substring matches | and special characters, or use regular expressions (enclosed in / | separators). | | Finally! | | Search-for-literal is so important when you have technical users | working on non-prose text. | | They say this is going in a dedicated search page 'to start | with', if "<literally any text>" doesn't work in the top bar | eventually this is still going to be miserable. | colin353 wrote: | I'm from the team that developed this at GitHub - if you are in | the technology preview, then you can jump into cs.github.com | from searches done at the top bar. | gavinray wrote: | Thank you | | I use Github's UI for exploring and searching codebases more | often than my own environment, since I do a lot of curious | browsing. 
| | No offense, but the search is so bad for anything worse than | a single word, that I've developed a sort of intuition for | how to phrase things -- and then still spend a lot of time | crawling pages of results haha. | | This was sorely needed | colin353 wrote: | Couldn't agree more - that's why we built it! Please give | the new search a shot, I think you'll like it :D | mholt wrote: | What's your take on developing a new code search instead of | partnering with an existing global code graph like | Sourcegraph? What are the advantages of GitHub Code Search | over Sourcegraph? | edwinyzh wrote: | Well, in the past I've tried Sourcegraph several times, but | it never give me experiences that match the was-dead-many- | years-ago Google Code Search. I wish the new github code | search does that. | zxienin wrote: | +1 pretty much what was on my mind, seeing this. does this | compete or complement sourcegraph? | bsagdiyev wrote: | Now can they fix doing a language search for "Visual Basic"? If | you filter a users repos or stars on that language it just shows | all their repos or stars. Code search for language "Visual Basic" | returns all repositories and does not limit by language like it | should. | remram wrote: | Meanwhile on GitLab, you can't even search in issue comments | (only the title/description from the author). | john_cogs wrote: | GitLab team member here. | | Comment (and code) search is available for projects in all | GitLab tiers: https://docs.gitlab.com/ee/user/search/#basic- | search | | Premium and Ultimate users have access to Advanced Search: | https://docs.gitlab.com/ee/user/search/advanced_search.html | remram wrote: | There is a way to search for comments using the "global | search", but no way to search for text over issues and their | comments. 
In particular, no way to search from the issue tab, | no way to search over comments only in issues (or only in | merge requests), no way to combine a text search with | label/milestone/status filters, etc. | | So it's a workaround, but a bad one. | | Here's the ticket (2015): https://gitlab.com/gitlab- | org/gitlab/-/issues/13891. The fact that it has so many | duplicates in your own project's issue tracker is a good | indicator of how bad your issue search is. | boleary-gl wrote: | GitLab team member here. | | > no way to combine a text search with | label/milestone/status filters, etc. | | You can combine text search with field search (like | label/milestone/status). Here's an example: | https://gitlab.com/gitlab- | org/gitlab/-/issues?search=Visuali... | cosentiyes wrote: | The addition of exact match search is so exciting that I haven't | internalized any of the other new features. I've abandoned an | ungodly number of semi-common-word searches after getting 30 | pages of results in a monorepo | philsnow wrote: | I didn't even see this in the feature list before doing the | signup. One of the signup questions is "how do you usually | search?" or so, I wrote in the blank "I want to search for | symbols, not substrings, so if I'm searching for `bar` I don't | want `foo_bar` to show up as a match". I usually do this with | word boundaries in regexes, but I pretty much have to have the | repo downloaded, so it's useless for searching on github.com | this way. | jrochkind1 wrote: | I love how the Microsoft acquisition continues to result in | _increased_ investment in github with microsoft's resources, and | real vision; not always how an acquisition goes. | adamnemecek wrote: | Microsoft has always been a dev tool company. | einpoklum wrote: | You wouldn't know it looking at MS Visual Studio though. | NmAmDa wrote: | I doubt that before WSL this would be something. I mean | developing on windows was always far more difficult than Linux | or MacOS. 
| adamnemecek wrote: | It depends on what you were developing. | johannes1234321 wrote: | Win32 API isn't nice, but Microsoft was always relatively | good with documentation etc. and don't forget all the | developer support within Excel, VBA, Visual basic etc. Bill | Gates early on understood the premise of building a | platform and not breaking it. Even if that meant win32 API | became ugly over time. Old windows programs still work on | newer releases. | swyx wrote: | and a departure of all the key execs | jrochkind1 wrote: | what about it? That's not even a sentence. | mintplant wrote: | If anyone from GitHub is listening, being able to exclude test | code with a few clicks would be an absolute game-changer. By far | the biggest source of noise in my GH code search results, and I | use the tool (and similar tools like Searchfox) super super | heavily. Either way, stoked to try this out. | halayli wrote: | Exactly this. It will also reduce unnecessary requests on their | servers. | Koffiepoeder wrote: | And inversely, searching specifically for test code can also be | useful. For example if searching for an implementation example. | 100k wrote: | Thanks for the feedback! We downrank test files with a | heuristic, though we'll definitely be looking to make this more | sophisticated. You can also exclude results using a regular | expression, like `foo NOT path:/_test\\.go$/`. | dcreager wrote: | And also note that if you often need to add this kind of | qualifier to many searches, you can create a "custom scope" | that includes it for you transparently. | leaded_syrinx wrote: | This is great, specified search on GitHub has previously been | very hit or miss. Generally I use the search feature for learning | / trying to see if something I'm trying to do already exists. I | personally think vsCode has the best code search implementation, | in terms of "exact", "partial" and "regex" matching. 
The UI is | clear, non-technical team members can navigate their way around | it and it's relatively fast assuming you don't have too many | extraneous plugins installed. | yashap wrote: | Wow, HUGE feature, congrats to the team working on it! GH code | search is a feature with such massive potential utility, but the | old implementation was so weak it was basically useless. Looking | forward to this, will use it constantly if it's good. | AtNightWeCode wrote: | Of all the tools I use on a daily basis Github is probably the | worst. I mean the "Find a repository..." input field on the start | page can not even filter out named repositories I have access to | in all my organizations. It works for some repos but not all. | | Search improvements? It is impossible to create a worse search | experience than Github. Just clone and use git grep instead in | most cases. | | Edit: ...and the 425% price increase for SSO.. | oubliette wrote: | Try constraining your search in Google/DDG with: | site:github.com query | post-it wrote: | Could be worse, could be Reddit search. | | (Granted, this is largely due to a culture of titles like | "Check out this thing" that provide zero searchable metadata + | no tag system.) | v1g1l4nt3 wrote: | No need to clone if you just use | https://sourcegraph.com/search. | svnpenn wrote: | Has the "Last indexed" been fixed? | | whenever I search for code, it will say something like "Last | indexed on Apr 2", but if you go to the actual file, the date | will say 5 years ago or something. So currently the "Last | indexed" listed date is completely useless, and you have to | basically click through to every result. | 100k wrote: | (I worked on that system and the new one.) | | Yes, sadly, that is literally when the file was _indexed_. So | it's not particularly useful. It's a difficult problem to | solve, but I'll bring up your feedback to the team. | Petesta wrote: | Glad to see GitHub's search has improved. 
I hope GitHub finally | improves the search functionality on gists. You can't search your | own gists by name. | savanpatel wrote: | Why does it matter to speak they built in rust in demo video? It | should not matter to customers. | dvirsky wrote: | Are there any open source powerful code search engines out there? | As a Googler the internal code search we have here is one of the | most incredible things I've ever seen, it's so fast and powerful | I'm amazed by it daily. Is there anything near that quality out | there? | dqv wrote: | Not a Googler, so I can't say. There was Mozilla DXR but it has | been abandoned. | jcranmer wrote: | DXR has largely been replaced with mozsearch | (https://github.com/mozsearch/mozsearch), and a quick glance | through the really early history does show that it adopted a | fair amount of stuff from DXR. The downside is that it's not | as easy to set up a local mozsearch instance as old-school | DXR was. | jcranmer wrote: | I helped write DXR for indexing Mozilla's source code based on | an instrumented compiler run; this has eventually been | developed into mozsearch | (https://github.com/mozsearch/mozsearch), whose indexing for | mozilla-central is visible here: https://searchfox.org. | dqv wrote: | I thought it was abandoned! This is great to hear it just | moved. Is there anyone at Mozilla that can update the old DXR | repo [0] to direct people to MozSearch? | | [0]: https://github.com/mozilla/dxr | jwin742 wrote: | I work on a very large c++ monolith at work and DXR has been | a real game changer for helping me just figure out how so | much of the codebase works. Thanks!! | Falell wrote: | My job uses https://oracle.github.io/opengrok/ and I'm | generally happy with it. It has some problems with special | character searches at times but generally does what I want. | It's certainly better than code search in our on-prem github | instance. | slaymaker1907 wrote: | Yeah, opengrok is great. 
It is very fast and usually returns | good results. | ibraheemdev wrote: | https://grep.app/ is a great alternative to github's current | search engine. | throwamon wrote: | Would you by any chance be allowed to record a demo screencast? | dti wrote: | You can try it yourself, e.g., the instance the Android team | uses: https://cs.android.com/ | dvirsky wrote: | Oh, I didn't know this existed. The syntax seems to be on | par with the internal one, I couldn't find any info on | what's driving it. | dti wrote: | Also don't know how search works there, but the cross- | reference functionality is powered by an open-source | Kythe project: https://kythe.io/ | toomuchtodo wrote: | If you don't mind me asking, any insight into why it hasn't | been open sourced? | dvirsky wrote: | There is some older version that's open source, I haven't | tried it and I don't know how much of today's code search is | based on it. | | https://github.com/google/codesearch | profquail wrote: | Hoogle is pretty neat -- you can search by type signature and | it'll find matching APIs from hackage packages: | https://hoogle.haskell.org/ | | Source: https://github.com/ndmitchell/hoogle | beliu wrote: | We built Sourcegraph taking inspiration from Google Code Search | (https://about.sourcegraph.com/blog/ex-googler-guide-dev- | tool...) to bring the power of code search--and precise code | intelligence that just works--to every dev. Try it out here: | https://sourcegraph.com. A super common thing we see is people | leaving Google, missing code search, and then bringing | Sourcegraph into their new org. We'd love to hear your | feedback! | beliu wrote: | Sourcegraph is open-core, with a dual licensing approach. You | can run the open-source version here: | https://github.com/sourcegraph/sourcegraph#sourcegraph-oss, | and we have an enterprise offering for companies that want to | adopt for their teams. Similar to GitLab, both our enterprise | and OSS code is publicly available. 
| Arnavion wrote: | The best thing about the Sourcegraph instance hosted on | sourcegraph.com is that you can edit the URL in your browser | from https://github.com/foo/bar to | https://sourcegraph.com/github.com/foo/bar to be dropped down | into a Sourcegraph search for that GH repo. I've been using | it for a long time because of this convenience. | | (Though it would be even better if the two options for case- | sensitivity and regex search were enabled by default instead | of needing me to toggle them on every time.) | billcaplan wrote: | You should be able to do that over in your User Settings | (Click your picture in the top right and then Settings.) | Adding these two things should change that default for you: | "search.defaultCaseSensitive": true, | "search.defaultPatternType": "regexp", | | Also see: | https://docs.sourcegraph.com/admin/config/settings#search- | de... | Arnavion wrote: | I don't have a user account (nor do I want to make one). | axiosgunnar wrote: | Are you worried this new Github Code Search might steal all | your users? | murat124 wrote: | Not sure if it's good enough to replace https://grep.app/ | 100k wrote: | (I worked on this.) | | Give it a shot and let us know what you think! Where can we | improve it? | beltsazar wrote: | What kind of indexes do you use to provide regex searches? | johndough wrote: | Today I wanted to search for "strstr[a-z]+?_r" but got the | error message "This is a partial result set. The search was | stopped early because it would take too long to check every | file for this regular expression.". However, I got results | for the less restrictive regex "strstr.+?_r" which is weird | since I'd expect that it would be easier to return results | for more restrictive regular expressions. Not sure if there | is a perfect solution for this, but in many cases, you could | probably search for the less restrictive version and filter | the results with the more restrictive one after that. 
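The workaround described in the paragraph above (search with the less restrictive regex, then filter the hits with the more restrictive one) can be sketched locally as follows. This is a minimal illustration on an in-memory list of lines, not how any search backend works; the function name `two_stage_search` and the sample inputs are hypothetical.

```python
import re


def two_stage_search(lines, loose_pattern, strict_pattern):
    """Find candidates with a loose regex, then narrow with a strict one."""
    loose = re.compile(loose_pattern)
    strict = re.compile(strict_pattern)
    # Stage 1: the loose pattern stands in for the query sent to the search
    # service; here it simply scans the given lines.
    candidates = [line for line in lines if loose.search(line)]
    # Stage 2: post-filter the (hopefully much smaller) candidate set.
    return [line for line in candidates if strict.search(line)]


sample = [
    "p = strstrcase_r(a, b);",
    "q = strstr(x) + foo_r;",
    "no match here",
]
hits = two_stage_search(sample, r"strstr.+?_r", r"strstr[a-z]+?_r")
```

The second sample line matches the loose pattern but not the strict one, so only the first line survives both stages.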
| | Also it would be great if more repositories were indexed. How | do things work behind the scenes? Maybe it is possible to | build a more memory-efficient index just for exact string | search, which probably makes up most searches. | | Anyway, this website is amazing and I use it quite often. | Thank you a lot for working on this! | 100k wrote: | Thanks for the feedback, we're working on some changes to | improve regular expression performance. | | We're also working hard to increase the number of | repositories indexed. :) | jayflux wrote: | I think that app triggered the inspiration to do this. So I | would think what they deliver will be similar or have some | feature parity. | deft wrote: | I always thought the search was purposely bad and overly limited | to prevent scraping for credentials. | tyingq wrote: | Ah, great. GitHub throwing out special characters in searches was | infuriating for languages with sigils and patterns, like $somevar | or %sql% and so on. | oezi wrote: | Any idea how to get further ahead on the waitlist for co-pilot? | [deleted] | jimsimmons wrote: | Slight tangent: The video has a guy describing the tool and he | includes the fact that it's written in rust when introducing it. | I've always found this sort of name dropping in rust | projects/devs baffling. Is there anything that I'm expected to | infer from it? Is it that its backend is memory safe? I can't | think of anything else. Now it may very well be very memory safe | but why include that highly specific detail when talking about a | very high level thing that is the UX of search. What if it was | written in Haskell or C#? Would it still be brought up? It's | almost as if being written in rust is a feature in itself these | days. As a technical guy I can't help but take the person less | seriously, especially when it's as unwarranted as this. | qaq wrote: | It's obviously personal preference but as a technical guy I am | always curious what lang. a project is using. 
| nindalf wrote: | He's talking about text search and the post thanks @BurntSushi. | That means they're using the fastest text search tool out there | - ripgrep. I won't mention what it's written in, because that | clearly upsets you. | | Benchmark - ripgrep is faster than {grep, ag, git grep, ucg, | pt, sift} (2016) - https://blog.burntsushi.net/ripgrep/ | t3rabytes wrote: | Go had this issue for a while, too, it's finally started to | calm down as Go hits a mainstream that is (imo) much farther | than Rust is currently. I think much is just people trying to | add validity to Rust for large-scale production workloads, in | the same way that Kubernetes was "a compute scheduler written | in Go" or Terraform was "infrastructure as code written in Go" | (maybe those are bad examples, but I know I've seen the "X | written in Go" thing going on). | gscho wrote: | This is exactly how I see it as well. Rust used to be an | obscure language with a compiler written in OCAML. If | something was written in D or zig, it's noteworthy so you | mention it. I think rust has come into the mainstream enough | that we can drop the "written in rust" line imo. | eyelidlessness wrote: | I think depending on where the audience is coming from--for | example people who primarily work in scripting/interpreted | languages--Rust can also be a positive signal for performance. | colin353 wrote: | Hey! That was me in the video. | | Not ashamed to be a Rust evangelist! The reason I mentioned | Rust is because we spent a lot of time making the experience | really fast - which is super important for a product like this. | I really think getting the performance we have would have been | enormously more difficult in any other language. | Dowwie wrote: | Fellow Rustacean here. Is the search engine secret sauce or | something that could perhaps be open sourced? I'd like better | tooling for searching private code bases. Also, would you | consider writing about optimization techniques you used? 
| colin353 wrote: | We are looking into open sourcing some libraries that we've | developed for search. And we're going to write a blog post | with way more technical details soon! | aaaaaaaaaaab wrote: | It really is like the joke about vegans. So tiring. | isaacimagine wrote: | I agree with you, but I just wanted to point out the following: | | In general, Rust, C, and C++ are going to be faster than | languages like Ruby*. He brought up Rust while discussing the | performance of the new tool. Although performance is more | complex than language choice, etc., saying it's written in Rust | gives the viewer an approximate lower bound as to how fast the | tool should be. | | *: (GH started as a Ruby shop, so I wouldn't be surprised if | that's what the original tool was written in). | ju-st wrote: | Is there any good reason why the search doesn't find file names? | Or does it now with the new search? | colin353 wrote: | The new search does find filenames! :D | nerdkid93 wrote: | I wonder if it is the followup to this conversation from last | year when https://grep.app was released: | https://news.ycombinator.com/item?id=22397728 | judge2020 wrote: | Probably not, they've been looking at/working on improved Code | Search since 2019: https://youtu.be/9EoNqyxtSRM?t=1726 | l0b0 wrote: | Now, can we please get GitHub issues back into third party search | engines? Now, whenever I search for something I _know_ is in an | issue I only ever get results from those crappy GitHub scraper | sites. This is happening on both Google and DuckDuckGo. 
| valtism wrote: | I don't think Github has any control over this without changing | their content license. | patrickdevivo wrote: | this looks awesome! two things I've always wanted and haven't | found satisfying solutions for in code search (in an editor) | | 1) an ability to easily express higher level concepts in a search | that's aware of code semantics ("match only function names", | "find call sites of a method") etc. 
Maybe this is possible with | existing tools (probably is?) but I tend to get lazy about | learning DSLs - would love to see this in a UI if it's possible | | 2) ability to save searches I do frequently - after a certain | level of complexity in a query (I've added ignore rules, I | crafted the right regex, etc), I want to be able to save the | "context" of a search so that I can easily return to it later | colin353 wrote: | GitHub Code Search developer here: | | > would love to see this in a UI if it's possible | | We do have code navigation via the UI, so in a way it's | possible! | | > ability to save searches I do frequently | | Absolutely! This is possible using "custom scopes". If you're | in the technology preview, click on the scope dropdown, scroll | to the bottom, and choose "custom scopes". You can make a | custom scope to search a set of repositories, a particular | language, within a directory, or any combination with boolean | operators! | pianoben wrote: | I've been in the preview for a bit. | | 1) This doesn't seem to exist in quite that way, but you can | prefix a literal with "def:" and the engine will return only | definitions of that thing (so far as it can tell). It's not | quite what you (or I!) want, but close. | | 2) This exists and is called "scopes". On the landing page, to | the left of the search bar, click the grey pill that says "All | repos". At the bottom there is a "custom scopes" option. | colin353 wrote: | Might also be worth checking out the syntax guide: | https://cs.github.com/about/syntax#symbol | jjwiseman wrote: | It's local-only search, but you reminded me that this is | possible with MacOS Spotlight. I wrote an indexer (for Common | Lisp) that let you search for function definitions, etc.
| | http://lemonodor.com/archives/001232.html | | For example, if you're looking for a search-and-replace | function you know you wrote or had somewhere on your machine, | you could do mdfind "org_lisp_defuns == | '*search*replace*'" | | (Or just use the regular Spotlight UI.) | beached_whale wrote: | I just want to say about time. A lot of the time when using | libraries with inadequate documentation, being able to find | usages of a method or class gives really good insight into the | library. But the current code search's stemming removes all the | context needed to find that and then gives alternate spellings | too. | questiondev wrote: | i was actually really surprised that this did not exist when i | went to search github for the first time. you would think that an | open source giant would have this ability but i guess there is a | ton of computational load to achieve search in general. i'll | probably get downvoted for bringing up a whacky idea, but imagine | having some type of referencing system that is done through multi | node p2p, so searching certain systems using shared resources. i | guess the major problem would be if devs would actually spare | some of their personal computational resources to help the | community find things and not rely on special interest groups. i | get it, i am old school as well. i started out on pascal and | BASIC. but still think using creative solutions is fun. but you | know, napster was cool back in the day prior to their lawsuits. | and p2p was starting to pick up speed | throwamon wrote: | There was a recent post on search engines where I believe a P2P | solution was mentioned (but maybe it was on some related post | within a few days of this one): | https://news.ycombinator.com/item?id=29417061 | ryanseys wrote: | I'd love some shorter keywords here for searching so this was | quickly composable into something useful. | | E.g. 
| | p: or f: instead of path: for filenames | | l: instead of language: | | -f: to exclude specific filenames (makes it easy to filter out | tests) | | You get the idea. | oever wrote: | Code search on GitHub is only available to people that log in | with Microsoft. Clicking on 'Code' redirects to the login page. | | It is not a friendly site. Open source projects would do better | to use an open source code forge like <https://sr.ht/>. | v1g1l4nt3 wrote: | Meanwhile... https://sourcegraph.com/search | W0lf wrote: | Great. I'm using grep.app[1] usually as for me the GitHub search | is mostly useless. Your mileage may vary though. That being said | there are many other great search interfaces that I am using | often when I'm trying to find solutions to common problems or | specific design patterns. Chromium search[2] comes to mind, | Mozilla's Firefox[3], Android[4] or of course Google[5] | | [1] https://grep.app/ | | [2] https://cs.chromium.org/ | | [3] https://dxr.mozilla.org/mozilla-central/source/ | | [4] https://cs.android.com/ | | [5] https://cs.opensource.google/ | v1g1l4nt3 wrote: | Sourcegraph[6] | | [6] https://about.sourcegraph.com | majso wrote: | This is great! As a project manager I am using GitHub search | every day when I am searching for specific methods or parts of the | code in order to find logical issues or bugs in the code. ___________________________________________________________________ (page generated 2021-12-08 23:00 UTC)