[HN Gopher] Show HN: Use cookies from Chrome (CDP) in cURL witho...
___________________________________________________________________
Show HN: Use cookies from Chrome (CDP) in cURL without copy pasting

Author : fipso
Score  : 125 points
Date   : 2023-04-01 11:03 UTC (11 hours ago)

(HTM) web link (github.com)
(TXT) w3m dump (github.com)

| juujian wrote:
| I have done a lot of scraping in the past. Cookies are a pain, this
| is a really elegant solution. Of course the biggest problem is that
| everything interesting is hidden away behind JavaScript these days
| and then you have to resort to Selenium and the whole thing just
| spirals out of control. But I'm looking forward to giving this a
| shot for non-JavaScript content in the future.
|
| edit: JavaScript not Java

| [deleted]

| mkl wrote:
| Do you mean JavaScript? I have never run into content hidden by
| Java, but many pages load content dynamically using JavaScript.
|
| I have found it's quite easy to snoop on those JavaScript API
| requests using the Network tab of Chrome Devtools, then copy the
| network request as a curl command for bash scripts or as JavaScript
| for browser extensions.

| tomashubelbauer wrote:
| > I have never run into content hidden by Java
|
| Tongue in cheek: You'd never know - servers running Java code
| generating HTML pages have probably conditionally not-rendered many
| pieces of HTML that you've never come across in your browsing :)

| ghqst wrote:
| Yeah, you can sometimes find the API or find data sent in
| JavaScript but not in prerendered HTML, which can save you the pain
| of headless scraping.

| juujian wrote:
| I do mean JavaScript. Not sure how many times I have made that
| mistake... And great advice, that sounds like a neat approach.

| bdcravens wrote:
| If you'd be standing up CDP to grab the cookies, you'd probably use
| Puppeteer or Playwright instead of Selenium.
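
The idea the thread revolves around (cookies exported from Chrome over CDP, then handed to cURL) can be sketched in a few lines of Python. This is only an illustration, not the linked project's code: the function name `cookies_to_netscape` is hypothetical, and the input is assumed to be a list of dicts shaped like CDP cookie objects (`name`, `value`, `domain`, `path`, `expires`, `httpOnly`, `secure`). The output is the Netscape cookie-file text that curl can read.

```python
def cookies_to_netscape(cookies):
    """Render CDP-style cookie dicts as Netscape cookie-file text.

    Hypothetical helper for illustration; assumes each dict carries the
    fields of a CDP cookie object (name, value, domain, path, expires,
    httpOnly, secure).
    """
    lines = ["# Netscape HTTP Cookie File"]
    for c in cookies:
        domain = c["domain"]
        # A leading dot means the cookie also applies to subdomains.
        include_subdomains = "TRUE" if domain.startswith(".") else "FALSE"
        # Session cookies have no usable expiry (CDP reports -1); write 0.
        expires = c.get("expires") or 0
        expiry = str(int(expires)) if expires > 0 else "0"
        # curl marks HttpOnly cookies with a "#HttpOnly_" domain prefix.
        prefix = "#HttpOnly_" if c.get("httpOnly") else ""
        lines.append("\t".join([
            prefix + domain,
            include_subdomains,
            c.get("path", "/"),
            "TRUE" if c.get("secure") else "FALSE",
            expiry,
            c["name"],
            c["value"],
        ]))
    return "\n".join(lines) + "\n"
```

Writing the returned text to a file such as `cookies.txt` and running `curl -b cookies.txt <url>` should then send the matching cookies; how the cookie dicts are fetched from the browser is left out of this sketch.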

| juujian wrote:
| Appreciate the recommendation, I just used whatever Python had to
| offer, Puppeteer looks promising though!

| bdcravens wrote:
| Using the tools at hand is often the best approach. That said, I've
| spent most of the last 13 years of my career automating browsers.
| For years, I used Selenium with a variety of libraries. After
| switching to Puppeteer/Playwright, I have zero interest in going
| back lol. Playwright actually has first-party Python support.
| (Puppeteer has a port called Pyppeteer, but it's no longer
| maintained and the author recommends using Playwright)
|
| https://playwright.dev/python/

| rgrieselhuber wrote:
| I second Playwright, it's amazing.

| robertlagrant wrote:
| Third.

| berkle4455 wrote:
| JavaScript is delivered as text and sends text-based HTTP calls to
| the server to fetch more data. Why do you need Selenium?

| LelouBil wrote:
| If you don't want to reverse engineer the JavaScript.

| rhd wrote:
| I once used Selenium to run JavaScript in the webpage to steal a
| few dynamic tokens required by the site's API, to reuse in my more
| well-trodden python-requests workflow.

| totetsu wrote:
| There are Python libraries you can use that import cookies directly
| from wherever your browser stores them, to use in Selenium
| projects.

| cookiengineer wrote:
| I've had kind of the same problem in the past. For me, I built a
| Chrome extension that generates a cookiejar text file, because it
| turns out most relevant tracking or session cookies are on external
| domains or OAuth provider domains. [1]
|
| You just need to copy/paste the generated text content to a
| cookies.txt and you're set, so it worked for my workflow in the
| terminal.
|
| [1] https://github.com/cookiengineer/me-want-cookies

| 2h wrote:
| > Tired of copy pasting cURL commands from chrome to your
| terminal ?
|
| FYI for anyone that does this, MITM Proxy is usually a better
| option for this type of stuff.
| Not sure about Chrome, but especially with Firefox, you have no way
| of getting the full raw request on anything with a request body
| like POST. You have to Copy Request Headers, then Copy POST Data.
| With MITM Proxy or similar you can just get the full request at
| once. Also you can inject headers like X-Forwarded-For into all or
| specific requests.

| folmar wrote:
| > you have no way of getting the full raw request on anything with
| a request body like POST
|
| On FF, right click on Request -> Copy Value -> As cURL. This gives
| everything and has worked with POST for a few years at least.

| thrdbndndn wrote:
| Feels like you can just read Chrome's cookies from the file (and
| filter out the ones you need by site, of course), so you don't need
| to bother running Chrome in debugging mode?
|
| Like https://github.com/borisbabic/browser_cookie3

| toomuchtodo wrote:
| yt-dlp does this also.
|
| https://news.ycombinator.com/item?id=28320666

| thrdbndndn wrote:
| Thanks for the link. I know yt-dlp does, but from your link I found
| another library (https://github.com/n8henrie/pycookiecheat) that
| can do that and it seems more popular than browser_cookie3.
| (browser_cookie3 worked totally fine last time I tried).

| fipso wrote:
| This is awesome. I did not know decrypting Chrome's password db is
| still that easy.

| paulirish wrote:
| Cookies != Passwords..
|
| But anyway... You know this is also easily accessible within
| DevTools, yah? https://umaar.com/dev-tips/3-copy-as-curl/

| eurasiantiger wrote:
| One could argue that cookies need to be more securely stored than
| passwords, because they can allow an attacker to bypass passwords
| and all other authentication factors.
___________________________________________________________________
(page generated 2023-04-01 23:00 UTC)