hngopher.com

       [HN Gopher] Show HN: Use cookies from Chrome (CDP) in cURL witho...
       ___________________________________________________________________
        
       Show HN: Use cookies from Chrome (CDP) in cURL without copy pasting
        
       Author : fipso
       Score  : 125 points
       Date   : 2023-04-01 11:03 UTC (11 hours ago)
        
        
       | juujian wrote:
       | I have done a lot of scraping in the past. Cookies are a pain,
       | this is a really elegant solution. Of course the biggest problem
       | is that everything interesting is hidden away behind JavaScript
       | these days and then you have to resort to Selenium and the whole
       | thing just spirals out of control. But I'm looking forward to
       | giving this a shot for non-JavaScript content in the future.
       | 
       | edit: JavaScript not Java
        
         | mkl wrote:
         | Do you mean JavaScript? I have never run into content hidden by
         | Java, but many pages load content dynamically using JavaScript.
         | 
         | I have found it's quite easy to snoop on those JavaScript API
         | requests using the Network tab of Chrome Devtools, then copy
         | the network request as a curl command for bash scripts or as
         | JavaScript for browser extensions.
        
           | tomashubelbauer wrote:
           | > I have never run into content hidden by Java
           | 
           | Tongue in cheek: You'd never know - servers running Java code
           | generating HTML pages have probably conditionally not-
           | rendered many pieces of HTML that you've never come across in
           | your browsing :)
        
           | ghqst wrote:
           | Yeah, you can sometimes find the API or find data sent in
           | JavaScript but not in prerendered HTML, which can save you
           | the pain of headless scraping.
        
           | juujian wrote:
           | I do mean JavaScript. Not sure how many times I have made
           | that mistake... And great advice, that sounds like a neat
           | approach.
        
         | bdcravens wrote:
         | If you'd be standing up CDP to grab the cookies, you'd probably
         | use Puppeteer or Playwright instead of Selenium.
        
           | juujian wrote:
           | Appreciate the recommendation. I just used whatever Python
           | had to offer. Puppeteer looks promising though!
        
             | bdcravens wrote:
             | Using the tools at hand is often the best approach. That
             | said, I've spent most of the last 13 years of my career
             | automating browsers. For years, I used Selenium with a
             | variety of libraries. After switching to
             | Puppeteer/Playwright, I have zero interest in going back
             | lol. Playwright actually has first party Python support.
             | (Puppeteer has a port called Pyppeteer, but it's no longer
             | maintained and the author recommends using Playwright)
             | 
             | https://playwright.dev/python/
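
A minimal sketch of the Playwright route mentioned above, assuming Chrome was started with `--remote-debugging-port=9222` and that the `playwright` Python package is installed. The helper names (`cookie_header`, `cookies_from_running_chrome`) and the port are illustrative assumptions, not part of the posted tool.

```python
from typing import Any, Iterable, List, Mapping


def cookie_header(cookies: Iterable[Mapping[str, Any]], domain: str) -> str:
    """Build a Cookie header value from the cookies that apply to `domain`."""
    wanted = []
    for c in cookies:
        cookie_domain = c["domain"].lstrip(".")
        # Match the exact host or any subdomain of the cookie's domain.
        if domain == cookie_domain or domain.endswith("." + cookie_domain):
            wanted.append(f"{c['name']}={c['value']}")
    return "; ".join(wanted)


def cookies_from_running_chrome(cdp_url: str = "http://localhost:9222") -> List[dict]:
    """Attach to an already running Chrome over CDP and return its cookies."""
    # Imported lazily so cookie_header() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(cdp_url)
        # The first context of an attached browser is its default profile context.
        return browser.contexts[0].cookies()
```

The resulting string could then be passed to cURL, for example as `curl -H "Cookie: $(...)" https://example.com/`.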
        
               | rgrieselhuber wrote:
               | I second Playwright, it's amazing.
        
               | robertlagrant wrote:
               | Third.
        
         | berkle4455 wrote:
         | JavaScript is delivered as text and sends text-based HTTP calls
         | to the server to fetch more data. Why do you need Selenium?
        
           | LelouBil wrote:
           | If you don't want to reverse engineer the JavaScript.
        
           | rhd wrote:
           | I once used Selenium to run JavaScript in the webpage to
           | steal a few dynamic tokens required by the site's API, to
           | reuse in my more well-trodden python-requests workflow.
        
         | totetsu wrote:
         | There are Python libraries that import cookies directly from
         | wherever your browser stores them, for use in Selenium
         | projects.
        
       | cookiengineer wrote:
       | I've had much the same problem in the past. I built a Chrome
       | extension that generates a cookie jar text file, because it turns
       | out most relevant tracking or session cookies are on external
       | domains or OAuth provider domains. [1]
       | 
       | You just need to copy/paste the generated text content to a
       | cookies.txt and you're set, so it worked for my workflow in the
       | terminal.
       | 
       | [1] https://github.com/cookiengineer/me-want-cookies
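
For readers who want to use such a cookies.txt from a script rather than from cURL, here is a minimal sketch using only the Python standard library. It assumes the file is in the Netscape cookie file format (the format `curl -b cookies.txt` reads); the function name `load_cookie_jar` is a hypothetical helper for illustration.

```python
import http.cookiejar
import urllib.request


def load_cookie_jar(path: str) -> http.cookiejar.MozillaCookieJar:
    """Load a Netscape-format cookies.txt into a cookie jar."""
    jar = http.cookiejar.MozillaCookieJar(path)
    # Keep session cookies and expired entries so nothing is silently dropped.
    jar.load(ignore_discard=True, ignore_expires=True)
    return jar


def opener_with_cookies(path: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends the matching cookies with each request."""
    jar = load_cookie_jar(path)
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
```

Calling `opener_with_cookies("cookies.txt").open(url)` would then send whichever cookies match the requested URL.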
        
       | 2h wrote:
       | > Tired of copy pasting cURL commands from chrome to your
       | terminal ?
       | 
       | FYI for anyone that does this, MITM Proxy is usually a better
       | option for this type of stuff. Not sure about Chrome, but
       | especially with Firefox, you have no way of getting the full raw
       | request on anything with a request body like POST. You have to
       | Copy Request Headers, then Copy POST Data. With MITM Proxy or
       | similar you can just get the full request at once. Also you can
       | inject headers like X-Forwarded-For into all or specific
       | requests.
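
A minimal sketch of the header injection described above, written as a mitmproxy script. The target host and the IP address are hypothetical placeholders; the sketch relies on mitmproxy calling a module-level `request(flow)` hook for each request.

```python
# Hypothetical values for illustration; neither comes from the thread.
TARGET_HOST = "example.com"
SPOOFED_IP = "203.0.113.7"


def request(flow) -> None:
    """Hook called by mitmproxy for every client request passing through."""
    host = flow.request.pretty_host
    # Only touch requests to the target host or one of its subdomains.
    if host == TARGET_HOST or host.endswith("." + TARGET_HOST):
        flow.request.headers["X-Forwarded-For"] = SPOOFED_IP
```

Saved as, say, `inject_header.py`, this could be loaded with `mitmdump -s inject_header.py`. Dropping the host check would apply the header to all requests.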
        
         | folmar wrote:
         | > you have no way of getting the full raw request on anything
         | with a request body like POST
         | 
         | In Firefox, right-click the request -> Copy Value -> As cURL.
         | This gives everything and has worked with POST for at least a
         | few years.
        
       | thrdbndndn wrote:
       | Feels like you could just read Chrome's cookies from the file
       | (filtering for the site you need, of course), so you don't need
       | to bother running Chrome in debugging mode?
       | 
       | Like https://github.com/borisbabic/browser_cookie3
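
A rough sketch of the "read the file and filter by site" idea, using only the standard library. It assumes Chrome's cookie store is an SQLite database with a `cookies` table that has `host_key` and `name` columns; that layout may differ between Chrome versions. Cookie values are typically stored encrypted, so this only lists which cookies exist. Decryption is platform-specific and is what libraries like the one linked above take care of. The function name is hypothetical.

```python
import sqlite3
from pathlib import Path
from typing import List, Tuple


def cookie_names_for_site(db_path: str, site: str) -> List[Tuple[str, str]]:
    """Return (host_key, name) pairs for cookies belonging to `site`."""
    # Open read-only so the browser's database is never modified.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT host_key, name FROM cookies ORDER BY host_key, name"
        ).fetchall()
    finally:
        conn.close()
    # Keep the exact host and its subdomains; a leading dot marks a domain cookie.
    return [
        (host, name)
        for host, name in rows
        if host.lstrip(".") == site or host.endswith("." + site)
    ]
```

If the browser is running, the file may be locked; working on a copy of it is the safer option.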
        
         | toomuchtodo wrote:
         | yt-dlp does this also.
         | 
         | https://news.ycombinator.com/item?id=28320666
        
           | thrdbndndn wrote:
           | Thanks for the link. I know yt-dlp does, but from your link I
           | found another library
           | (https://github.com/n8henrie/pycookiecheat) that can do that
           | and it seems more popular than browser_cookie3.
           | (browser_cookie3 worked totally fine last time I tried).
        
         | fipso wrote:
         | This is awesome. I did not know decrypting chrome's password db
         | is still that easy.
        
           | paulirish wrote:
           | Cookies != Passwords..
           | 
           | But anyway... You know this is also easily accessible within
           | DevTools, yah? https://umaar.com/dev-tips/3-copy-as-curl/
        
             | eurasiantiger wrote:
             | One could argue that cookies need to be more securely
             | stored than passwords, because they can allow an attacker
             | to bypass passwords and all other authentication factors.
        
       ___________________________________________________________________
       (page generated 2023-04-01 23:00 UTC)