[HN Gopher] Pandas Illustrated: Visual Guide to Pandas
       ___________________________________________________________________
        
       Pandas Illustrated: Visual Guide to Pandas
        
       Author : nemoniac
       Score  : 109 points
       Date   : 2023-01-27 19:41 UTC (3 hours ago)
        
 (HTM) web link (scribe.citizen4.eu)
 (TXT) w3m dump (scribe.citizen4.eu)
        
       | axi1 wrote:
       | The proper (free) link is
       | https://betterprogramming.pub/pandas-illustrated-the-definit...
        
       | jcq3 wrote:
       | Yet another pandas tutorial. Got chatgpt now, thx.
        
         | r2_pilot wrote:
         | Good luck with plausible hallucinated interfaces in your
         | statistically-generated responses.
        
       | timdellinger wrote:
       | This seems to be getting the Hug of Death, but this looks like
       | the content:
       | 
       | https://betterprogramming.pub/pandas-illustrated-the-definit...
        
       | neonate wrote:
       | https://web.archive.org/web/20230127194856/https://scribe.ci...
        
       | dark-star wrote:
       | These are not the Pandas I was looking for _waves hand_
        
       | matsemann wrote:
       | Can recommend taking a look at Polars. Kinda a successor to
       | pandas.
       | 
       | https://www.pola.rs/
        
         | z3c0 wrote:
         | Interesting. Seems to also take quite a few leaves from
         | PySpark's book.
        
         | 89vision wrote:
         | Neat. I love that there's a rust implementation. Types make
         | everything better
        
       | throwaway_75369 wrote:
       | So, given the title and how stressful the last couple of weeks
       | have been, I was sadly disappointed when this wasn't about
       | drawing cute black and white bears.
       | 
       | I mean, data analysis is useful and all, but not what the heart
       | wanted at the moment.
        
         | [deleted]
        
         | 867-5309 wrote:
         | asking DALL-E for some Python Pandas might relieve our
         | disappointment
        
       | [deleted]
        
         | [deleted]
        
       | irrational wrote:
       | LOL. Those were not the kind of pandas I was expecting.
       | 
       | One of my daughters is a panda bear fanatic and I thought this
       | would be a resource I could share with her.
        
         | tomcam wrote:
         | Same! Although at first glance it appears to be an excellent
         | example of clear, well-illustrated documentation.
        
       | oneoff786 wrote:
       | I do almost all of my day job in pandas. I consider myself very
       | good at it. My number one recommendation to new data scientists
       | learning the ropes is to just not use NumPy almost at all. I'm
       | not sure where people learn it but they do all of this
       | complicated nonsense. Just map simple Python lambda funcs with
       | pd.Series.map and that's most of what you need. Memorize your
       | pd.DataFrame methods.
       | 
       | If your code feels like it's dealing with a matrix and not a table,
       | it's probably doing something funny.
        
         | boppo1 wrote:
         | What is your day job?
        
         | ajoseps wrote:
         | I think it really depends on the scale of data. If you're
         | dealing with anything less than a GB, it probably doesn't
         | matter all that much, but once you're dealing with larger
         | datasets there is a pretty massive difference from using
         | vectorized operations. Some of the pandas DataFrame methods map
         | to underlying NumPy ones, but I don't believe that is always
         | the case.
        
         | _Wintermute wrote:
         | You lose a lot of performance not using vectorised functions.
         | Maybe not an issue if you're only dealing with small amounts of
         | data.
        
           | oneoff786 wrote:
           | Series.map is vectorized.
           | 
           | Pretty much everything you need in pandas is as performant as
           | you ought to need for doing tabular data manipulation in
           | Python. Except dataframe.apply
        
             | _Wintermute wrote:
             | It is not.
             | 
             |     df = pd.DataFrame({"foo": np.random.randn(100000)})
             | 
             | pandas map:
             | 
             |     df["foo"].map(lambda x: x * 2)
             | 
             | 18.1 ms +- 109 us per loop (mean +- std. dev. of 7 runs,
             | 100 loops each)
             | 
             | pandas apply:
             | 
             |     df["foo"].apply(lambda x: x * 2)
             | 
             | 17.9 ms +- 46.6 us per loop (mean +- std. dev. of 7 runs,
             | 100 loops each)
             | 
             | Vectorised function, using underlying numpy operations:
             | 
             |     df["foo"] * 2
             | 
             | 267 us +- 11.8 us per loop (mean +- std. dev. of 7 runs,
             | 1000 loops each)
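
For anyone who wants to rerun the comparison above outside IPython, here is a minimal sketch using the standard-library timeit module instead of %timeit. The helper name time_ms, the seed, and the repeat count are assumptions made for illustration; absolute timings will differ by machine and pandas version, so no expected numbers are shown.

```python
import timeit

import numpy as np
import pandas as pd

# Same shape of data as in the comment above; the seed is an arbitrary choice.
rng = np.random.default_rng(0)
df = pd.DataFrame({"foo": rng.standard_normal(100_000)})


def time_ms(fn, number=20):
    """Average wall-clock time per call, in milliseconds (hypothetical helper)."""
    return timeit.timeit(fn, number=number) / number * 1e3


# The three approaches being compared.
candidates = {
    "map": lambda: df["foo"].map(lambda x: x * 2),
    "apply": lambda: df["foo"].apply(lambda x: x * 2),
    "vectorised": lambda: df["foo"] * 2,
}

timings = {name: time_ms(fn) for name, fn in candidates.items()}
results = {name: fn() for name, fn in candidates.items()}

for name, ms in timings.items():
    print(f"{name:>10}: {ms:.3f} ms per call")
```

All three produce the same values; only the cost of getting there differs, since map and apply call the Python lambda once per element.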
        
             | lcvriend wrote:
             | If by "vectorized" you mean: "able to delegate the task of
             | performing mathematical operations on the array's contents
             | to optimized, compiled C code." then I do not think you are
             | correct (unless perhaps you are supplying map with a dict
             | or Series).
             | 
             | Series.map is not compiling your lambdas to C and running
             | them. If there is a built-in method available it usually will
             | be faster. A notable exception is the pandas str methods,
             | which devolve into Python code but generally with more
             | overhead than map/apply.
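
The distinction drawn above between passing a function and passing a dict or Series to Series.map can be sketched as follows. The sample data and variable names are made up for illustration, and the snippet only shows that the forms are interchangeable in result, not how they compare in speed.

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"])
codes = {"a": 1, "b": 2, "c": 3}  # illustrative lookup table

# Passing a function: called once per element in Python.
via_lambda = s.map(lambda x: codes[x])

# Passing a dict or a Series: pandas performs the lookup itself.
via_dict = s.map(codes)
via_series = s.map(pd.Series(codes))

# Keys missing from the mapping become NaN rather than raising.
partial = s.map({"a": 1})

# The .str accessor and a plain map over str.upper give the same values.
by_str = s.str.upper()
by_map = s.map(str.upper)

print(via_dict.tolist())
print(partial.tolist())
```

Whether the dict form or the str accessor is faster for a given workload is worth measuring rather than assuming.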
        
           | voxelghost wrote:
           | Check out polars.
           | 
           | Vectorized, choice between lazy optimization and eager.
        
       | rbanffy wrote:
       | @dang can you replace the link with the original?
       | https://betterprogramming.pub/pandas-illustrated-the-definit...
        
         | [deleted]
        
       ___________________________________________________________________
       (page generated 2023-01-27 23:00 UTC)