[HN Gopher] Computer scientists prove why bigger neural networks...
       ___________________________________________________________________
        
       Computer scientists prove why bigger neural networks do better
        
       Author : theafh
       Score  : 165 points
       Date   : 2022-02-10 16:02 UTC (6 hours ago)
        
 (HTM) web link (www.quantamagazine.org)
 (TXT) w3m dump (www.quantamagazine.org)
        
       | SpaceManNabs wrote:
       | > Right now, we are routinely creating neural networks that have
       | a number of parameters more than the number of training samples.
       | This says that the books have to be rewritten.
       | 
       | Confused by this statement. Double descent with
       | overparameterization is exhibited in "classical settings" too and
       | mentioned in older books.
       | 
       | > In their new proof, the pair show that overparameterization is
       | necessary for a network to be robust.
       | 
        | What is important to note here is that many of the papers this
        | paper cites prove or show this result for certain network
        | architectures. This paper adds universality.
       | 
       | > The proof is very elementary -- no heavy math, and it says
       | something very general
       | 
       | The most elementary part was clever use of Hoeffding's
       | inequality. Some people are really fast readers haha.
       | 
       | I don't even know how you pick up the fact that isoperimetry
       | holds in manifold settings with positive curvature while also
       | playing with all those norms and inequalities. A few years ago I
       | mentioned on here all the maths that I knew or wanted to know to
       | read more papers, and others critiqued that the list was too
       | long. Well, this is why!
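        | 
        | For anyone curious, here is a minimal numpy sketch of Hoeffding's
        | inequality itself (just the bound, not the paper's argument): for
        | n i.i.d. samples in [0,1], P(|mean - E| >= t) <= 2*exp(-2*n*t^2).
        | 
        |     import numpy as np
        | 
        |     rng = np.random.default_rng(0)
        |     n, t, trials = 100, 0.1, 100_000
        |     # Uniform[0,1] samples, true mean 0.5
        |     means = rng.random((trials, n)).mean(axis=1)
        |     print(np.mean(np.abs(means - 0.5) >= t))  # empirical tail
        |     print(2 * np.exp(-2 * n * t**2))          # Hoeffding bound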
        
         | aray wrote:
         | > Double descent with overparameterization is exhibited in
         | "classical settings" too and mentioned in older books.
         | 
          | I'm curious about references or citations for this. When I
          | was going over double descent I tried to find citations like
          | this (I only looked in a couple of places, like ML/stats
          | textbooks).
        
           | moyix wrote:
           | Here's one that lists some older references:
           | https://arxiv.org/abs/2004.04328
        
           | tomrod wrote:
           | There are a handful of papers in the 90s that show this, but
           | it wasn't recognized for what it is. Double descent is REALLY
           | crazy to me, coming from a classical background.
        
             | pishpash wrote:
             | Over-parameterization for regularization is really old. The
             | pseudoinverse min-norm solution for under-determined linear
             | systems even has that flavor.
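              | 
              | A small numpy sketch of that flavor: among the infinitely
              | many exact solutions of an underdetermined system, the
              | pseudoinverse picks the minimum-norm one.
              | 
              |     import numpy as np
              | 
              |     rng = np.random.default_rng(0)
              |     A = rng.standard_normal((5, 20))  # 5 equations, 20 unknowns
              |     b = rng.standard_normal(5)
              |     x = np.linalg.pinv(A) @ b         # min-norm solution
              |     print(np.allclose(A @ x, b))      # True: fits exactly
              |     print(np.linalg.norm(x))          # smallest L2 norm of any exact fit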
        
               | tomrod wrote:
                | Sure, but those are identification approaches from
                | econometrics and matrix analysis contexts. Using them for
                | neural networks is new-ish in the zeitgeist, which did
                | not exist in the 1990s as it does today.
        
       | feketegy wrote:
        | This looks interesting; I bookmarked it.
        | 
        | My biggest blocker is the "statistics" part of ML: knowing what
        | algorithms to choose for various cases.
        
         | qorrect wrote:
          | This book was a big help for me and is very well written:
          | https://xcelab.net/rm/statistical-rethinking/ . You can find it
          | free online (along with a video course). The printed version is
          | a very nice, high quality book.
        
           | stevofolife wrote:
           | Do you know if there are any online classes that use this
            | book as a reference? Or, more generally, what types of
            | courses teach this subject?
        
           | lariati wrote:
           | Thanks so much. That is an amazing level of choice in the
           | example code. I need this right now as a type of statistical
           | strength training.
        
         | kache_ wrote:
          | Check out An Introduction to Statistical Learning.
        
       | [deleted]
        
       | stared wrote:
       | I am surprised that the paper does not even cite the Lottery
       | Ticket Hypothesis (https://arxiv.org/abs/1803.03635,
       | https://eng.uber.com/deconstructing-lottery-tickets/).
       | 
       | In the LTH paper (IMHO the most fundamental deep learning
       | publication in the last few years), the number of tickets goes as
       | layer_size^n_layers.
        
         | gwern wrote:
         | I don't see how lottery tickets yield the isoperimetry result,
         | even in a heuristic or handwavy sort of way. Yes, a larger
         | network is more likely to have good-scoring subnetworks; sure.
         | But that's all it says. What does that tell me about how
         | efficiently I can construct an adversarial example? For that, I
          | need something else, like, say, a geometric argument about
          | what sort of network will interpolate between high-dimensional
          | datapoints with properties like "not changing much in response
          | to small input changes"...
        
         | renewiltord wrote:
         | Considering the subject, it is at least somewhat amusing that
         | you double posted this.
        
       | samwisedum wrote:
       | Let's add more nodes so we can overfit even better!
        
       | ogogmad wrote:
       | Does this help against adversarial examples? The article seems to
       | suggest so.
        
       | prideout wrote:
       | > The proof relies on a curious fact about high-dimensional
       | geometry, which is that randomly distributed points placed on the
       | surface of a sphere are almost all a full diameter away from each
       | other.
       | 
       | What theorem is this referring to? Sounds like something I should
       | already be familiar with, but I'm not.
        
         | grungegun wrote:
         | For reference, see the book High Dimensional Probability by
         | Vershynin. It's free online. See Theorem 3.1.1. It proves that
         | a sub-gaussian random vector is in some sense close in norm to
          | sqrt(n), where n is the number of dimensions. Most of these
          | results are true up to multiplication by some unknown constant.
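          | 
          | A quick numpy illustration of that statement, using a standard
          | Gaussian vector as the simplest sub-gaussian example:
          | 
          |     import numpy as np
          | 
          |     rng = np.random.default_rng(0)
          |     for n in (10, 1_000, 100_000):
          |         x = rng.standard_normal(n)  # sub-gaussian coordinates
          |         print(n, np.linalg.norm(x) / np.sqrt(n))  # ratio -> 1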
        
         | [deleted]
        
         | [deleted]
        
         | aix1 wrote:
         | Not my area of expertise, but the quoted "fact" seems at best
         | incompletely stated: surely for it to hold there must be some
         | constraints on the number of points (likely as a function of
         | the diameter)?
        
           | pfortuny wrote:
            | It holds for VERY LARGE n, as the sibling comments explain.
        
           | Retric wrote:
            | It's just wrong as stated: there is only one point a full
            | diameter away from each point on a high dimensional sphere.
            | Aka (1,0,0,0,0, ...) maps to (-1,0,0,0,0, ...) and nothing
            | else, just as (1,0) maps to (-1,0) on a unit circle and
            | (1,0,0) maps to (-1,0,0) on a unit sphere.
            | 
            | On a high dimensional sphere, points should generally be
            | close to square root of 2 times the radius away from each
            | other.
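            | 
            | A small numpy sketch of this, sampling uniform points on the
            | unit sphere as normalized Gaussians:
            | 
            |     import numpy as np
            | 
            |     rng = np.random.default_rng(0)
            |     n, k = 10_000, 500  # dimension, number of points
            |     pts = rng.standard_normal((k, n))
            |     pts /= np.linalg.norm(pts, axis=1, keepdims=True)
            |     # |u - v|^2 = 2 - 2*u.v for unit vectors u, v
            |     d2 = np.clip(2 - 2 * (pts @ pts.T), 0, None)
            |     d = np.sqrt(d2[~np.eye(k, dtype=bool)])
            |     print(d.mean(), d.std())  # ~1.414 = sqrt(2), tiny spread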
        
             | hedora wrote:
             | If the data points are in the space [0,1]^n, and your
             | metric function is:
             | 
             | d(x,y) = 0 if x == y; 1 otherwise
             | 
              | Then all points are distance one apart. It's been proven
              | that, as dimensionality increases, the normal Euclidean
              | distance over uniform point clouds rapidly converges to
              | the same behavior as the equality metric.
             | 
             | The proof relies on the information gained by performing
             | pairwise distance calculations.
             | 
             | In the example distance function I gave, there is zero
             | information gained if you plug in two points that are known
             | to be non-equal.
             | 
              | The information gained from evaluating the Euclidean
              | distance function converges to zero as the dimensionality
              | of the data set increases.
             | 
             | (Note: This does not hold for low dimensional data that's
             | been embedded in a higher dimensional space.)
             | 
             | Edit: Misread your comment. Yes, everything ends up being
             | the same distance apart. More precisely, the ratio of mean
             | distance / stddev distance tends to infinity. The intrinsic
             | dimensionality of the data is monotonic w.r.t. that ratio.
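              | 
              | A minimal numpy sketch of that convergence for uniform
              | point clouds under the Euclidean metric:
              | 
              |     import numpy as np
              | 
              |     rng = np.random.default_rng(0)
              |     for n in (2, 10, 100, 1_000):
              |         x = rng.random((500, n))  # uniform cloud in [0,1]^n
              |         sq = (x**2).sum(axis=1)
              |         d2 = sq[:, None] + sq[None, :] - 2 * x @ x.T
              |         d = np.sqrt(np.clip(d2, 0, None))
              |         d = d[~np.eye(500, dtype=bool)]
              |         print(n, d.mean() / d.std())  # ratio grows with n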
        
             | bick_nyers wrote:
              | Euclidean distance calculations change based on the number
              | of dimensions; for example, in 3 dimensions it is
              | sqrt(a^2+b^2+c^2).
        
               | Retric wrote:
                | Yes, that's why it's square root of 2. Consider the
                | distance from the origin (0,0,0, ...) to a random point
                | on the sphere (~0, ~0, ~0, ...).
                | 
                | Distance = square root of ((X1 - X2)^2 + (Y1 - Y2)^2 +
                | ...). So D = square root of ((~0-0)^2 + (~0-0)^2 +
                | (~0-0)^2 + ... ), which is equal to 1 by definition of
                | the unit high dimensional sphere.
               | 
               | So distance from (1,0,0,0 ...) to (~0, ~0, ~0, ...) =
               | square root of ((~0-1)^2 + (~0-0)^2 + (~0-0)^2 + ... ) ~=
               | square root of 2.
        
               | bick_nyers wrote:
               | Ahh ok, for some reason I was thinking (1,1,1) would be a
               | valid point in this case
        
             | dan-robertson wrote:
             | The fact should say that the expected distance between two
             | random points tends to the diameter as the dimension
             | increases. The intuition is that to be close you need to be
             | close in a large number of coordinates and the law of large
             | numbers (though coordinates aren't independent) suggests
             | that is unlikely. If you fix one point on a sphere (say
             | (1,0,...,0)) then, for a high dimension, most points will
             | not have any extreme values in coordinates and will look
             | like (~0,~0,...,~0) where ~0 means something close to zero.
             | But if we sum the squares of everything apart from the
             | first we get 1 - (~0)^2 ~= 1, so the distance from our
             | fixed point is (1 - ~0)^2 + sum_2^n (0 - ~0)^2 ~= 1 + 1 =
             | 2.
        
               | Retric wrote:
                | You forgot the square root in the distance formula.
                | Distance = square root of ((X1 - X2)^2 + (Y1 - Y2)^2 +
                | ...).
               | 
               | Consider the origin (0,0,0, ...) to a random point on the
               | sphere (~0, ~0, ~0, ...). So Distance from origin =
               | square root of ((~0-0)^2 + (~0-0)^2 + (~0-0)^2 + ... ),
               | which sums to 1 by definition of the unit high
               | dimensional sphere.
               | 
               | Then plug in 1 vs 0 in the first place because we care
               | about (1,0,0,0 ...) and you get the correct answer =
               | square root of ((~0-1)^2 + (~0-0)^2 + (~0-0)^2 + ... ) ~=
               | square root of 2.
               | 
               | Edited to fix typo and add clarity.
        
               | dan-robertson wrote:
               | Wow. Can't believe I missed that.
        
           | ravi-delia wrote:
            | It should be: almost all points are _almost_ a full diameter
            | away. However it's still very striking, and an unintuitive
            | fact about very high dimensional spheres.
        
         | leto_ii wrote:
         | I think it's something related to the curse of dimensionality
         | [1] [2], basically just a property of high dimensional spaces
         | (perhaps only certain kinds of spaces though).
         | 
         | [1] https://en.wikipedia.org/wiki/Curse_of_dimensionality
         | 
         | [2] http://kops.uni-
         | konstanz.de/bitstream/handle/123456789/5715/...
        
           | hedora wrote:
           | The intrinsic dimensionality of a dataset is also relevant
           | here.
           | 
           | The M-Tree is one of my favorite indexes. It works with data
           | that's embedded in infinite dimensional spaces (sometimes;
           | it's bumping up against an impossibility result that's
           | sketched in a sibling comment).
        
           | bo1024 wrote:
           | Yes.
           | 
            | Even though almost all pairs of points are almost a full
            | diameter away from each other, they are also almost all
            | almost orthogonal (i.e. the angle they subtend at the center
            | of the sphere is very close to 90 degrees).
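            | 
            | A quick numpy check of the near-orthogonality (random unit
            | vector pairs in high dimension):
            | 
            |     import numpy as np
            | 
            |     rng = np.random.default_rng(0)
            |     u, v = rng.standard_normal((2, 200, 10_000))  # 200 pairs
            |     u /= np.linalg.norm(u, axis=1, keepdims=True)
            |     v /= np.linalg.norm(v, axis=1, keepdims=True)
            |     angles = np.degrees(np.arccos((u * v).sum(axis=1)))
            |     print(angles.mean(), angles.std())  # ~90 degrees, tiny spread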
        
         | bick_nyers wrote:
          | My initial intuition says it would be diameter/2: from the
          | perspective of a single point, the closest points would be
          | near zero distance away and the furthest points would be on
          | the opposite side, a full diameter away (assuming there are a
          | lot of points in a uniform distribution).
          | 
          | What I have just thought about, though, is which points would
          | be exactly diameter/2 distance away from that point. If you
          | have a circle, you might think it would be the points that
          | form a 90 degree triangle, but that is not the case: those
          | points would be sqrt(2)*radius distance away.
          | 
          | So while it is obvious to me that it is not diameter/2, it is
          | not obvious to me why it would be the diameter either, or how
          | larger n makes it converge to the diameter or some other
          | fixed number.
        
           | dan-robertson wrote:
            | If you consider a point on the sphere it means choosing a
            | bunch of xi such that:
            | 
            |     x1^2 + x2^2 + ... + xn^2 = 1
            | 
            | Suppose wlog you pick (1,0,0,...,0). Then the distance from
            | your point to a random point is:
            | 
            |     D = (x1-1)^2 + x2^2 + ... + xn^2
            | 
            | And from the first equation we know:
            | 
            |     x1^2 = 1 - x2^2 - x3^2 - ... - xn^2
           | 
            | Intuitively, your point will be far from a random
            | point if x1 is close to zero, and x1 will be close to zero
            | because _everything is close to zero._
           | 
           | But we can be more mathematical about it. Our (very
            | reasonable) assumption is that the volume of an n-dimensional
            | disk is proportional to the nth power of its radius. The
           | third equation shows that x1 is going to be big (meaning the
           | distance to the chosen point above is not so close to the
           | diameter) if a corresponding[1] point on the n-disk is close
           | to the middle. But the distance from the origin, R, of a
           | random point in the n-disk is distributed with pdf
           | proportional to p(r) = r^n for r in [0,1]. So the cdf is just
           | r^(n+1) and E[x1^2] = 1 - E[R] = 1 - (n+1)/(n+2), which tends
           | to 0 as n grows.
           | 
           | Therefore we get E[D] = E[(1-x1)^2] + 1 - E[x1^2] which tends
           | to 2 as n grows large.
           | 
           | [1] the correspondence is that if I give you a point on a
           | disk, you can turn it into a point on a sphere by flipping a
           | coin to decide if it goes in the upper or lower hemisphere
           | and then projecting up or down perpendicular to the disk from
           | the point onto the sphere. But thinking a little more, I'm
           | not sure this preserves the metric as it favours points on
           | the sphere that correspond to the middle parts of the disk.
           | So I think the actual expected value of x1 should be smaller.
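            | 
            | A Monte Carlo check of the limit (note D as written is the
            | squared distance; expanding it gives D = 2 - 2*x1, so its
            | mean is exactly 2 by symmetry, and what growing n buys is
            | concentration around 2):
            | 
            |     import numpy as np
            | 
            |     rng = np.random.default_rng(0)
            |     for n in (3, 30, 300):
            |         x = rng.standard_normal((20_000, n))
            |         x /= np.linalg.norm(x, axis=1, keepdims=True)
            |         D = (x[:, 0] - 1)**2 + (x[:, 1:]**2).sum(axis=1)
            |         print(n, D.mean(), D.std())  # mean ~2, spread ~2/sqrt(n)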
        
             | WithinReason wrote:
              | Let me hijack your explanation starting from this point:
              | 
              |     D = (x1-1)^2 + (x2^2 + ... + xn^2)
              | 
              | Since all the xi^2 sum to 1, as the dimensionality grows
              | (sum xi^2 = 1 as n -> infinity) each individual xi will
              | converge towards 0. Since x1 is almost 0, the (x1-1)^2
              | term will be almost 1.
              | 
              | Since we know that sum xi^2 = 1, and that x1^2 is almost
              | 0, we also know that sum xi^2 - x1^2 is almost 1, which is
              | the 2nd half of the above expression for D. So the average
              | distance converges to "almost 1 + almost 1", which is
              | "almost 2", which is the diameter.
        
               | akomtu wrote:
               | "each individual x[?] will converge towards 0"
               | 
               | I'm not sure it will. x1 is chosen randomly in the -1..1
               | interval. I dont see how the million other dimensions
               | would force it to stick to 0. Those N other dimensions
               | shrink the stddev(xi) by sqrt(N), though.
        
               | WithinReason wrote:
               | Then try normalizing a random 1000-element vector. The
               | average of the vector elements is around 0.027.
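                | 
                | Concretely (assuming uniform [0,1] entries, which is
                | what reproduces the ~0.027 figure):
                | 
                |     import numpy as np
                | 
                |     rng = np.random.default_rng(0)
                |     v = rng.random(1_000)   # uniform entries in [0,1]
                |     v /= np.linalg.norm(v)  # normalize to unit length
                |     print(v.mean())         # ~0.027: every coordinate is tiny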
        
               | Retric wrote:
               | Close, the distance formula is square root of (X1^2 +
               | X2^2 ...).
               | 
               | So exactly 1 gives a distance of 1, but almost 1 + almost
               | 1 gives a distance of _almost_ square root of 2.
        
               | WithinReason wrote:
               | Good point!
        
           | adgjlsfhk1 wrote:
           | I think the most intuitive way of thinking about this is
            | sphere packing. Asking what percent of the points of an
            | n-ball of radius 1 lie within distance d of the center is
            | equivalent to asking for the ratio of the two volumes. For
            | d<1, the n-volume of a radius-d ball tends to 0 relative to
            | the unit ball as n goes towards infinity, so almost all of
            | the points are as far away from the center as possible.
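            | 
            | In numbers, the volume ratio is d^n for radius d inside
            | radius 1, which vanishes as n grows:
            | 
            |     # fraction of the unit n-ball within distance d of the center
            |     for n in (2, 10, 100):
            |         print(n, 0.9**n)  # 0.81, 0.35, 0.0000266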
        
         | bjourne wrote:
         | It's just another way to state the
         | https://en.wikipedia.org/wiki/Curse_of_dimensionality
        
       | zwaps wrote:
       | Can someone speak to the generality of assuming c-isoperimetry
       | for the distribution of features?
       | 
       | Without knowing anything about this in particular, this seems to
       | be a rather pertinent restriction of the result related to things
       | like sampling assumptions and the like.
        
         | woopwoop wrote:
         | It really depends on what we assume about "natural" data. If it
         | looks "positively curved", e.g. the uniform measure on the
         | boundary of a convex body, or the a gaussian, or something,
         | this holds. But if the distribution exhibits a strong
         | hierarchical structure that's not so good. I think it's a
         | plausible if not obviously true assumption.
        
       | AmericanBlarney wrote:
        | This conclusion feels like saying more CPU and memory are better.
        | Seems obvious that more moves allows matching to have more
        | nuance, but I guess it's cool that someone proved it.
        
         | nazgul17 wrote:
         | From what I understand, it says that more parameters are good.
          | This wasn't obvious before this paper: you can fit a polynomial
          | instead of a neural net, but adding parameters wouldn't help
          | with robustness in that case; the polynomial would become more
          | and more jagged.
        
         | ska wrote:
         | > Seems obvious that more moves allows matching to have more
         | nuance,
         | 
          | This really has to be balanced against overfitting. The key
          | problem in ML is generalization, and lots of things improve
          | training performance while making that worse.
        
       | amelius wrote:
       | Asymptotically better? Or practically better?
        
         | ravi-delia wrote:
         | We know from reality that they get practically better, but
         | theoretic intuition suggests you shouldn't see an effect after
         | some point. This paper shows that this intuition is wrong if
          | you want your networks to be robust. It doesn't guarantee that
          | large networks will be robust, though.
        
       | kd5bjo wrote:
       | Is there a corresponding result that gives the number of examples
       | needed to provide a sufficient training set for a given physical
       | phenomenon? I'm imagining a high-dimensional equivalent of
       | Nyquist's sampling theorem.
       | 
       | Coupled with this result, we'd then have a reasonable estimator
       | of the network size required for particular tasks before even
       | starting the data collection.
        
         | pishpash wrote:
         | VC dimension?
        
       | rackjack wrote:
        | Silly thought: if bigger NNs are better, shouldn't more neurons
       | be better? Why aren't elephants smarter than us, despite having
       | more neurons?
       | 
       | https://en.wikipedia.org/wiki/List_of_animals_by_number_of_n...
       | 
       | https://pubmed.ncbi.nlm.nih.gov/24971054/
        
         | kemiller wrote:
         | IANANS but my understanding is that neurons/body mass is more
         | indicative. Large animals have more neurons because large
         | bodies need more.
        
           | salty_biscuits wrote:
            | They talk about the encephalization quotient, which compares
            | brain mass against body mass raised to the 2/3 power:
           | 
           | https://en.wikipedia.org/wiki/Encephalization_quotient
        
         | cloogshicer wrote:
         | You probably already know this (since you wrote "silly
         | thought"), but real-life neurons are ridiculously more complex
         | than simulated "neurons" in an NN. So the analogy doesn't
         | really hold.
        
           | pishpash wrote:
            | They're more complex in biological construction and in
            | signaling mechanism, but there's no proof that they are
            | more complex in function.
        
             | mattkrause wrote:
             | An individual biological neuron can compute a variety of
             | functions, including max and xor, that a single perceptron
             | can't (e.g.,
             | https://www.science.org/doi/10.1126/science.aax6239 ). In
             | general, one needs a fairly elaborate ANN to approximate
             | the behavior of a single biological neuron.
             | 
             | OTOH, a three-layer network is a universal function
             | approximator and RNNs are universal dynamical systems
             | approximators, so they are sort of trivially equivalent.
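              | 
              | A tiny numpy sketch of the classic contrast: no single
              | threshold unit computes XOR, but two hidden threshold
              | units plus one output unit do.
              | 
              |     import numpy as np
              | 
              |     step = lambda z: (z > 0).astype(int)
              |     X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
              |     h1 = step(X.sum(axis=1) - 0.5)  # OR gate
              |     h2 = step(X.sum(axis=1) - 1.5)  # AND gate
              |     print(step(h1 - h2 - 0.5))      # [0 1 1 0] = XOR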
        
           | XnoiVeX wrote:
           | I think a lot of people on this thread are missing this
           | critical insight.
        
             | visarga wrote:
             | You can simulate the data processing of a real neuron with
             | 1000 digital ones, a small neural net.
             | 
             | I think we read too much into the complexity of biological
             | neurons. Remember they need to do much more than compute
             | signals. They need to self assemble, self replicate and
             | pass through various stages of growth. They need to
             | function for 80-100 years. Many of those neurons and
             | synapses exist only for redundancy and other biological
             | constraints.
             | 
             | A digital neuron doesn't care about its physical substrate
             | and can be millions of times faster. They can be copied
             | identically for no cost and cheaply fine-tuned for new
             | tasks. Their architecture and data can evolve much faster
             | than ours, and the physical implementation can remain the
             | same during this process.
        
         | juancn wrote:
         | Well, biological neurons are much more complex than CS neurons
         | (https://www.quantamagazine.org/how-computationally-
         | complex-i...).
         | 
          | Also, you're working under the assumption that they are
          | equivalent between mammals, which, as far as we can tell, is
          | not the case (https://www.medicalnewstoday.com/articles/why-are-
          | human-neur...).
         | 
         | So my guess is that the comparison is much more complex than
         | just number of neurons.
        
         | gfody wrote:
         | are we certain they're not? i'm not sure we know how to measure
         | smartness
        
           | beebeepka wrote:
           | I only stopped saying my cat is smarter than the vast
           | majority of people I've met because she is no longer with us.
           | 
           | I did, and still do, believe this to be true. Would love to
           | befriend a bird
        
             | wizzwizz4 wrote:
             | You can befriend corvids. Teaching them symbolic language
             | is tricky, but they can trade and socialise and solve
             | puzzles (if you manage to explain the puzzle).
        
           | peterburkimsher wrote:
           | Dumbo is smarto!
           | 
           | Elephants have bodies built like a tank (and used as such by
           | Hannibal), but humans have better I/O ports.
           | 
           | {reading, writing, listening, speaking, singing, typing,
           | doing, going}
           | 
           | Without opposable thumbs, an elephant is probably quite
           | envious of human writing & typing. Let's use the privilege
           | wisely to encourage one another, teach and learn from each
           | other, from Donald Tusk, and give a helping hand.
        
             | Someone wrote:
             | But African elephants have quite versatile opposable
             | finger-like extensions at the tip of their trunks (Asian
             | elephants have only one such thing)
        
         | alexpotato wrote:
         | Because, IIRC, a lot of neurons are dedicated to
         | motion/sensing.
         | 
         | Bigger animals may require more neurons to handle moving larger
         | and/or more complicated muscle groups.
         | 
          | An interesting related point is the encephalization quotient,
          | which is based on the predicted ratio of brain size to body
          | mass. On the wikipedia page [0] they list the EQ for various
          | animals. Humans are the highest, but dolphins and ravens are
          | not far behind.
        
           | molticrystal wrote:
            | To further emphasize that having neural material focused on
            | the appropriate functions matters more than how much you
            | have, here is a story about a guy whose brain is mostly
            | hollow and filled with fluid. It probably did lower his IQ
            | to 75, and it causes weakness in his legs, but otherwise he
            | lives a more or less normal adult life.
           | 
           | https://www.newscientist.com/article/dn12301-man-with-
           | tiny-b...
        
             | acchow wrote:
             | Doesn't this demonstrate the opposite of what you were
             | claiming?
        
               | lacksconfidence wrote:
               | I feel like the quotes agree with parent:
               | 
               | > "If something happens very slowly over quite some time,
               | maybe over decades, the different parts of the brain take
               | up functions that would normally be done by the part that
               | is pushed to the side," adds Muenke, who was not involved
               | in the case.
        
               | Ajedi32 wrote:
               | Did you see the scans? The dude's head is practically
               | _empty_ (brain 55-75% smaller than normal) and nobody
               | even noticed until he was 44 years old and got an MRI.
        
               | divbzero wrote:
               | I think it's that _a priori_ you would expect a hollow
               | brain to have a far more drastic effect and not allow for
               | a mostly normal adult life.
        
               | pishpash wrote:
               | Why would you expect that, when a tiny insect can do
               | pretty intelligent things? What "unexpected" things
               | humans can do are probably all in the >75 IQ range.
        
             | gwern wrote:
             | Volume != neurons. In any case, 75 is awful and is usually
             | considered borderline retarded. (If you're tempted to
             | respond with other cases of higher IQ, note that they are
             | often retracted or unconfirmed and likely fraudulent in
             | some way; see https://www.gwern.net/Hydrocephalus .)
        
               | willmw101 wrote:
               | >Volume != neurons
               | 
               | Exactly. Most of the newer research on this topic
               | suggests that it's neural connection complexity, and
               | specifically frontal lobe volume, rather than overall
               | brain size that determines intelligence or brain power.
               | 
               | https://neuroscience.stanford.edu/news/ask-
               | neuroscientist-do...
               | 
               | >Luckily, there is much more to a brain when you look at
               | it under a microscope, and most neuroscientists now
               | believe that the complexity of cellular and molecular
               | organization of neural connections, or synapses, is what
               | truly determines a brain's computational capacity. This
               | view is supported by findings that intelligence is more
               | correlated with frontal lobe volume and volume of gray
               | matter, which is dense in neural cell bodies and
               | synapses, than sheer brain size. Other research comparing
               | proteins at synapses between different species suggests
               | that what makes up synapses at the molecular level has
               | had a huge impact on intelligence throughout evolutionary
               | history. So, although having a big brain is somewhat
               | predictive of having big smarts, intelligence probably
               | depends much more on how efficiently different parts of
               | your brain communicate with each other.
        
               | mattkrause wrote:
               | As a counterpoint, rats without a cortex can
               | do...basically everything normal rats can do--except trim
                | their toenails. The classic reference for this is
                | Whishaw's 1990 chapter "The decorticate rat".
               | 
               | This thread has links to a copy, plus a bunch of related
               | studies in humans and animals. https://twitter.com/markdh
               | umphries/status/107105276276554137...
        
         | joebob42 wrote:
         | Aside from other points, more neurons might be better "all else
         | equal", but there are differences between our brain and an
         | elephant's beyond just neuron count.
         | 
          | It's like how just getting a bigger, faster computer can help
          | with your problem, but it's less powerful than a new, more
          | efficient algorithm on the same computer.
        
         | World_Peace wrote:
          | Elephants very likely could be more intelligent than us; it
          | just seems that intelligence is a difficult thing to measure
          | quantitatively.
        
           | bee_rider wrote:
           | In particular, a given elephant might be "more intelligent"
           | than a human -- we just happen to have evolved from a
           | particular niche that has rendered us bizarrely good at
           | abstracting knowledge and combining it with the knowledge of
           | other humans.
        
             | notahacker wrote:
             | What is "more intelligent" if not "more capable of
             | abstracting, synthesizing and sharing knowledge"?
        
               | bee_rider wrote:
               | How about the ability to solve novel problems?
               | 
               | We have very good problem solving ability of course, but
               | a superpowered ability to ask others how they solved the
               | problem. If we wanted to somehow define a kind of 'brain
               | horsepower' type intelligence, it seems to me that the
               | former is closer to it than the latter, and it doesn't
               | seem obvious to me that humans would necessarily take the
               | top spot. Or that there's a reasonable/ethical way to
                | test it -- let's take a human, elephant, crow, and
                | dolphin and raise them in total isolation from any
                | community to get a measure of their untrained
                | intelligence... we might get some interesting results on
                | intelligence, but mostly we will learn something about
                | ballistics as some ethics review board launches us into
                | the Sun.
        
               | jayd16 wrote:
               | You'd also need the desire for such things.
        
           | tshaddox wrote:
           | It may be hard to measure and even define precisely, but I
           | think it's pretty clear that if we did agree on a definition
           | in the context of this conversation it would be defined in
           | such a way that humans are more intelligent than elephants.
        
           | lariati wrote:
           | I have listened to Francois Chollet say that all intelligence
           | is specialized intelligence.
           | 
           | I suspect the question really doesn't make sense if that is
           | true.
           | 
           | We just have this bias/mind projection fallacy that
           | intelligence is a general physical property of the brain that
           | can be measured. I just suspect this is not true.
           | 
            | Athletic ability, for example, doesn't generalize well. Of
            | course, someone not athletic at all is never going to be a
            | great athlete in anything, but it makes no sense to compare
            | Lance Armstrong to Patrick Mahomes in some general athletic
            | context. Putting a number on a general athletic ability
            | index between the two would be total nonsense.
        
         | tshaddox wrote:
         | For one thing, when the article says "bigger" it means "more
         | parameters," not "more neurons."
        
         | fabiospampinato wrote:
          | I read on Wikipedia [0] the other day a fairly disturbing
          | statistic related to this: apparently human men have, on
          | average, ~10% bigger brains than women. It'd be interesting to
          | know if that translates to a higher neuron count or if the
          | difference in volume is due to something else.
         | 
         | [0]: https://en.wikipedia.org/wiki/Brain_size#:~:text=In%20men%
         | 20....
        
           | andrewflnr wrote:
           | Probably just a consequence of overall physical size being
           | larger. AFAIK there continues to be no evidence of a sex
           | difference in overall intelligence, so slight difference in
           | brain size is probably a red herring.
        
         | gwern wrote:
         | Density is also important. If we look at other things - some
         | recent studies have been done on number-counting (https://royal
         | societypublishing.org/doi/10.1098/rstb.2020.052...) or bird
         | brains (https://www.gwern.net/docs/psychology/neuroscience/2020
         | -herc...) - density jumps out as a major predictor. African
          | elephants may have some more neurons, but the density isn't as
          | great as a human's where it counts, so they are remarkably
          | intelligent (like ravens and crows), but still not human-level.
          | There are diminishing returns in both directions. We have more
          | neurons than any bird that is as dense or denser, and we have
          | more density than any elephant with as many or more neurons.
          | Put that together, and we squeak across the finish line to
          | being just smart enough to create civilization.
         | 
         | An analogy: what's the difference between a supercomputer, and
         | the same number of CPUs scattered across a few datacenters?
         | It's that in a supercomputer, those CPUs are packed physically
         | as close as possible with expensive interconnects to allow them
         | to communicate as fast as possible. (For many applications, the
         | supercomputer will finish long before the spread out nodes ever
         | finish communicating and idling.) But you need to improve both
         | or else your new super-fast CPUs will spend all their time
         | waiting on Infiniband to chug through, or your fancy new
         | Infiniband will be underutilized and you should've bought more
         | CPUs.
        
           | user90349032 wrote:
           | And yet, no animal except humans is self aware. Really makes
           | you wonder why that is.
        
             | Swizec wrote:
             | There are lots of self aware non human animals.
             | 
             | Dolphins and elephants are famous examples, most primates
             | as well. Even many birds show levels of self awareness and
             | theory of mind (they know the difference between what they
             | know and what others know)
        
               | visarga wrote:
               | Seems like being a social animal is necessary for self
               | awareness.
        
               | Ardon wrote:
               | You might be interested in the theories on the evolution
               | of human intelligence: https://en.wikipedia.org/wiki/Evol
               | ution_of_human_intelligenc...
               | 
               | This is exactly the question the field is about, and I
               | find it fascinating to read about
        
               | Swizec wrote:
               | In fact there is a popular theory[1] that bird
               | intelligence evolved because of the way their social
                | structures work. Birds mate for life _but they cheat_.
                | Every bird wants its partner to be loyal while itself
                | mating with as many other birds as possible.
               | 
               | This means birds have to keep track of who can and can't
               | see them cheat, who knows and who doesn't. There's even
               | evidence that they rat each other out (2nd degree info)
               | if they think there's a reward to be had. All of this
               | requires immense intelligence, which happens to prove
               | useful in other contexts.
               | 
                | There's also a bird species that does this with food
                | caches. It's easier to steal from others than to build
                | your own, so a plethora of deceptive tactics developed
                | to ensure others can't see where you're storing those
                | delicious nuts. Complete with fake caches, lying, and
                | espionage.
               | 
               | [1] I learned about it in The Genius of Birds
        
               | attemptone wrote:
               | There are also lots of non self-aware human animals :P
        
             | q845712 wrote:
             | are you sure?
             | https://en.wikipedia.org/wiki/Theory_of_mind_in_animals
        
             | dr_dshiv wrote:
             | Self awareness is social self awareness. Viewing oneself as
             | a social actor.
        
             | stjohnswarts wrote:
              | That is simply incorrect: bonobos, orcas, elephants,
              | dolphins, chimpanzees, etc. have all shown degrees of self
              | awareness.
        
             | moomin wrote:
             | Probably that you don't know how to measure what you're
             | describing.
             | 
             | Plenty of animals recognise themselves in the mirror, for
             | instance.
        
         | btilly wrote:
         | How do you measure intelligence? Elephants have much better
         | memories than we do!
         | 
         | https://www.scientificamerican.com/article/elephants-never-f...
        
           | Ajedi32 wrote:
           | That article doesn't seem to support your claim. All of the
           | feats mentioned would be entirely unremarkable in your
           | average human.
        
             | btilly wrote:
                | Really? You'd immediately recognize someone you knew for
                | a few weeks over 20 years ago? You wouldn't need a bit
                | of time to figure out who they are?
             | 
             | If so, then your memory is unusually good. I know that this
             | is well beyond my capabilities. Nor do I have the ability
             | to visit a place that I lived 40 years earlier and find my
             | way around.
        
               | Someone wrote:
               | How many other elephants did these elephants see in those
                | 20+ years? It wouldn't surprise me if that were fewer
                | than 100. How many did they spend a few weeks or more
                | with? It wouldn't surprise me if that were fewer than 20.
               | 
                | There also, AFAIK, isn't evidence they remember _all_
                | other elephants they've shared time with for at least a
                | few weeks (I certainly do not rule that out, either,
                | given the low number they will likely meet in their
                | lives).
        
               | tshaddox wrote:
               | > You'd immediately recognize someone you knew for a few
               | weeks over 20 years ago?
               | 
               | Yeah? Maybe not if they were a kid 20 years ago or their
               | appearance had otherwise changed significantly, but
               | otherwise I don't see why not.
        
               | Spooky23 wrote:
               | I think it depends on the intensity of the experience.
               | 
               | I recently found myself in a hotel that I stayed in as a
               | 7-8 year old in the 80s for a particularly memorable
                | vacation with my extended family. It was funny that I
                | still remembered the unusual aspects of the layout and
                | could spot many of the changes that had been made over
                | the years.
               | 
               | But if you asked me to describe someone I met for a few
               | days in a business context in 2020, I'd have a hard time
               | remembering detail.
        
       | 6gvONxR4sf7o wrote:
        | Off topic, but I love that they make it trivial to find a link to
        | the original paper. I know not everyone loves Quanta, but stuff
        | like this is really refreshing.
        
         | lordgrenville wrote:
         | What do people not like about Quanta?
        
       ___________________________________________________________________
       (page generated 2022-02-10 23:00 UTC)