[HN Gopher] Computer scientists prove why bigger neural networks... ___________________________________________________________________ Computer scientists prove why bigger neural networks do better Author : theafh Score : 165 points Date : 2022-02-10 16:02 UTC (6 hours ago) (HTM) web link (www.quantamagazine.org) (TXT) w3m dump (www.quantamagazine.org) | SpaceManNabs wrote: | > Right now, we are routinely creating neural networks that have | a number of parameters more than the number of training samples. | This says that the books have to be rewritten. | | Confused by this statement. Double descent with | overparameterization is exhibited in "classical settings" too and | mentioned in older books. | | > In their new proof, the pair show that overparameterization is | necessary for a network to be robust. | | What is important to note here is that many of the papers this paper | cites prove or show this result in certain network architectures. | This paper adds universality. | | > The proof is very elementary -- no heavy math, and it says | something very general | | The most elementary part was clever use of Hoeffding's | inequality. Some people are really fast readers haha. | | I don't even know how you pick up the fact that isoperimetry | holds in manifold settings with positive curvature while also | playing with all those norms and inequalities. A few years ago I | mentioned on here all the maths that I knew or wanted to know to | read more papers, and others critiqued that the list was too | long. Well, this is why! | aray wrote: | > Double descent with overparameterization is exhibited in | "classical settings" too and mentioned in older books. | | I'm curious for references or citations to this. When I was | going over double descent I tried to find citations like this | (just in a couple places like ML/stats textbooks).
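For readers unfamiliar with Hoeffding's inequality mentioned above: it bounds how far the average of bounded random variables can stray from its mean. This is just a numerical check of the inequality itself, not a reconstruction of the paper's argument:

```python
import math
import random

random.seed(0)

n, t, trials, p = 100, 0.1, 10_000, 0.5

# Empirical probability that the sample mean of n fair-coin flips
# deviates from p by at least t
hits = 0
for _ in range(trials):
    mean = sum(random.random() < p for _ in range(n)) / n
    hits += abs(mean - p) >= t
empirical = hits / trials

# Hoeffding's bound for [0,1]-valued variables: 2 * exp(-2 * n * t^2)
bound = 2 * math.exp(-2 * n * t * t)

print(f"empirical tail prob: {empirical:.3f}")
print(f"Hoeffding bound:     {bound:.3f}")
```

The empirical tail probability comes out well under the bound (the bound is loose but dimension- and distribution-free, which is what makes it so useful in proofs like this one).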
| moyix wrote: | Here's one that lists some older references: | https://arxiv.org/abs/2004.04328 | tomrod wrote: | There are a handful of papers in the 90s that show this, but | it wasn't recognized for what it was. Double descent is REALLY | crazy to me, coming from a classical background. | pishpash wrote: | Over-parameterization for regularization is really old. The | pseudoinverse min-norm solution for under-determined linear | systems even has that flavor. | tomrod wrote: | Sure, but that's identification approaches in | econometrics and matrix analysis contexts. Using that for | neural networks is new-ish in the zeitgeist, which did | not exist in the 1990s as it does today. | throwmeawaysoon wrote: | feketegy wrote: | This looks interesting, I bookmarked it. | | My biggest blocker is the "statistics" part of ML, knowing what | algorithms to choose for various cases. | qorrect wrote: | This book was a big help for me and is very well written, | https://xcelab.net/rm/statistical-rethinking/ . You can find it | free online (along with a video course). The printed version is | a very nice high quality book. | stevofolife wrote: | Do you know if there are any online classes that use this | book as a reference? Or more generally, what types of courses | teach this subject? | lariati wrote: | Thanks so much. That is an amazing level of choice in the | example code. I need this right now as a type of statistical | strength training. | kache_ wrote: | check out introduction to statistical learning | [deleted] | stared wrote: | I am surprised that the paper does not even cite the Lottery | Ticket Hypothesis (https://arxiv.org/abs/1803.03635, | https://eng.uber.com/deconstructing-lottery-tickets/). | | In the LTH paper (IMHO the most fundamental deep learning | publication in the last few years), the number of tickets goes as | layer_size^n_layers.
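A toy illustration (my own sketch, not from the LTH paper) of where a layer_size^n_layers count comes from: even restricting to subnetworks that keep exactly one neuron per hidden layer, a width-w, depth-d network contains w^d candidate "tickets":

```python
from itertools import product

width, depth = 3, 4  # toy MLP: 4 hidden layers of 3 neurons each

# Each candidate subnetwork here keeps exactly one neuron per hidden
# layer; counting arbitrary subsets of neurons would give far more.
paths = list(product(range(width), repeat=depth))

print(len(paths))  # equals width ** depth
assert len(paths) == width ** depth
```

So the pool of subnetworks a pruning procedure can "win" from grows exponentially in depth, before even considering subsets larger than single paths.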
| gwern wrote: | I don't see how lottery tickets yield the isoperimetry result, | even in a heuristic or handwavy sort of way. Yes, a larger | network is more likely to have good-scoring subnetworks; sure. | But that's all it says. What does that tell me about how | efficiently I can construct an adversarial example? For that, I | need something else, like, say a geometric argument about what | sort of network will interpolate between high-dimensional | datapoints with properties like "not changing much in response | to small input changes"... | renewiltord wrote: | Considering the subject, it is at least somewhat amusing that | you double posted this. | samwisedum wrote: | Let's add more nodes so we can overfit even better! | ogogmad wrote: | Does this help against adversarial examples? The article seems to | suggest so. | prideout wrote: | > The proof relies on a curious fact about high-dimensional | geometry, which is that randomly distributed points placed on the | surface of a sphere are almost all a full diameter away from each | other. | | What theorem is this referring to? Sounds like something I should | already be familiar with, but I'm not. | grungegun wrote: | For reference, see the book High Dimensional Probability by | Vershynin. It's free online. See Theorem 3.1.1. It proves that | a sub-gaussian random vector is in some sense close in norm to | sqrt(n) where n is the number of dimensions. Most of these | results are true up to multiplying by some unknown constant. | [deleted] | [deleted] | aix1 wrote: | Not my area of expertise, but the quoted "fact" seems at best | incompletely stated: surely for it to hold there must be some | constraints on the number of points (likely as a function of | the diameter)? | pfortuny wrote: | It is for VERY HUGE n, as siblings explain. | Retric wrote: | It's just wrong as stated, there is only one point a full | diameter away from each point on a high dimensional sphere. | Aka (1,0,0,0,0, ...) 
maps to (-1,0,0,0,0, ...) and nothing | else. Just as (1,0) maps to (-1,0) on a unit circle and | (1,0,0) maps to (-1,0,0) on a unit sphere. | | On a high dimensional sphere they should generally be close | to square root of 2 radius away from each other. | hedora wrote: | If the data points are in the space [0,1]^n, and your | metric function is: | | d(x,y) = 0 if x == y; 1 otherwise | | Then all points are distance one apart. It's been proven | that, as dimensionality increases, normal Euclidean | distance over uniform point clouds rapidly converges to | have the same behavior as the equality metric. | | The proof relies on the information gained by performing | pairwise distance calculations. | | In the example distance function I gave, there is zero | information gained if you plug in two points that are known | to be non-equal. | | The information gained from evaluating the Euclidean | distance function converges to zero as the dimensionality | of the data set increases. | | (Note: This does not hold for low dimensional data that's | been embedded in a higher dimensional space.) | | Edit: Misread your comment. Yes, everything ends up being | the same distance apart. More precisely, the ratio of mean | distance / stddev distance tends to infinity. The intrinsic | dimensionality of the data is monotonic w.r.t. that ratio. | bick_nyers wrote: | Euclidean distance calculations change based on number of | dimensions, for example, in 3 dimensions it is | sqrt(a^2+b^2+c^2). | Retric wrote: | Yes, that's why it's square root of 2. Consider the | origin (0,0,0, ...) to a random point on the sphere (~0, | ~0, ~0, ...). | | Distance = square root of ((X1 - X2) ^ 2 + (Y1 - Y2) ^2 + | ...). So D = square root of ((~0-0)^2 + (~0-0)^2 + | (~0-0)^2 + ... ), which is equal to 1 by definition of | the unit high dimensional sphere. | | So distance from (1,0,0,0 ...) to (~0, ~0, ~0, ...) = | square root of ((~0-1)^2 + (~0-0)^2 + (~0-0)^2 + ... ) ~= | square root of 2.
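Retric's square-root-of-2 figure is easy to sanity-check numerically. A minimal Monte Carlo sketch (numpy assumed; the unit-vector identity |u-v|^2 = 2 - 2<u,v> keeps it cheap):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_points = 1000, 200

# Sample points uniformly on the unit (dim-1)-sphere by normalizing Gaussians
x = rng.standard_normal((n_points, dim))
x /= np.linalg.norm(x, axis=1, keepdims=True)

# Pairwise distances via |u - v|^2 = 2 - 2<u, v> for unit vectors
gram = x @ x.T
d = np.sqrt(np.clip(2.0 - 2.0 * gram, 0.0, None))
pairwise = d[np.triu_indices(n_points, k=1)]

print(f"mean pairwise distance: {pairwise.mean():.4f}")
print(f"sqrt(2)               : {np.sqrt(2):.4f}")
print(f"spread (min..max)     : {pairwise.min():.3f}..{pairwise.max():.3f}")
```

All ~20,000 pairwise distances cluster tightly around sqrt(2) times the radius, i.e. not the full diameter of 2 but nowhere near each other either.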
| bick_nyers wrote: | Ahh ok, for some reason I was thinking (1,1,1) would be a | valid point in this case | dan-robertson wrote: | The fact should say that the expected distance between two | random points tends to the diameter as the dimension | increases. The intuition is that to be close you need to be | close in a large number of coordinates and the law of large | numbers (though coordinates aren't independent) suggests | that is unlikely. If you fix one point on a sphere (say | (1,0,...,0)) then, for a high dimension, most points will | not have any extreme values in coordinates and will look | like (~0,~0,...,~0) where ~0 means something close to zero. | But if we sum the squares of everything apart from the | first we get 1 - (~0)^2 ~= 1, so the distance from our | fixed point is (1 - ~0)^2 + sum_2^n (0 - ~0)^2 ~= 1 + 1 = | 2. | Retric wrote: | You forgot the square root on distance formula. Distance | = square root of ((X1 - X2) ^ 2 + (Y1 - Y2) ^2 + ...). | | Consider the origin (0,0,0, ...) to a random point on the | sphere (~0, ~0, ~0, ...). So Distance from origin = | square root of ((~0-0)^2 + (~0-0)^2 + (~0-0)^2 + ... ), | which sums to 1 by definition of the unit high | dimensional sphere. | | Then plug in 1 vs 0 in the first place because we care | about (1,0,0,0 ...) and you get the correct answer = | square root of ((~0-1)^2 + (~0-0)^2 + (~0-0)^2 + ... ) ~= | square root of 2. | | Edited to fix typo and add clarity. | dan-robertson wrote: | Wow. Can't believe I missed that. | ravi-delia wrote: | It should be almost all points are _almost_ a full diameter | away. However it's still very striking, and an unintuitive | fact about very high dimensional spheres. | leto_ii wrote: | I think it's something related to the curse of dimensionality | [1] [2], basically just a property of high dimensional spaces | (perhaps only certain kinds of spaces though).
| | [1] https://en.wikipedia.org/wiki/Curse_of_dimensionality | | [2] http://kops.uni- | konstanz.de/bitstream/handle/123456789/5715/... | hedora wrote: | The intrinsic dimensionality of a dataset is also relevant | here. | | The M-Tree is one of my favorite indexes. It works with data | that's embedded in infinite dimensional spaces (sometimes; | it's bumping up against an impossibility result that's | sketched in a sibling comment). | bo1024 wrote: | Yes. | | Even though almost all pairs of points are almost a | full diameter away from each other, they are also almost all | almost orthogonal (i.e. the angle they make with the center | of the sphere is very close to 90 degrees). | bick_nyers wrote: | My initial intuition is telling me that it would be diameter/2: | from the perspective of a single point, the closest points | would be near zero distance away, and the furthest points would | be on the opposite side, a full diameter away, and I am | assuming that there are a lot of points in a uniform | distribution. | | What I have just thought about though, is what points would be | exactly diameter/2 distance away from that point? If you have a | circle, you might think it would be the points that form a 90 | degree triangle, but that is not the case; those points would | be sqrt(2)*radius distance away. | | So while it is obvious to me that it is not diameter/2, it is | not obvious to me why it would be diameter either, or how | larger n converges it closer to the diameter or some other | fixed number. | dan-robertson wrote: | If you consider a point on the sphere it means choosing a | bunch of xi such that: x1^2 + x2^2 + ... + | xn^2 = 1. | | Suppose wlog you pick (1,0,0,...,0). Then the distance from | your point to a random point is: D = (x1-1)^2 | + x2^2 + ... + xn^2 | | And from the first equation we know: x1^2 = 1 | - x2^2 - x3^2 - ...
- xn^2 | | Intuitively, your point will be far from a random | point if x1 is close to zero, and x1 will be close to zero | because _everything is close to zero._ | | But we can be more mathematical about it. Our (very | reasonable) assumption is that the volume of an n-dimensional | disk is proportional to the nth power of its radius. The | third equation shows that x1 is going to be big (meaning the | distance to the chosen point above is not so close to the | diameter) if a corresponding[1] point on the n-disk is close | to the middle. But the distance from the origin, R, of a | random point in the n-disk is distributed with pdf | proportional to p(r) = r^n for r in [0,1]. So the cdf is just | r^(n+1) and E[x1^2] = 1 - E[R] = 1 - (n+1)/(n+2), which tends | to 0 as n grows. | | Therefore we get E[D] = E[(1-x1)^2] + 1 - E[x1^2] which tends | to 2 as n grows large. | | [1] the correspondence is that if I give you a point on a | disk, you can turn it into a point on a sphere by flipping a | coin to decide if it goes in the upper or lower hemisphere | and then projecting up or down perpendicular to the disk from | the point onto the sphere. But thinking a little more, I'm | not sure this preserves the metric as it favours points on | the sphere that correspond to the middle parts of the disk. | So I think the actual expected value of x1 should be smaller. | WithinReason wrote: | Let me hijack your explanation starting from this point: | D = (x1-1)^2 + (x2^2 + ... + xn^2) | | Since all the xi^2 sum to 1 (sum xi^2 = 1), each individual | xi will converge towards 0 as the dimensionality grows. Since | x1 is almost 0, the (x1-1)^2 term will be almost 1. | | Since we know that sum xi^2 = 1, and that x1^2 is almost 0, | then we also know that sum xi^2 - x1^2 is almost 1, which is | the 2nd half of the above expression for D. So the average | distance converges to "almost 1 + almost 1", which is "almost | 2", which is the diameter.
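The "everything is close to zero" step above can be watched happen as the dimension grows: the typical inner product between two random unit vectors shrinks like 1/sqrt(n), which is also the near-orthogonality fact mentioned upthread. A rough sketch assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(1)
samples = 2000

for dim in (3, 30, 300, 3000):
    # Pairs of uniform random unit vectors in each dimension
    u = rng.standard_normal((samples, dim))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    v = rng.standard_normal((samples, dim))
    v /= np.linalg.norm(v, axis=1, keepdims=True)

    dots = np.einsum("ij,ij->i", u, v)      # <u, v> per pair
    dists = np.linalg.norm(u - v, axis=1)   # |u - v| per pair
    print(f"dim={dim:5d}  mean |<u,v>|={np.abs(dots).mean():.3f}  "
          f"mean dist={dists.mean():.3f}")
```

As dim grows, the mean |<u,v>| falls towards 0 while the mean distance climbs towards sqrt(2): "almost diameter apart" (in squared distance) and "almost orthogonal" are the same statement.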
| akomtu wrote: | "each individual xi will converge towards 0" | | I'm not sure it will. x1 is chosen randomly in the -1..1 | interval. I don't see how the million other dimensions | would force it to stick to 0. Those N other dimensions | shrink the stddev(xi) by sqrt(N), though. | WithinReason wrote: | Then try normalizing a random 1000-element vector. The | average magnitude of the vector elements is around 0.027. | Retric wrote: | Close, the distance formula is square root of (X1^2 + | X2^2 ...). | | So exactly 1 gives a distance of 1, but almost 1 + almost | 1 gives a distance of _almost_ square root of 2. | WithinReason wrote: | Good point! | adgjlsfhk1 wrote: | I think the most intuitive way of thinking about this is | sphere packing. Asking what percent of points are within | distance d of an n-sphere of radius 1 is equivalent to asking | what the ratio of volumes is. For d<1, the n-volume of a | radius d sphere tends to 0 as n goes towards infinity, so | that means almost all of the points are as far away as | possible. | bjourne wrote: | It's just another way to state the | https://en.wikipedia.org/wiki/Curse_of_dimensionality | zwaps wrote: | Can someone speak to the generality of assuming c-isoperimetry | for the distribution of features? | | Without knowing anything about this in particular, this seems to | be a rather pertinent restriction of the result related to things | like sampling assumptions and the like. | woopwoop wrote: | It really depends on what we assume about "natural" data. If it | looks "positively curved", e.g. the uniform measure on the | boundary of a convex body, or a Gaussian, or something, | this holds. But if the distribution exhibits a strong | hierarchical structure, that's not so good. I think it's a | plausible if not obviously true assumption. | AmericanBlarney wrote: | This conclusion feels like saying more CPU and memory are better.
Seems obvious that more moves allows matching to have more | nuance, but I guess cool that someone proved it. | nazgul17 wrote: | From what I understand, it says that more parameters are good. | This wasn't obvious before this paper: you can fit a polynomial | instead of a neural net, but adding parameters wouldn't help | with robustness in that case: the polynomial would become more | and more jagged. | ska wrote: | > Seems obvious that more moves allows matching to have more | nuance, | | This really has to be balanced against overfitting. The key | problem in ML is generalization, and lots of things improve | training performance while making that worse. | amelius wrote: | Asymptotically better? Or practically better? | ravi-delia wrote: | We know from reality that they get practically better, but | theoretical intuition suggests you shouldn't see an effect after | some point. This paper shows that this intuition is wrong if | you want your networks to be robust. It doesn't guarantee large | networks will be, though. | kd5bjo wrote: | Is there a corresponding result that gives the number of examples | needed to provide a sufficient training set for a given physical | phenomenon? I'm imagining a high-dimensional equivalent of | Nyquist's sampling theorem. | | Coupled with this result, we'd then have a reasonable estimator | of the network size required for particular tasks before even | starting the data collection. | pishpash wrote: | VC dimension? | rackjack wrote: | Silly thought: if bigger NN's are better, shouldn't more neurons | be better? Why aren't elephants smarter than us, despite having | more neurons? | | https://en.wikipedia.org/wiki/List_of_animals_by_number_of_n... | | https://pubmed.ncbi.nlm.nih.gov/24971054/ | kemiller wrote: | IANANS but my understanding is that neurons/body mass is more | indicative. Large animals have more neurons because large | bodies need more.
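The "more and more jagged" polynomial behavior mentioned upthread is a real, checkable effect: it's the classic Runge phenomenon. A small sketch (numpy assumed; Runge's function with equispaced nodes is the standard textbook setup, and the degrees are my own choice for illustration):

```python
import numpy as np

# Runge's function: perfectly smooth, but equispaced polynomial
# interpolation of it oscillates wildly near the interval ends
# as the degree grows.
f = lambda x: 1.0 / (1.0 + 25.0 * x ** 2)
grid = np.linspace(-1.0, 1.0, 2001)

errors = {}
for deg in (5, 10, 20):
    nodes = np.linspace(-1.0, 1.0, deg + 1)  # deg+1 points: exact interpolation
    p = np.polynomial.Polynomial.fit(nodes, f(nodes), deg)
    errors[deg] = np.abs(p(grid) - f(grid)).max()
    print(f"degree {deg:2d}: max deviation from f = {errors[deg]:.2f}")
```

The fit is exact at every node, yet the maximum deviation between nodes blows up as the degree increases. More parameters make this model class less robust, which is why the paper's opposite conclusion for neural networks is notable.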
| salty_biscuits wrote: | They talk about the encephalization quotient, which is to the | 2/3 power | | https://en.wikipedia.org/wiki/Encephalization_quotient | cloogshicer wrote: | You probably already know this (since you wrote "silly | thought"), but real-life neurons are ridiculously more complex | than simulated "neurons" in an NN. So the analogy doesn't | really hold. | pishpash wrote: | They're more complex in biological construction and in | signaling mechanism, but no proof that they are more complex | in function. | mattkrause wrote: | An individual biological neuron can compute a variety of | functions, including max and xor, that a single perceptron | can't (e.g., | https://www.science.org/doi/10.1126/science.aax6239 ). In | general, one needs a fairly elaborate ANN to approximate | the behavior of a single biological neuron. | | OTOH, a three-layer network is a universal function | approximator and RNNs are universal dynamical systems | approximators, so they are sort of trivially equivalent. | XnoiVeX wrote: | I think a lot of people on this thread are missing this | critical insight. | visarga wrote: | You can simulate the data processing of a real neuron with | 1000 digital ones, a small neural net. | | I think we read too much into the complexity of biological | neurons. Remember they need to do much more than compute | signals. They need to self assemble, self replicate and | pass through various stages of growth. They need to | function for 80-100 years. Many of those neurons and | synapses exist only for redundancy and other biological | constraints. | | A digital neuron doesn't care about its physical substrate | and can be millions of times faster. They can be copied | identically for no cost and cheaply fine-tuned for new | tasks. Their architecture and data can evolve much faster | than ours, and the physical implementation can remain the | same during this process. 
| juancn wrote: | Well, biological neurons are much more complex than CS neurons | (https://www.quantamagazine.org/how-computationally- | complex-i...). | | Also, you're working under the assumption that they are | equivalent between mammals which, as far as we can tell, is not | the case (https://www.medicalnewstoday.com/articles/why-are- | human-neur...). | | So my guess is that the comparison is much more complex than | just number of neurons. | gfody wrote: | are we certain they're not? i'm not sure we know how to measure | smartness | beebeepka wrote: | I only stopped saying my cat is smarter than the vast | majority of people I've met because she is no longer with us. | | I did, and still do, believe this to be true. Would love to | befriend a bird | wizzwizz4 wrote: | You can befriend corvids. Teaching them symbolic language | is tricky, but they can trade and socialise and solve | puzzles (if you manage to explain the puzzle). | peterburkimsher wrote: | Dumbo is smarto! | | Elephants have bodies built like a tank (and used as such by | Hannibal), but humans have better I/O ports. | | {reading, writing, listening, speaking, singing, typing, | doing, going} | | Without opposable thumbs, an elephant is probably quite | envious of human writing & typing. Let's use the privilege | wisely to encourage one another, teach and learn from each | other, from Donald Tusk, and give a helping hand. | Someone wrote: | But African elephants have quite versatile opposable | finger-like extensions at the tip of their trunks (Asian | elephants have only one such thing) | alexpotato wrote: | Because, IIRC, a lot of neurons are dedicated to | motion/sensing. | | Bigger animals may require more neurons to handle moving larger | and/or more complicated muscle groups. | | An interesting related point is the encephalization quotient, | which compares actual brain size to the size predicted from | body mass. On the wikipedia page [0] they list the EQ for various | animals.
Humans are the highest but dolphins and ravens are not | far behind. | molticrystal wrote: | To further emphasize that having neural material focused on | the appropriate functions is more important than how much you | have, here is a story about a guy whose brain is mostly | hollow and filled with fluid. It probably did cause his IQ to | be 75 and causes weakness in his legs, but otherwise he | lives a more or less normal adult life. | | https://www.newscientist.com/article/dn12301-man-with- | tiny-b... | acchow wrote: | Doesn't this demonstrate the opposite of what you were | claiming? | lacksconfidence wrote: | I feel like the quotes agree with parent: | | > "If something happens very slowly over quite some time, | maybe over decades, the different parts of the brain take | up functions that would normally be done by the part that | is pushed to the side," adds Muenke, who was not involved | in the case. | Ajedi32 wrote: | Did you see the scans? The dude's head is practically | _empty_ (brain 55-75% smaller than normal) and nobody | even noticed until he was 44 years old and got an MRI. | divbzero wrote: | I think it's that _a priori_ you would expect a hollow | brain to have a far more drastic effect and not allow for | a mostly normal adult life. | pishpash wrote: | Why would you expect that, when a tiny insect can do | pretty intelligent things? What "unexpected" things | humans can do are probably all in the >75 IQ range. | gwern wrote: | Volume != neurons. In any case, 75 is awful and is usually | considered borderline retarded. (If you're tempted to | respond with other cases of higher IQ, note that they are | often retracted or unconfirmed and likely fraudulent in | some way; see https://www.gwern.net/Hydrocephalus .) | willmw101 wrote: | >Volume != neurons | | Exactly.
Most of the newer research on this topic | suggests that it's neural connection complexity, and | specifically frontal lobe volume, rather than overall | brain size that determines intelligence or brain power. | | https://neuroscience.stanford.edu/news/ask- | neuroscientist-do... | | >Luckily, there is much more to a brain when you look at | it under a microscope, and most neuroscientists now | believe that the complexity of cellular and molecular | organization of neural connections, or synapses, is what | truly determines a brain's computational capacity. This | view is supported by findings that intelligence is more | correlated with frontal lobe volume and volume of gray | matter, which is dense in neural cell bodies and | synapses, than sheer brain size. Other research comparing | proteins at synapses between different species suggests | that what makes up synapses at the molecular level has | had a huge impact on intelligence throughout evolutionary | history. So, although having a big brain is somewhat | predictive of having big smarts, intelligence probably | depends much more on how efficiently different parts of | your brain communicate with each other. | mattkrause wrote: | As a counterpoint, rats without a cortex can | do...basically everything normal rats can do--except trim | their toenails. The classic reference for this is | Whishaw's 1990 chapter "The decorticate rat". | | This thread has links to a copy, plus a bunch of related | studies in humans and animals. https://twitter.com/markdh | umphries/status/107105276276554137... | joebob42 wrote: | Aside from other points, more neurons might be better "all else | equal", but there are differences between our brain and an | elephant's beyond just neuron count. | | It's like how just getting a bigger faster computer can help | with your problem, but it's less powerful than a new more | efficient algorithm on the same computer.
| World_Peace wrote: | Elephants very likely could be more intelligent than us; it | just seems that intelligence is a difficult thing to measure | quantitatively. | bee_rider wrote: | In particular, a given elephant might be "more intelligent" | than a human -- we just happen to have evolved from a | particular niche that has rendered us bizarrely good at | abstracting knowledge and combining it with the knowledge of | other humans. | notahacker wrote: | What is "more intelligent" if not "more capable of | abstracting, synthesizing and sharing knowledge"? | bee_rider wrote: | How about the ability to solve novel problems? | | We have very good problem solving ability of course, but | also a superpowered ability to ask others how they solved the | problem. If we wanted to somehow define a kind of 'brain | horsepower' type intelligence, it seems to me that the | former is closer to it than the latter, and it doesn't | seem obvious to me that humans would necessarily take the | top spot. Or that there's a reasonable/ethical way to | test it -- let's take a human, elephant, crow, and | dolphin, raise them in total isolation from any | community to get a measure of their untrained | intelligence... we might get some interesting results on | intelligence, but mostly we will learn something about | ballistics as some ethics review board launches us into | the Sun. | jayd16 wrote: | You'd also need the desire for such things. | tshaddox wrote: | It may be hard to measure and even define precisely, but I | think it's pretty clear that if we did agree on a definition | in the context of this conversation it would be defined in | such a way that humans are more intelligent than elephants. | lariati wrote: | I have listened to Francois Chollet say that all intelligence | is specialized intelligence. | | I suspect the question really doesn't make sense if that is | true.
| | We just have this bias/mind projection fallacy that | intelligence is a general physical property of the brain that | can be measured. I just suspect this is not true. | | Like athletic ability doesn't generalize well. Of course, | someone not athletic at all is never going to be a great | athlete in anything, but it makes no sense to compare Lance | Armstrong to Patrick Mahomes in some general athletic | context. Putting a number on a general athletic ability index | between the two would just be total nonsense. | tshaddox wrote: | For one thing, when the article says "bigger" it means "more | parameters," not "more neurons." | fabiospampinato wrote: | I read on wikipedia [0] the other day a fairly disturbing | statistic related to this: apparently human men have on average | a ~10% bigger brain than women. It'd be interesting to know if | that translates to a higher neuron count or whether the difference | in volume is due to something else. | | [0]: https://en.wikipedia.org/wiki/Brain_size#:~:text=In%20men% | 20.... | andrewflnr wrote: | Probably just a consequence of overall physical size being | larger. AFAIK there continues to be no evidence of a sex | difference in overall intelligence, so slight difference in | brain size is probably a red herring. | gwern wrote: | Density is also important. If we look at other things - some | recent studies have been done on number-counting (https://royal | societypublishing.org/doi/10.1098/rstb.2020.052...) or bird | brains (https://www.gwern.net/docs/psychology/neuroscience/2020 | -herc...) - density jumps out as a major predictor. African | elephants may have some more neurons, but the density isn't as | great as a human where it counts, so they are remarkably | intelligent (like ravens and crows), but still not human-level. | There are diminishing returns in both directions. We have more | neurons than any bird that is as dense or denser, and we have more | density than any elephant with as many or more neurons.
Put | that together, and we squeak across the finish line to being | just smart enough to create civilization. | | An analogy: what's the difference between a supercomputer and | the same number of CPUs scattered across a few datacenters? | It's that in a supercomputer, those CPUs are packed physically | as close as possible with expensive interconnects to allow them | to communicate as fast as possible. (For many applications, the | supercomputer will finish long before the spread out nodes ever | finish communicating and idling.) But you need to improve both, | or else your new super-fast CPUs will spend all their time | waiting on Infiniband to chug through, or your fancy new | Infiniband will be underutilized and you should've bought more | CPUs. | user90349032 wrote: | And yet, no animal except humans is self aware. Really makes | you wonder why that is. | Swizec wrote: | There are lots of self aware non human animals. | | Dolphins and elephants are famous examples, most primates | as well. Even many birds show levels of self awareness and | theory of mind (they know the difference between what they | know and what others know). | visarga wrote: | Seems like being a social animal is necessary for self | awareness. | Ardon wrote: | You might be interested in the theories on the evolution | of human intelligence: https://en.wikipedia.org/wiki/Evol | ution_of_human_intelligenc... | | This is exactly the question the field is about, and I | find it fascinating to read about | Swizec wrote: | In fact there is a popular theory[1] that bird | intelligence evolved because of the way their social | structures work. Birds mate for life _but they cheat_. | Every bird wants their partner to be loyal while itself | mating with as many other birds as possible. | | This means birds have to keep track of who can and can't | see them cheat, who knows and who doesn't. There's even | evidence that they rat each other out (2nd degree info) | if they think there's a reward to be had.
All of this | requires immense intelligence, which happens to prove | useful in other contexts. | | There's also a bird species that does this with food | caches. Easier to steal from others than to build their | own, so a plethora of deceptive tactics developed to | ensure others can't see where you're storing those | delicious nuts. Complete with fake caches, lying, and | espionage. | | [1] I learned about it in The Genius of Birds | attemptone wrote: | There are also lots of non self-aware human animals :P | q845712 wrote: | are you sure? | https://en.wikipedia.org/wiki/Theory_of_mind_in_animals | dr_dshiv wrote: | Self awareness is social self awareness. Viewing oneself as | a social actor. | stjohnswarts wrote: | that is simply incorrect; bonobos, orcas, elephants, | dolphins, chimpanzees, etc. have all shown degrees of self | awareness. | moomin wrote: | Probably that you don't know how to measure what you're | describing. | | Plenty of animals recognise themselves in the mirror, for | instance. | btilly wrote: | How do you measure intelligence? Elephants have much better | memories than we do! | | https://www.scientificamerican.com/article/elephants-never-f... | Ajedi32 wrote: | That article doesn't seem to support your claim. All of the | feats mentioned would be entirely unremarkable in your | average human. | btilly wrote: | Really? You'd immediately recognize someone you knew for a | few weeks over 20 years ago? You wouldn't need a bit of time | to figure out who they are? | | If so, then your memory is unusually good. I know that this | is well beyond my capabilities. Nor do I have the ability | to visit a place that I lived 40 years earlier and find my | way around. | Someone wrote: | How many other elephants did these elephants see in those | 20+ years? It wouldn't surprise me if that was fewer than | 100. How many did they spend a few weeks or more with? It | wouldn't surprise me if that were less than 20.
| | There also, AFAIK, isn't evidence they remember _all_ | other elephants they've shared time with for at least a few | weeks (I certainly do not rule that out, either, given | the low number they likely will meet in their life) | tshaddox wrote: | > You'd immediately recognize someone you knew for a few | weeks over 20 years ago? | | Yeah? Maybe not if they were a kid 20 years ago or their | appearance had otherwise changed significantly, but | otherwise I don't see why not. | Spooky23 wrote: | I think it depends on the intensity of the experience. | | I recently found myself in a hotel that I stayed in as a | 7-8 year old in the 80s for a particularly memorable | vacation with my extended family. It was funny that I | still remembered the unusual aspects of the layout and | could spot many of the changes that had been made over | the years. | | But if you asked me to describe someone I met for a few | days in a business context in 2020, I'd have a hard time | remembering details. | 6gvONxR4sf7o wrote: | Off topic, but I love that they make it trivial to find a link to | the original paper. I know not everyone loves quanta, but stuff | like this is really refreshing. | lordgrenville wrote: | What do people not like about Quanta? ___________________________________________________________________ (page generated 2022-02-10 23:00 UTC)