[HN Gopher] A visual proof that neural nets can compute any func... ___________________________________________________________________ A visual proof that neural nets can compute any function Author : graderjs Score : 216 points Date : 2022-03-06 11:56 UTC (11 hours ago) (HTM) web link (neuralnetworksanddeeplearning.com) (TXT) w3m dump (neuralnetworksanddeeplearning.com) | kmod wrote: | I think these sorts of arguments are not great because they | confuse "limiting behavior" with "behavior at the limit". Yes, if | you are able to construct an infinite-sized MLP it can exactly | replicate a given function, and you can construct a sequence of | MLPs that in some sense converge to this infinite behavior. But | in other measures the approximation might be infinitely bad and | never get better unless the net is truly infinite. | | For an example, consider approximating the identity function | [f(x) = x] with a sigmoid-activation MLP. For any finite size of | the net, the output will have a minimum and maximum value. One | can change the parameters of the net to increase the range of the | output, but at no point is the output range infinite. So even | though you can construct a sequence of MLPs that in the limit in | some sense converges to the identity function, in some sense it | never does. | | The same kind of thinking that leads to the conclusion "neural | nets are universal approximators" would support the existence of | perpetual motion machines; check out the "ellipsoid paradox" for | more info. | j16sdiz wrote: | The author is using non-linear neurons. This is, IMO, cheating. | JustFinishedBSG wrote: | It's not cheating considering it's not even possible otherwise. | [deleted] | j16sdiz wrote: | This is obviously true for non-linear neurons. Who needs a | proof for that? | civilized wrote: | Why do you think it's obvious? Can you spell it out? 
| | I hope it's more than "presumably, non-linear neurons can | approximate any non-linear function since they both have | non-linear in the name". | programmer_dude wrote: | All functions can be approximated by the Fourier | transform. Look up the equation to see why it applies | here (Hint: it is an integral of f(t)e^(-iwt)). | civilized wrote: | This is about as far from a proof as we are from the | Andromeda Galaxy. | Dylan16807 wrote: | The really simple proof I'd use is: | | 1. A function can be approximately implemented as a | lookup table. | | 2. It's trivial to make a neural network act like a | lookup table. | | Which seems to resemble the article but it's much | simpler. | | Point 2 assumes the neurons are normal non-linear ones. | I'm not saying that's cheating, but I do agree with it | being pretty obvious, at least from the right angle. | civilized wrote: | If you fleshed out the "trivial" point 2 as a proof, I | think the result would be essentially the same as the | article. | | The only way you can make it substantially simpler is if | you use a neuron whose nonlinearity makes it essentially | a restatement of another result. For example, if the | neurons are basically just Haar wavelets. | Dylan16807 wrote: | "Divide x into a bunch of buckets. | | Make two neurons tied to Input that activate very sharply | at the bottom and top edge of each bucket. | | Use those to make a neuron that activates when Input is | in the bucket. | | Weight it so it adds f(x) to Output." | | That's over 100 times shorter than the article. The | method isn't as elegant since it needs two internal | layers but I think it's pretty clear. | | Is it wrong to say that the logical leap from "a neuron | can go from 0 to 1 at a specific input value" to "neurons | can make a lookup table" is trivial? Oh well. | | (With "go from 0 to 1 at a specific input value" being | the nonlinear part.) | civilized wrote: | I encourage you to search for "lookup table" in the | article. 
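Dylan16807's bucket construction above can be run directly. Below is a minimal pure-Python sketch of it; the bucket count, the sharpness constant, and the choice of sin as the target function are illustration choices, not details from the thread:

```python
import math

def sigmoid(z):
    # numerically safe logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def bump_net(f, lo, hi, n_buckets, sharpness=500.0):
    """Two sharp sigmoids per bucket form a 'bump' that is ~1 inside
    the bucket and ~0 outside; each bump is weighted by f(center)."""
    width = (hi - lo) / n_buckets
    edges = [lo + i * width for i in range(n_buckets + 1)]
    weights = [f(lo + (i + 0.5) * width) for i in range(n_buckets)]

    def net(x):
        return sum(w * (sigmoid(sharpness * (x - edges[i]))
                        - sigmoid(sharpness * (x - edges[i + 1])))
                   for i, w in enumerate(weights))

    return net

net = bump_net(math.sin, 0.0, 2 * math.pi, 200)
worst = max(abs(net(x) - math.sin(x))
            for x in (0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0))
assert worst < 0.05
```

More buckets (and sharper sigmoids) shrink the error, which is exactly the lookup-table intuition: nothing here depends on the target beyond sampling it at the bucket centers.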
| Dylan16807 wrote: | Why? Yes, it says that. | | It's also 6000 words long. | | I'm saying it's not that hard. | | I'm not saying the article is wrong or anything, I'm | saying you can get to the same result MUCH faster. | | "You can turn a neural net into a lookup table" should be | easily understood by anyone who knows both of those | concepts. | | Edit: Like, isn't triggering specific outputs on specific | input conditions the first thing that's usually shown | about neural nets? If not a full lookup table, that's at | least 90% of one and you just need to combine the | outputs. | anothernewdude wrote: | My proof that you can compute any function with a single | neuron: | | 1. Use the function as the activation function. | joppy wrote: | If you take linear neurons, then the whole network is just some | linear (or affine) function of its inputs, and hence the | universal approximation fails (not every continuous function | can be uniformly approximated by linear functions). | advisedwang wrote: | Why? Non-linear activation functions are common and easy? | nautilius wrote: | I don't understand all of the criticism here. This is Ch. 4 in a | basic intro to neural networks. The author provides a well- | written, intuitive, and concise demonstration of how sigmoids can | be pieced together to approximate functions, in the layman's | sense of the word. It helped me build intuition when I came | across this maybe 5 (?) years ago. There are other good | 'Chapters' about vanishing gradient or backprop. | | The criticism here is mostly about things that aren't even the | topic of the article: 'it cannot compute non-computable | functions', 'it cannot predict tomorrow's stock price', 'splines | are more efficient', 'it cannot predict how a stick looks bent in | water'. | | It's like saying 'I read "Zen and the Art of Motorcycle | Maintenance" twice, and still don't know how to adjust spark | plugs. Stupid book'. 
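joppy's observation can be verified in a few lines: composing purely linear (affine) layers collapses to a single affine map, so without a nonlinearity depth adds no expressive power. A sketch in the 1-D case, with made-up layer weights:

```python
# Each layer is x -> a*x + b; the weights below are arbitrary examples.
def compose(layers, x):
    for a, b in layers:
        x = a * x + b
    return x

layers = [(2.0, 1.0), (-0.5, 3.0), (4.0, -2.0)]

# Fold all layers into one affine map A*x + B.
A, B = 1.0, 0.0
for a, b in layers:
    A, B = a * A, a * B + b

# The three-layer "network" is exactly the single map A*x + B.
assert all(abs(compose(layers, x) - (A * x + B)) < 1e-12
           for x in (-3.0, 0.0, 2.5))
```

The same folding works for matrices in higher dimensions, which is why every network needs a non-linear activation to escape the affine family.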
| GTP wrote: | The criticism is that it is over-representing neural networks' | capabilities: saying that they can compute any function is a | very big and incorrect claim, IMO it should at least mention | that non-computable functions are a thing and of course a | neural network can't have better capabilities than the machine | that is running it. This is especially because there are people | out there thinking that machine learning is able to solve any | possible problem, if one of those people comes across this | proof they'll think: "Neural networks really can solve any | conceivable problem! Look, there's even a mathematical proof of | this!" | renewiltord wrote: | It's funny, but this is how I felt about my son's | kindergarten. They taught him about "numbers" but instead | they only used Natural numbers. The least they could have | done is mention the one-point compactification of | that space. | | Fortunately, I was able to intervene before he was | permanently damaged by the class. He's still struggling with | showing that it is homeomorphic to the subspace of R formed | by {1/n for all n in N} union {0} and I blame | today's pedagogy for that. | GTP wrote: | The context is different, the problem here is the hype that | goes in a very specific direction (i.e. you can do anything | with ML) as I briefly explained in another comment | nautilius wrote: | I feel "Numbers" are pretty hyped-up! | nautilius wrote: | As I wrote, 'function in the layman's sense of the word'. And | no one actually reading this can, without malice, | misunderstand this: the author uses a nice and smooth y=f(x) | as an example, and shows that it can be approximated with | sigmoids. Nothing more, nothing less. And he does a good job | showing this. | GTP wrote: | You say no one can misunderstand what the author wrote | without malice, but there is a saying that reads "you | should not attribute to malice what can be attributed to | ignorance". 
Especially among people outside of CS, there is | currently a lot of hype regarding ML and AI in general; if | one of those people looks around to check if a neural | network can really do anything and finds this, what would | they understand? | nautilius wrote: | You're telling me you were confused by what the author | could possibly mean with his nice and smooth y=f(x) and | that you misunderstood him that really he was implying | that GAI is just around the corner? The whole discussion | here is one long list of "Gotcha!", missing the point and | the context of the article completely. | | You think these non-CS people you evoke will be too dense | to understand that the author is just approximating a | simple function, and then will leverage their non-CS | background and immediately conclude that ANN can 'solve' | Cantor-dust, non-computability and the halting problem? | That's a very specific background your non-CS people | would need to have to fall for that. | | [Edit:] | | What do you think they'll make of "Multilayer feedforward | networks are universal approximators"? | | ("This paper rigorously establishes that standard | multilayer feedforward networks with as few as one hidden | layer using arbitrary squashing functions are capable of | approximating any Borel measurable function from one | finite dimensional space to another to any desired degree | of accuracy, provided sufficiently many hidden units are | available. In this sense, multilayer feedforward networks | are a class of universal approximators.") | charcircuit wrote: | What about a function like f(x) = sin(x)? I feel like when x gets | big the error will start to increase. | CJefferson wrote: | You are right. The argument is that it can approximate sin(x) | over a compact interval, like [0,1]. 
| | Your answer to that might be "I could approximate over [0,1] | with just a lookup table, where I split the input range into n | equal sized pieces for increasing values of n", and you'd be | right -- the "proof" is basically just doing that. | | It's one of those things which is nice to show as a basic | theory thing (there are approximation methods which can never | simulate certain functions), but it's not of much practical | value. | qwerty1793 wrote: | This argument only applies to functions on a compact domain. So | we should only consider trying to approximate sin(x) when x is | in [0, 1], for example. | [deleted] | kevinventullo wrote: | I see a lot of commenters up in arms about Universal | Approximation for NN's, and I think the issue is that it's often | framed as a _superpower_ rather than _table stakes_ for any kind | of general purpose algorithm. | | I posit that any modeling technique which does _not_ have the | universal approximation property will be guaranteed to fail on | large classes of problems no matter how much elbow grease (say | feature engineering) one puts into it. That is, UA is a | _necessary_ but not _sufficient_ condition for a modeling | technique to be fully general (i.e. could form the basis of | whatever AGI is). | DiggyJohnson wrote: | Really well said. Is there a term or concept in the AI | literature++ that captures this point/conjecture? | Kalanos wrote: | it can't learn exponents. it can only multiply. | | there's also a difference between memorizing part of a line, and | being able to extrapolate it. | iamcurious wrote: | I find this very interesting, can you expand or provide links? | marsven_422 wrote: | Jenz wrote: | "Any function" is a big wide claim. Can someone fill me in on | what's required of these functions? Can a neural net for example | compute non-continuous functions like f(x) = [x is rational]? 
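CJefferson's lookup-table framing is easy to check numerically: a piecewise-constant table on a compact interval approximates a continuous function arbitrarily well as the bucket count grows, which is all the universality argument needs. A pure-Python sketch (the bucket counts and the sin target are arbitrary choices for illustration):

```python
import math

def lut_error(f, n, samples=10000):
    # table of f sampled at the n bucket centers on [0, 1]
    table = [f((i + 0.5) / n) for i in range(n)]
    worst = 0.0
    for k in range(samples):
        x = k / samples
        i = min(int(x * n), n - 1)  # which bucket x falls in
        worst = max(worst, abs(table[i] - f(x)))
    return worst

errs = [lut_error(math.sin, n) for n in (10, 100, 1000)]
# worst-case error shrinks roughly tenfold per tenfold bucket increase
assert errs[0] > errs[1] > errs[2]
assert errs[2] < 1e-3
```

The error bound is just max|f'| times half the bucket width, so it goes to zero on any compact interval, but never uniformly over all of R; this is the "compact domain" caveat qwerty1793 raises.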
| advisedwang wrote: | The article says: | | > The second caveat is that the class of functions which can be | approximated in the way described are the continuous functions. | If a function is discontinuous, i.e., makes sudden, sharp | jumps, then it won't in general be possible to approximate | using a neural net. This is not surprising, since our neural | networks compute continuous functions of their input. However, | even if the function we'd really like to compute is | discontinuous, it's often the case that a continuous | approximation is good enough. If that's so, then we can use a | neural network. In practice, this is not usually an important | limitation. | FpUser wrote: | >"No matter what the function, there is guaranteed to be a neural | network so that for every possible input, x, the value f(x) (or | some close approximation) is output from the network" | | With "or some close approximation" being the key, I fail to | understand why it is not obvious. | The_rationalist wrote: | credit_guy wrote: | This often-cited fact is a red herring. Lots of things can | compute (or rather approximate) any function. Piece-wise | constant functions obviously can approximate anything, but | nobody's giddy about using piece-wise constant functions for any | numerical purpose (if they do, and they often do, they don't | point with pride to their new application of the piece-wise | constant functions' "universal approximation theorem"). | Polynomials, trigonometric polynomials, splines (i.e. piecewise | polynomials), radial basis functions, and on and on. | | Just put neural networks to a test against splines, and see how | they fare. Take your favorite function, let's say sin(x), and try | to approximate it with a neural net with 1000 nodes, or with | splines with 20 nodes. You don't stand a chance to match the | quality of the spline approximation. 
| | Edit: here's a short python snippet to show how much better a | spline with 20 nodes is vs a neural network with 1000 nodes for | approximating the sin function: | |     import numpy as np |     from sklearn.neural_network import MLPRegressor |     from scipy.interpolate import UnivariateSpline | |     N = 10000 |     X_train = 2*np.pi*np.random.uniform(size=N) |     Y_train = np.sin(X_train) |     sin_NN = MLPRegressor(hidden_layer_sizes=(1000,)).fit( |         X_train.reshape(N, 1), Y_train) | |     spline_nodes = np.linspace(0, 2*np.pi, 20, endpoint=True) |     sin_spl = UnivariateSpline(spline_nodes, np.sin(spline_nodes), s=0) | |     X_test = np.linspace(0, 2*np.pi, 5000, endpoint=True) |     Y_test = np.sin(X_test) |     rmse_NN = np.mean((Y_test - sin_NN.predict(X_test.reshape(-1, 1)))**2) |     rmse_spl = np.mean((Y_test - sin_spl(X_test))**2) |     print("RMSE for NN approx: ", rmse_NN) |     print("RMSE for spline approx: ", rmse_spl) | |     >> RMSE for NN approx:  0.00011776185865537907 |     >> RMSE for spline approx:  9.540536500968638e-10 | srean wrote: | Indeed. I cannot upvote this enough. New fanboys of DNN seem so | enamored by the universal approximation property and cite it at | the slightest provocation. There is no dearth of universal | approximators; that's not what makes DNN special. The special | thing is how simple training procedures seem to find these | approximations that generalize well (or don't, as shown by the | adversarial examples). | dahart wrote: | This whole example is a red herring. This isn't a splines | versus NNs issue at all, you're talking about the well known | fact that choice of basis affects the ability to fit, which has | nothing to do with whether you use a network. As a concrete | proof, since a spline is (usually) a polynomial function, it | can be defined as a linear network with as many layers as the | spline's polynomial order, in other words splines are a strict | subset of the functions you can build using neural networks. | You can also make a neural network out of spline neurons if you | want. 
And you can cherry pick lots of different functions that | work better for splines than other choices, and you can also | cherry pick functions that perform worse for splines than other | bases. Splines perform far worse on a periodic function of | arbitrary domain than a Fourier fit. Your example is contrived | because you artificially constrained the range to [0, 2pi]. | credit_guy wrote: | I'm sorry, but I have to disagree with you here. | | The "Universal Approximation Theorem" is not the point of | neural networks. People should stop mentioning it, or if they | do, they should state at the same time that there's nothing | special about NNs, that numerous classes of functions possess | the same property. | | Here's my own pitch for neural networks: NN's suck. Big time. | They suck in low dimensions and they suck in high dimensions. | But the curse of dimensionality is so formidable, that | everything sucks in high dimension. Neural networks just | happen to suck less than all other known methods. And because | they suck a bit less, there are applications where they are | useful, and they have no substitute. | abeppu wrote: | > Neural networks just happen to suck less than all other | known methods. | | Or, perhaps, the best demonstrated performance of NNs | exceeds the best demonstrated performance of other known | methods for many tasks. But ... the amount of compute, | investment in tooling, and attention that have been thrown | at deep learning in the past decade is at a scale where ... | do we actually know that other methods would perform worse | with the same resources? Is there some alternate timeline | where in 2012 someone figured out how to run MCMC for | bayesian non-parametrics over much larger datasets or | something, and the whole field of ML just tilted in a | different direction? | credit_guy wrote: | That's a very good observation. However, deep learning | didn't just benefit from being the first mover. 
Before DL | was popular, Support Vector Machines used to be where all | the ML fun research was happening. And just out of | nowhere, Random Forests and XGBoost came and took the | crown if only for a fleeting moment. Gaussian Processes | always showed promise, but I'm not sure they delivered. | Deep Learning just delivered. I guess it's because of the | composability. But you are absolutely right that there's | no proof and no way of knowing right now if DL is the | best there possibly can be. | dahart wrote: | It doesn't matter if you disagree (BTW I don't know what | you disagree with specifically, and I did not mention the | Universal Approximation Theorem. It seems like you're | making some assumptions.) A polynomial spline is still a | subset of a neural network, so if you're right, all you're | demonstrating is that splines also suck at solving the same | problems that neural networks solve. The discrepancy | between the two here, again, has nothing to do with | networks and everything to do with your contrived example. | leoff wrote: | Since you are nitpicking, you could well use a sinusoid | activation function on the Neural Network, and reach an even | smaller loss value. | credit_guy wrote: | Not sure I understand your point. Do you want to use a bunch | of sine functions to approximate a sine function? What would | that show? | | Splines don't know anything about the nature of a function. | They approximate any function with piecewise polynomials. | | Maybe you are trying to say that the default activation | function (relu) in sklearn is not smooth. No problem, you can | add activation='tanh' inside the definition of the NN, and | check the RMSE. Turns out it's for some reason worse. | marginalia_nu wrote: | I assume they're referring to Fourier expansion. | | In general you can use a pretty wide set of functions to | approximate an arbitrary function. 
You can do it with | polynomials (Taylor expansion), and many others, as long as | they form a basis of a Hilbert space. | | Producing a given function from a linear combination of | other functions isn't groundbreaking in the least. | gowld wrote: | Universality shows potential, not optimality. The article | covers this. | Veedrac wrote: | > >> RMSE for NN approx: 0.00011776185865537907 | | Error is approximately 0.0001 because `tol`, the parameter that | tells optimization to finish, is 0.0001. | | Set tol=0, and then beta_2=1-1e-15, epsilon=1e-30 to maximize | stability from the optimizer, and I got RMSE for the neural | network to go below 5e-7. | | This is all very academic because stochastic gradient descent | is a horrific tool to be using for this purpose. You aren't | wrong about that. | credit_guy wrote: | Fair enough, I wrote the snippet in 5 min and didn't check | the tolerance parameter. | | But with your improved choice of parameters, the NN is still | about 1000 times worse than the cubic spline, despite having | 50 times as many nodes. | Veedrac wrote: | I don't think these numbers are meaningful. It's not far | off from a degree-2 interpolation already, and I got a | hidden layer size of 20 to an error of 7e-5 by just letting | it optimise for longer and picking a seed that worked well, | which is basically the same error as a degree-1 | interpolation that gets 5e-5. | | Like sure the spline is doing better, but that's not why we | care, it's not like there's a general sense in which spline | interpolations are going to be better than the optimal fit | from a larger neural network, they're just a simpler, | faster, more numerically stable way of solving simpler | problems. An optimiser designed for 1D interpolation of | small neural networks, for all I know, might get extremely | accurate results. 
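The degree-1 interpolation Veedrac mentions is a useful dependency-free baseline. Below is a sketch of it for sin on [0, 2*pi] with 20 knots (the knot count mirrors the spline example upthread; the bound checked is the standard piecewise-linear error bound, max|f''| * h^2 / 8):

```python
import math

n = 20                        # number of knots (arbitrary, matches the thread)
h = 2 * math.pi / (n - 1)     # knot spacing
knots = [i * h for i in range(n)]
vals = [math.sin(x) for x in knots]

def interp(x):
    # piecewise-linear (degree-1) interpolation between the knots
    i = min(int(x / h), n - 2)
    t = (x - knots[i]) / h
    return (1 - t) * vals[i] + t * vals[i + 1]

xs = [k * 2 * math.pi / 5000 for k in range(5001)]
worst = max(abs(interp(x) - math.sin(x)) for x in xs)
# classic bound: error <= max|f''| * h^2 / 8, and max|sin''| = 1
assert worst <= h * h / 8 + 1e-9
```

With h about 0.33 this gives a worst-case error near 1e-2; halving the spacing quarters the error, which is why a modest cubic spline (error O(h^4)) so easily beats a large stochastically trained network on a smooth 1-D target.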
| grafs50 wrote: | The top comments (now) are all talking about the fact that this | universal approximation theorem doesn't really have much impact | in the real world. I wonder: is this interesting outside of | theory? Has this motivated any techniques that have created (or | may create) real-world, empirical results? Could it even? | amelius wrote: | I want to see it compute the Ackermann function. | musesum wrote: | > Ackermann function | | Perhaps quantum neurons? | cperciva wrote: | I want to see it compute the Busy Beaver function. | [deleted] | anothernewdude wrote: | Did you know a big enough hash-table can compute any function? | [deleted] | The_rationalist wrote: | alephnan wrote: | s/function/computable function | bmitc wrote: | This is called out deep into the article, but shouldn't it be | "neural nets can approximate any function"? | | Also, how does this relate to the traditional notion of | computability of functions? | t_mann wrote: | As someone who has taught this to CS students, just scrolling | through I have to say it looks like this has about 5x more text | than it should have. This is a homework problem for second-year | students (literally, where I used to teach) that should take them | no more than a page to answer. | mjburgess wrote: | There are many caveats to this, esp. that this "fact" has nothing | to do with whether training a neural network on a dataset will be | useful. | | There is often no function to find in solving a problem, ie., | there is no mapping from ImageSpace -> DogCatSpace. Ie., most | things are genuine ambiguities --- a stick in water appears | bent, indistinguishably from an actually bent stick in some other | transparent fluid. | | Animals solve the problem of the "ambiguity of inference" by | being in the world and being able to experiment. Ie., taking the | stick out of the water. A neural network, in this sense, cannot | "take the stick out of the water" -- it cannot resolve ambiguities. 
| So that it can "approximate functions" is neither necessary nor | sufficient for a useful learning system. | | More significantly, a NN is a very very bad approximator of many | functions. Consider approximating a trajectory _so that one can | then find an acceleration_ , ie., here we need f(x) to find | d2f/dx2 -- NN approximations are typically "OK" at the f(x) level | and really really crappy at the df/dx level, because the | non-linear functions NNs just glue together are only trainable if | they're very rough. | | For these, and lots of other reasons, this theoretical approach | to learning is largely marketing nonsense. If you go out and | actually study the only known systems which learn effectively | (ie., animals), you do not find the need for universal | approximator theorems in explaining their capacities. | | These theorems _only_ show that NNs are, like many computational | statistical techniques, _sufficient_ for a wide class of "mere | approximation" problems that are only narrowly useful within the | whole field of learning-as-such. | benrbray wrote: | What techniques are used to estimate trajectories? Would | something like a Gaussian Process perform better? | mjburgess wrote: | Well NNs like GPs are "basically" non-parametric methods, in | the sense that one does not start with a known parameterised | statistical distribution that comes from domain expertise. | These are worst-case techniques when we don't have the option | to start with "the right answer", eg., in the case of large | datasets where we have no idea how some pixels distribute | over cat/dog images. | | In the case of a trajectory we would likely already know the | answer, in the form of just doing some physics. The role of | computational stats here then is to start with the known form | of the solution and find the specific parameters to fit it. 
| | Since we have physics, we can find the "perfect answer" to | the trajectory question with very few data points -- and take | as many derivatives as we like. | | Brute-force ML is often used when we don't have theories, | making it all the more dangerous; and alas, all the more | useful. We can get a 5% improvement on click-thru rate | without having any theory of human behavioural psychology --- | god knows then, what we are doing to human behaviour when we | implement this system. | aqme28 wrote: | If you're asking how to solve difficult differential | equations in general, we use numerical methods like finite | elements or finite differences. | adgjlsfhk1 wrote: | that said, there has been some promising research on using | NNs for solving nonlinear pdes. | emmelaich wrote: | Can you just show the stick moving to the NN? | | Are you just denying information to the NN that is available to | the animal? | sdenton4 wrote: | You've got a kind of narrow view of the matter... The need for | interactive understanding is addressed in different ways by | reinforcement learning, GANs, and autoregressive recurrent | neural networks. | | In the latter cases, the generative output of the network is a | kind of experiment, and backprop from the loss function | provides a route to improvement. | | I think it's an unfortunate historical accident that the field | of machine learning is so transfixed with classifiers. But | they're really not the only game in town. | fxtentacle wrote: | The article also conveniently glosses over the fact that all AI | calculations are limited in their maximum complexity by the | depth of the AI. For example, for x! the best a regular DL AI | can do is to memorize some values and interpolate between them. | curiousgal wrote: | > _There is often no function to find in solving a problem_ | | In finance NNs can be used to calibrate models, i.e. 
generate | parameters to a model (function) that replicates existing data | (observed prices/volatilities). | mjburgess wrote: | Well I don't think prices are functions of economic variables. | | Recall a function is `y = f(x)`, not `y1, y2, y3... = f(x)`. | | So what you're modelling is something like `E_hope[y] = f(x)` | where `E_hope` is "a hopeful expectation" that the mean of | the underlying ambiguous y1,..yn reliably corresponds to a | unique `x`. | | This "hopeful expectation" is certainly more common than | there being any actual function connecting `y` to `x`, but I | think it's often quite false too. Ie., even the expectation of | prices is genuinely ambiguous. | | To handle this we might ensemble models | `E_ensemble[E1_hope[y], ...En_hope[y]]`, but to repeat a | famous idiom in finance, this is very much "building | sandcastles in the sky". | | The idea that you can just "expect" (/statistics) your way out | of the need for experimentation is a dangerous superstition | which is at the heart of ML. It is impossible to simply | "model data", measurement produces genuine ambiguities which | can only be resolved by changing the world and | seeing-what-happens. There is no function to find. | thfuran wrote: | >Recall a function is `y = f(x)`, not `y1, y2, y3... = | f(x)`. | | y1,...,yn is a perfectly reasonable function output. | Functions don't have to produce scalars. | mjburgess wrote: | Functions have to resolve to one point in the output | domain, even if that point is multi-dim. | | Here, consider `y_houseprice = price(house data, economic | data, etc.)`. There isn't a unique house price in terms of | those variables. The real world observes many such prices | for the same value of those variables. | | An overly mathematical view of the world has obscured the | scientific method from our thinking here. Generally, | there aren't actually functions from X to Y, and there | aren't actually stable XY distributions over time. 
| | The world, as measured, is basically always ambiguous | and discontinuous. Data, as measurement, isn't the | foundation of our theory-building. We build theories by | changing the world; data comes in as a guiding light to | our theory building, not as the basis -- which is our | direct causal interactions with our environment, ie., | it's the actual stuff of our bodies and the stuff of the | world _as we change it_ | pedrosorio wrote: | > There isn't a unique house price in terms of those | variables | | > An overly mathematical view of the world has obscured | the scientific method from our thinking here | | Since it is understood that houseprice is not a function | of just three variables, the mathematical view | (statistical learning theory) commonly used when training | models defines house price as a random variable. This | takes into account the uncertainty from all the unknown | factors that contribute to house prices. | | The distribution defining this random variable is a | function of the 3 input observations. Commonly, the | inputs are used to compute the mean, and the shape of the | distribution is fixed - a Gaussian, for example - but not | necessarily. | | Given observations of the 3 inputs, each observed | y_houseprice is just a sample from this random variable. | mjburgess wrote: | Well a random variable _is_ a function, from event space | to the real line. We return to a single measure by taking | its expectation. We don't model `Y = f(X)`. | | This doesn't play well with the universal theorem in the | article. NNs can only be said to model expectations of | random variables. | thfuran wrote: | >We return to a single measure by taking its expectation | | Only if you want to throw away most of the information in | your model. | [deleted] | mjburgess wrote: | Well, indeed. | | One then needs to explain how a NN being a "universal fn | approximator" helps at all in this context. 
| | One models RVs generatively using distributions (and so | on); the actual model (eg., of house prices) isn't a | function, it's often an infinity of them. | thfuran wrote: | >One then needs to explain how a NN being a "universal fn | approximator" helps at all in this context. | | Given that I can't tell why you don't think it does, I | don't think I can explain it to you. From the other | contexts you've talked about here, you seem to be | implying that the only thing that is potentially useful | is an AGI which either carries out or merely designs | experiments. But that's patently absurd. | mjburgess wrote: | I'm happy to hear the very narrow case on this. Can a NN | learn geometric Brownian motion? | Der_Einzige wrote: | Interpolation in GANs seems a lot like "being able to | experiment. Ie., taking the stick out of the water"... | tsimionescu wrote: | They are not taking the stick out of the water, because they | don't have hands and are not looking at a real stick. They | are being trained on a static data set, and a picture of a | stick isn't a stick. They can try to extrapolate all they | want, but they are fundamentally not going to be able to get | more information out of the data than there exists. | | And in a static photo of a bent stick, or a fluffy critter, | there simply isn't any information to tell whether this is a | bent stick or a stick in water; or whether it's a cat or a | dog. The intelligent response is not "I don't know", and it's | not "60% it's a cat, 40% it's a dog". It's "here is the set | of actions that need to be taken to create more data to be | able to settle the question". | | And creating that set of actions is completely different from | current approaches. 
No state of the art GAN can say "you need
| to view the subject from a steeper angle, check for a
| reflection to see if it's in water" or "poke it with a stick,
| see if it meows or barks", because they don't have enough
| information about the world in their training sets to be able
| to even know that these are possibilities.
| tiborsaas wrote:
| Did you really compare a single neural network with a real
| physical agent (dog) with 5 senses, trillions of cells, a
| complex inter-connected brain operating in the real world? This
| is close to what I'd call an unfair, or straw-man, argument.
| | This is like calling calculus useless, because it doesn't help
| you pick one kind of milk versus another in a supermarket.
| | Universal function approximation is a great tool if used
| correctly. From what I understand, it comes in handy when our
| puny human brain can't come up with an algorithm to produce
| such a function. Let's stick to image recognition/processing.
| Can you write a function that recognizes a cat or a dog in a
| 64x64 greyscale image? Or write a function to remove the
| background from a picture of a person. Can you write a function
| to generate a 3D depth map for a 2D picture?
| | > These theorems only show that NNs are, like many
| computational statistical techniques, sufficient for a wide
| class of "mere approximation" problems that are only narrowly
| useful within the whole field of learning-as-such.
| | That's exactly what people are looking for in most cases; the
| mere approximation can be good enough to consider the problem
| solved.
| [deleted]
| mjburgess wrote:
| I've spoken with no end of people who think "universal fn
| approximation" is some magic token to be played in the AI
| debate -- people getting PhDs in ML no less.
| | These are really, in my experience, people with no science
| background -- c-sci programmers who don't have any conceptual
| foundations in applied mathematics or science (outside of the
| discrete math taught in csci) -- and who take these
| properties as genuinely quite magical.
| | "Intelligence" is, to them, just some function, and if NNs can
| approximate any fn, then presumably they'll be intelligent.
| | They aren't aware that, in a sense, everything is a dynamical
| function of space and time (say, stuff(x, t)) and to
| instantiate it, one requires the entire universe.
| | In other words, csci people are not used to thinking about
| applied mathematics in the sense of science (implementation
| of functions, dynamical functions, and so on). I think it is
| important for this audience to demystify these properties.
| | Being a universal fn approximator, in my view, is neither a
| necessary nor a sufficient property of any system
| _implementing_ intelligence. It's really a misdirection.
| tiborsaas wrote:
| I'm one of these persons with no formal training in any
| scientific field. I don't think people like me think any of
| what you assume in your stereotype. I know it's not
| magic, it's brute force problem solving. Once you are
| capable of training systems like this, you get to solutions
| which are really powerful.
| | All you see is numbers and functions, and all I see is
| problems being solved by these AI/ML methods, often in
| quantifiably better ways than intelligent humans can.
| | NNs are the building blocks. Neurons in your brain are
| dumb as well; it's the quality of the network that matters.
| So instead of moving goalposts, what do you think is
| required to implement intelligence?
| mjburgess wrote:
| Well, I'm not making a stereotype -- I'm offering an
| explanation of the people I've met.
It's very hard to
| converse with people whose academic background is
| discrete mathematics, in an essentially empirical domain
| (that of modelling intelligence).
| | There's a lot which is odd (and dubious) in AI/ML that
| traces its origins to this peculiar situation of a
| discipline (csci) in the "early phases" of modelling an
| empirical phenomenon, without, as yet, intra-disciplinary
| theoretical support for it -- csci people take geometry
| and classical physics to make video games. They don't yet
| take the equivalent to make intelligent systems (which
| would include material on learning in animals and humans;
| and, in my view, more applied math).
| | In any case, to answer your question about
| implementation, see https://news.ycombinator.com/threads?
| id=mjburgess#30579711
| netizen-936824 wrote:
| There is far more computational power in a single neuron
| than in any NN that I've heard of.
| | One of the single approximated "neurons" in an AI/ML NN
| doesn't even operate anywhere close to a real neuron.
| Real neurons are oscillators which exhibit nonlinear
| dynamic behaviors.
| beaconstudios wrote:
| NNs are clever, but what they do is essentially reverse
| programming. Rather than you writing a function that
| translates a -> b, you give it a -> b mappings and the
| training writes the function. What GP was saying is
| something that most programmers should already know: that
| writing the function is usually the easy bit, and the
| hard bit is defining and then modelling the problem; NNs
| can't do that.
| AlotOfReading wrote:
| Just because you can brute force things doesn't mean it's
| a practical way to solve certain problems. The set of all
| C++ programs humans will ever write is finite and
| therefore parseable with a regular language, but no one's
| out there writing C++ compilers that way for obvious
| reasons. That's the essence of what I think GP is getting
| at.
| | I don't know if NNs are sufficiently powerful to escape that
| argument, because my formal understanding of "the world"
| simply isn't good enough, but it's not obvious to me that
| they are.
| tiborsaas wrote:
| I've seen a case where someone implemented an algorithm
| after an existing AI was created. I can't remember it,
| unfortunately, but the hand-made version was better, so
| it's certainly a possibility. NNs automate human problem
| solving, as was demonstrated recently by DeepMind's
| AlphaCode.
| | It can be quite practical though, as deep neural networks
| (and massive increases in hardware capabilities) delivered
| solutions that were previously out of reach.
| version_five wrote:
| > There is often no function to find in solving a problem, ie.,
| there is no mapping from ImageSpace -> DogCatSpace. Ie., most
| things are genuine ambiguities --- a stick in water appears
| bent, indistinguishably from an actually bent stick in some
| other transparent fluid.
| | That's tangential imo. There is a function that maps from image
| space to dog/cat/don't-know space. A sentient being that gets
| the "don't know" can get more info to resolve the ambiguity (or
| rephrase the question). A universal function approximator can
| still make itself useful even if all it can do is say it
| doesn't know. This is a question of problem setup.
| | NNs are bad at extrapolating, including e.g. trivially to
| periodic functions. This is a limitation if you thought they
| could do that, but again a question of understanding what they
| do.
| | Hype, as you say, leads some people to believe NNs are magic,
| leading to mismatched expectations. A universal interpolate-only
| function approximator is still pretty useful though, just maybe
| disappointing if you understood that to imply sentience.
| sdenton4 wrote:
| "NNs are bad at extrapolating, including e.g. trivially to
| periodic functions."
| | WaveNet and its descendants are the obvious counter-example
| here. It is excellent at learning to generate periodic and
| nearly periodic functions...
| version_five wrote:
| I know I've seen papers where they use sinusoidal
| nonlinearities to learn periodic functions. I felt like
| that's a bit of a hack though (not necessarily a bad
| thing) - you're bringing domain knowledge in, which, if you
| allow it, makes it easy to extend to periodic functions. The
| failure of a vanilla NN to learn periodicity is, I think, a
| specific failure to extrapolate, which is the bigger
| problem.
| | I'll take a look at the architecture you mention.
| | Edit: looked it up, I see it's an autoregressive model,
| like PixelCNN in 1D. I do know PixelCNN; I hadn't really
| considered the connection to periodic functions, but I see
| what you're saying. In a sense any AR model is
| extrapolating, but not in the sense I mean: it has seen
| training examples of the next point predicted from the last
| points; it's not extending a relationship it learned in
| training to something new. Anyway, thanks for pointing the
| model out.
| rileyphone wrote:
| I think the approach of [0] is closer to what's going on in
| our brains, basically evolving symbolic equations of
| partial derivatives until we get something good enough.
| Really fascinating and succinct paper.
| | 0. https://cdanfort.w3.uvm.edu/courses/237/schmidt-
| lipson-2009....
| bannedbybros wrote:
| mjburgess wrote:
| Sure, we could see `animal = f(image)` as a partial function
| and lift it into some total function space, `maybe_animal =
| f(image)`.
| | Is our reasoning here actually this total function, though? We
| really do always need to keep in mind that animal reasoning
| will terminate, animals will act "guided by other reasoning",
| and resume the original reasoning process.
| | Action really messes up this neat "totalising option" for
| partial functions.
If I'm not sure if "fluffy" is a dog or a
| cat, I might wait longer for it to move; I might throw
| something at it; I might ask its owner.
| | This isn't as simple as "don't know", since reasoning is kinda
| time-bound and time-parameterised; my very measuring process
| is sensitive to my own confidence in what-something-is.
| | Is this _really_ "f(image) = dont-know"? I don't think so.
| | I think it's more like, "judgement = body-brain-state(t,
| reasoning-goal, {action policies}, ...large-number-of-other-
| things)".
| | This gets to my issue. In my view what animals are doing is
| better described by a dynamical equation of state (like a
| wavefunction), as the whole system is operating under its own
| dynamical evolution (including, eg., what fluffy does in
| response to your puzzlement).
| | I don't see Dog|Cat|DontKnow as the answer here. I don't think
| it's the actual total function which corresponds to our
| judgement, though it probably is total -- we just end up with
| "DontKnow" in all the cases where actual intelligence is
| required... the very thing we're aiming to model.
| shawnz wrote:
| > If you go out and actually study the only known systems which
| learn effectively (ie., animals), one does not find the need
| for universal approximator theorems in explaining their
| capacities.
| | What's the explanation for their capacities, then?
| mjburgess wrote:
| direct, theory-laden, causal contact and pro-active
| engagement with their environments... eg., taking the stick
| out of the water.
| | "data" (even from an animal's pov) is basically, by nature,
| ambiguous. Measurement is an event in the world (eg., light
| hitting the eye) which isn't somehow unambiguously
| informative. To overcome this problem, basically, animals
| move stuff.
| | The adaptable motor cortex of the most intelligent animals,
| therefore, isn't something to be tacked on to "intelligence";
| it's the precondition for it.
| | Glibly: tools before thoughts.
Reason needs content to
| operate on; the content of our thoughts is _built_ via our
| (sensory-)motor systems.
| | The idea that we need ever more theoretically powerful models
| of reasoning here is a misdirection -- it misses that the
| heart of everything we know arrives via some form of repeated
| experiment.
| | Ever more automated "pure reasoning", whether via statistics
| or symbolic methods, is always just a means of juicing the data
| we provide the system. Useful as a technology, but not as a
| genuine learning system. It will never have the means of
| resolving the many ambiguities within data itself.
| | In the case, for example, of NLP -- the structure of 1
| trillion documents will never enable a machine to answer the
| question "what do you like about what I'm wearing?" --
| because (1) the machine isn't here with me; (2) I am asking
| for its personal judgement, not a summary of a trillion
| documents; and (3) that summary of those trillion documents
| has to be unique, but the question has no "right answer".
| | Whilst computer science is in the driving seat over what
| "intelligence" is, we will forever be stuck with this
| incredibly diminished view of our own capacities and of the
| size of the technological challenge. The goal isn't to sift
| through everything we have already done and "take a mean"; the
| goal is to produce a system which could have done "everything
| we have done" without us.
| gowld wrote:
| Experiments are data collection. They are the input to the
| thinking process.
| bbqbbqbbq wrote:
| shawnz wrote:
| That clearly doesn't explain how you get from the
| biological system to intelligence though. NNs can be
| imagined as a (highly simplistic) form of repeated
| experiment too.
| mjburgess wrote:
| Intelligence starts when the internal physical structure
| of a system is "dynamically reflective" of its external
| environment in a way which is stable over time
| (basically, "complex hysteresis").
You get this with a
| hard drive (+CPU, etc.), sure. I'd call this sort of
| minimal intelligence merely "reactive".
| | You get, let's say, "adaptive intelligence" when the
| physiological structure being changed adapts the system
| so that it is able to interact with its environment more
| (eg., it can move differently).
| | To get to more advanced forms we need an explicit
| reasoning process which can represent this physical state
| to the system itself to engage in inference. Let's say
| "cognitive intelligence".
| | We get typical mammalian intelligence when the broader
| physiological structure (in particular the sensory-motor
| structure) of the system is actually guided by this
| explicit reasoning process. (Eg., a cyclist grows their
| muscles differently by reasoning-when-cycling.) Let's
| call this "skill intelligence".
| | You get human intelligence when the explicit reasoning
| process becomes communicable, ie., when interior
| representations can be shared without the physical
| activity of acquiring those representations. Human
| intelligence is really "outside-in" in a very important
| way, which AI today also neglects -- it took 100bn dead
| apes to write a book, not one. Let's call this "socio-
| symbolic intelligence".
| | What we have today is really just systems of "reactive
| intelligence" with weird Frankensteinian organs attached to
| them: "look at Alexa turn the lights off!!!!". Alexa isn't
| turning the lights off in the manner of the "socio-
| symbolic intelligence" we attribute to Alexa naively (and
| delusionally!).
| | Alexa is a reactive system which has something of socio-
| symbolic significance ( _to us_ , not to it!) glued on.
| Alexa does not intend to turn the lights off, and we're
| not communicating our intent to her. She's a hard drive
| with a SATA cable to a light switch.
| shawnz wrote:
| It seems to me like a "reactive intelligence" is
| basically equivalent to an "adaptive intelligence" which
| just hasn't begun making use of all of its possible
| outputs yet. Obviously even though we adapt to our
| environment, there are still ultimate limits to how far
| we can adapt.
| mjburgess wrote:
| The reason a dolphin isn't as smart as an ape is that
| it's a tube of meat in the ocean, and an ape is a pianist
| up a tree.
| | I see non-trivial forms of intelligence largely as
| symptoms of physiology. Even the human brain in a dolphin
| would be dumb; indeed that's basically just what a
| dolphin is.
| | There is something absolutely remarkable in a thought
| _about_ something, moving my hands to type; and my hands
| actually typing. Personally, I think 90% of that miracle
| is organic -- it is in the ability of our cellular
| microstructure to adapt quickly, and our macro-structure
| to adapt over time.
| | Either way, people who intend to build intelligent
| systems have a task ahead of them. Building a system
| which can really use tools _it hasn't yet invented_ is, in
| my view, a problem of materials science more than it is
| of discrete mathematics.
| yazanobeidi wrote:
| I like what you wrote here and how you think.
| | However, I have to make one remark.
| | Schopenhauer would say that Alexa does in fact have the will
| to turn the lights off. A burning will, the same will
| within yourself and everything that is not idea. That it
| is your word that sets off an irreversible causal
| sequence of events leading to the turning off of the
| lights. Schopenhauer would ascribe his "Principle of
| Sufficient Reason" as the reason for its happening. It is
| not that Alexa chooses to obey, but that, by the causal
| chain enforced by physics and more, the will of the
| universe is left no choice but to turn off the lights. The
| same reason why the ball eventually falls down when thrown
| up.
| I believe this is the metaphor Schopenhauer uses in his
| World as Will and Idea.
| abeppu wrote:
| I feel like bringing David Marr's "levels of analysis"
| into the discussion is useful here, at least in very
| loose terms.
| | > explicit reasoning process which can represent this
| physical state to the system itself to engage in
| inference
| | > when interior representations can be shared without the
| physical activity of acquiring those representations
| | Roughly, you're talking at the 'computational level' or
| perhaps above. You're describing qualities of the
| computation that an agent must be doing. You don't
| descend to the algorithmic level, which is where the
| 'universality theorem' discussion is taking place. Which
| is not to say that any of what you've said is wrong, but
| to someone like the parent asking "_how_ you get from the
| biological system to intelligence though" (emphasis
| mine), I think it's basically a non-answer.
| mjburgess wrote:
| Well, my answer there will alarm many. Broadly, it's
| whatever algorithm(s) you'd call biochemistry. I think
| the self-replicating adaptive properties of organic
| stuff are the heart of how we get beyond what the mere
| hysteresis of hard drives can do. We require cells, and
| their organic depth.
| | We don't often describe reality with algorithms in the
| sciences; if reality admits a general algorithmic
| description, it is surely beyond any measurable level. So
| I don't think the answer to the problem of intelligence
| will require computer science till much later in the
| game, if it is even possible to actually artificially
| create it.
| | Whatever "algorithm" would comprehensively describe the
| relevant properties of cells, even if it could be written
| down, will never be implemented "from spec". One may as
| well provide the algorithm for "a neutron star" and
| expect playing around with sand to make one.
| abeppu wrote:
| I think this is another non-answer.
| | Biochemistry is broad and does really diverse things, and
| if your answer to "how does a mammalian brain allow it to
| reason through interactions with its physical
| environment" is "biochemistry", and that's also
| presumably the answer to "why is that pond scum green?"
| and "why are fingernails hard?", then it fails to be an
| explanation of any sort.
| | Similarly, if someone asks "why is your program dying on
| a division-by-zero error", it's not an explanation to say
| "well, that's just a matter of how the program executes
| when provided those particular inputs".
| | What _specifically_ about the biochemistry of our nervous
| systems allows us to solve problems, or communicate, as
| versus just metabolize sugar?
| [deleted]
| mjburgess wrote:
| It's about self-replication, adaption and "scale-free
| properties".
| | Consider that a touch on the surface of my skin, which can
| be a few atoms of some object brushing a single cell,
| somehow "recurses up" the organic structure of my
| body (cell, tissue, organ, ...), both adapting it and
| coming to be "symbolically objectified" in my reasoning
| as "a scratch".
| | The relevant properties here are those that enable
| similar kinds of adaption (and physiological response)
| _at all relevant scales_, from the cell to the organ to
| the whole body.
| | I think cells-organs-bodies are implementations of
| "scale-free adaption algorithms" (if you want to put it
| in those terms), which enable implementation of "higher-
| order intelligence algorithms".
| | If you want much, much more than this, then even if I had
| the answer, it wouldn't be comment-sized; it'd be a
| textbook. But of course, no one has that textbook, or
| else we wouldn't be talking about this.
| | I think if you see cells as extremely self-reorganizing
| systems, and bodies as "recursive scale-free"
| compositions of "self-reorganizing adaptive systems",
| then you get somewhere towards the kinds of properties
| I'm talking about.
| | I think my ability to type _because_ I can think is a
| matter of that "organic recursion" from the sub-cellular
| to the whole body.
| slibhb wrote:
| > Animals solve the problem of the "ambiguity of inference" by
| being in the world and being able to experiment. Ie., taking
| the stick out of the water. A neural network, in this sense,
| cannot "take the stick out the water" -- it cannot resolve
| ambiguities.
| | Couldn't training data include a video of someone taking a
| stick out of the water?
| ddingus wrote:
| Then we have no ambiguity. The data is inclusive.
| | And the cost is a very seriously larger problem space to
| classify.
| smolder wrote:
| Yeah, all you need to do is show you can build a NAND gate with
| a neural network, and every other logical network follows from
| that. I remember doing that project in a machine learning class
| in 2010.
| kazinator wrote:
| Really? Can a neural net compute the function halts(N, I): the
| neural net N will halt on input I, for any neural net N and
| input I?
| advisedwang wrote:
| The article says:
| | > The second caveat is that the class of functions which can be
| approximated in the way described are the continuous functions.
| If a function is discontinuous, i.e., makes sudden, sharp
| jumps, then it won't in general be possible to approximate
| using a neural net. This is not surprising, since our neural
| networks compute continuous functions of their input. However,
| even if the function we'd really like to compute is
| discontinuous, it's often the case that a continuous
| approximation is good enough. If that's so, then we can use a
| neural network. In practice, this is not usually an important
| limitation.
| kolbe wrote:
| So can most linear combinations of functions set to the nth
| power.
| __MatrixMan__ wrote:
| I think the author might be overlooking just how weird something
| can be while still being a function.
| | Let f(x):R->{1,0} be such that f(x)=1 if x is rational and 0
| otherwise.
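The function __MatrixMan__ describes is the classic Dirichlet function, and a short sketch (mine, not from the thread) shows why no continuous function, and hence no finite sigmoid network, can uniformly approximate it:

```latex
f(x) = \begin{cases} 1 & x \in \mathbb{Q} \\ 0 & x \notin \mathbb{Q} \end{cases}
```

Suppose a continuous $g$ satisfied $|g(x) - f(x)| < 1/4$ for all $x$. Both the rationals and the irrationals are dense, so every interval contains a point where $g > 3/4$ and a point where $g < 1/4$; by the intermediate value theorem, $g$ then takes the value $1/2$ somewhere in every interval -- but $1/2$ is more than $1/4$ away from both $0$ and $1$, a contradiction. This is exactly the "continuous functions only" caveat quoted from the article above.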
| Banana699 wrote:
| "Visual proof" pretty much gave away the fact that it was going
| to be a non-rigorous reasoning process based on drawing an
| arbitrary plot; once you draw a plot you're already assuming a
| lot about its underlying function. The title isn't very
| misleading.
| vecter wrote:
| He says continuous function
| jjgreen wrote:
| He also says in the first sentence "One of the most striking
| facts about neural networks is that they can compute any
| function at all"; that later caveat is incompatible, as most
| functions are not continuous.
| laichzeit0 wrote:
| Ok, so what about the Cantor function? Can it learn that?
| aaaaaaaaaaab wrote:
| That's a pretty useless "proof" which gives no insight into how
| neural networks learn in practice. The author takes a sigmoid
| neuron, tweaks its parameters to the extreme so that it looks
| like a step function, then concludes that since a linear
| combination of step functions can approximate anything, so can
| neural networks. Bravo.
| r-zip wrote:
| Did you ever consider that explaining how deep nets "learn in
| practice" is not the point?
| anothernewdude wrote:
| If it's not about learning, this result is useless. A lookup
| table can compute any function, but who cares?
| mdp2021 wrote:
| > _who cares_
| | The people who need to build some (sophisticated kind of)
| lookup table for a function of unavailable details through
| automation.
| voldacar wrote:
| This "proof" is pretty sketchy. I think he means every
| _differentiable_ function? Because I fail to see how you can
| make a neural network evaluate the indicator function of the
| rationals.
| bjourne wrote:
| "The second caveat is that the class of functions which can be
| approximated in the way described are the continuous functions.
| If a function is discontinuous, i.e., makes sudden, sharp
| jumps, then it won't in general be possible to approximate
| using a neural net.
This is not surprising, since our neural
| networks compute continuous functions of their input. However,
| even if the function we'd really like to compute is
| discontinuous, it's often the case that a continuous
| approximation is good enough. If that's so, then we can use a
| neural network. In practice, this is not usually an important
| limitation."
| voldacar wrote:
| I guess I didn't see that. But he's still using _continuous_
| where the correct term would be _differentiable_
| kevinventullo wrote:
| You can uniformly approximate any continuous function on a
| compact domain with a differentiable/smooth function.
| bjourne wrote:
| Continuous functions aren't necessarily differentiable.
| voldacar wrote:
| yes, that's my point
| W0lf wrote:
| Given that any stock price is a function over time as well,
| there should theoretically exist a neural net that can
| approximate the stock price in the future? This reasoning is
| obviously wrong; what is my exact error of thought, though?
| arghwhat wrote:
| The error is that stock price is not a function over time, but
| instantaneous demand which we record over time. That demand is
| a function of an undefined number of variables.
| ddingus wrote:
| We can find a function to express past stock prices.
| | There isn't one for the future, unless said future is somehow
| predetermined?
| | Is it, given enough input data?
| | Does this discussion then distill down to philosophy?
| | Do living beings have agency, or are they simply very complex
| NNs?
| | How one answers that speaks to consciousness as much as it
| does to the prospect of a predictive stock price model.
| visarga wrote:
| Your input features are incomplete and mixed with noise.
| | The value of a die is also a function over time. Can we learn
| this function with a neural net? No, because our features don't
| include the nitty-gritty details of each throw, so it's
| essentially random.
| paskozdilar wrote:
| > what is my exact error of thought though?
| | There isn't one; you're just overestimating the value of
| existence propositions.
| | In practice, knowing that something exists is not a very useful
| result - it is often more useful to know that something does
| NOT exist (e.g. a solution to the halting problem).
| amalcon wrote:
| Your mistake is that you left off "given the right inputs".
| There are a lot of inputs to stock prices that are unlikely to
| be readily available to your function.
| max__d wrote:
| With the same type of reasoning, we could plot whatever output
| our brain gives, and there will be some type of neural network
| that can predict what we'll think/see/feel in the future. What
| you said and the thing I just said were both ideas I had when
| starting to learn how AI works; sadly it's something we still
| can not reach, at least for now.
| tluyben2 wrote:
| If I understand you correctly, we are very far away from that
| example, as that is AGI and then some. You will not see that
| in your lifetime, so 'we can still' seems an interesting
| (overly optimistic?) take on it.
| laichzeit0 wrote:
| AGI, since it lacks a technical/mathematical definition,
| can be anything. It's mere philosophy at this point,
| actually even vaguer than most philosophical problems.
| posterboy wrote:
| Although I share your sentiment in general, I would
| presume that @tluyben's take is fairly true to the
| broader philosophical view. The critique of this view
| being at least as weak as the views on intelligence per
| se is a drop in the ocean, really.
| | Implying, there is a wealth of thought devoted to
| intelligence! That fact is actually proving the conjecture
| in a nicely constructive way by itself, that we are
| thoughtful indeed, if only you believe this axiomatically.
| The quintessential theorem was distilled by
| Descartes, of course, wherefore he is remembered.
| tluyben2 wrote:
| I meant it to mean, indeed in a vague way, what we call
| human intelligence or beyond; the parent says to make a
| neural network that can predict what someone will
| think/feel in the future, which seems the same as, or at
| least indistinguishable from, the subject's human
| intelligence, as it will result in the same outputs. So to
| create the network implied by the parent, we would have
| to a) be able to make networks of that (unknown)
| complexity and b) 'copy', or rather make it learn from
| the outputs, the current 'state' of the subject's brain
| into it. That is incredibly far removed from anything
| cutting edge we know how to do. If it is at all possible.
| | So I was just surprised by their use of language, as it
| seems to imply the parent thought we would be closer to, or
| there already, with our developments of AI tech.
| alexchamberlain wrote:
| Stock prices are not well modelled as continuous functions -
| the prices you see are generally trades (a discrete function)
| and the price may or may not have been available to you at the
| volume you wanted (there are more variables than time).
| nnq wrote:
| besides karelp's sister comment, there's also the "obvious"
| fact that _stock price is not a function of time, it's not
| P(t), it's a function of time and the entire universe that
| also evolves through time, more like P(t, U(t, ....))_ ...you
| can simplify things by _assuming the laws of physics are
| deterministic and you only need one instance of the state of
| the universe, U, so you'd have P(t, U)_
| | ...now if you don't explicitly represent U as a parameter,
| you'll have it implicit in the function. So your "neural
| network" contains _the entire state of the freakin universe
| (!!)_.
| | Ergo, contingent on your stance on theological immanence vs.
| transcendence, what you'd call a "neural network approximation
| of the stock's price function" is probably quite close to what
| others call... God (!).
| | (Now, if relativity as we know it is right, you might get away
| with a "smaller slice of U" - learn about "light cones". And to
| phrase this in karelp's explanation context: you'd need to know
| U to know which of the practically infinitely many such neural
| networks to pick. The core of (artificial) intelligence is not
| neural networks in themselves, it's _learning_ ; the NN is a
| quite boring computational structure, but you can implement
| tractable learning strategies for it, both in code and in
| living cells, as evolution has shown...)
| DennisP wrote:
| And you'd have to know the state of U to infinite precision.
| Which makes me wonder whether neural nets have any hope with
| a simple chaotic function. Maybe they do, but just in the
| short term, like predicting the weather.
| ComradePhil wrote:
| Theoretically, there exists a model that predicts all future
| stock prices EXACTLY at any given time in the future, as long
| as the results are completely isolated from all market
| participants (i.e. the knowledge of the result is COMPLETELY
| isolated from the market). Here is how you can theoretically
| prove it exists:
| | Train a model today so that it is overfitting for a given
| stock. It would predict everything very accurately up to today.
| The ONLY way to make sure that the results are completely
| isolated from the market is to not make the result available to
| ANYONE (how do you know that an isolated human observer is not
| leaking data with some unknown phenomenon... say quantum
| entanglement with particles in other people's brains, for
| example). So, the ONLY way to test the models is back-testing.
| | You can extend that to saying that for any given point in the
| future (say, this is a reference point), there will be an
| overtrained model which will backtest perfectly, i.e. the
| theoretical model that works at any time during the past to
| predict the exact stock price in the future up to the point of
| reference.
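The back-testing point above can be made concrete with a toy sketch (the random-walk "price" series and all names here are invented for illustration): a model that simply memorizes its history backtests perfectly, yet that perfect backtest says nothing at all about the next price.

```python
import numpy as np

rng = np.random.default_rng(0)
# A toy "price" series: cumulative sum of Gaussian increments (a random walk).
history = np.cumsum(rng.normal(size=100))

# A maximally "overfit" model: memorize every (time, price) pair seen so far.
table = {t: p for t, p in enumerate(history)}

def predict(t):
    # Perfect in-sample; out of sample it can only fall back on the last price.
    return table.get(t, history[-1])

# Back-test over the whole training period: the error is exactly zero.
backtest_error = max(abs(predict(t) - history[t]) for t in range(100))
```

The next increment of the walk is independent of everything the table memorized, so a flawless backtest here carries zero information about `history` at time 100 and beyond, which is the sense in which the model only "works" while its outputs stay isolated from the process being predicted.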
| fkfkno wrote: | Pi was a great movie! | | https://www.youtube.com/watch?v=ShdmErv5jvs | posterboy wrote: | Contrary to the recent top comment on this, which fails to show | that such a net's existence could be no coincidence, I guess, the | answer to your problem might be deeply physical and information | theoretic, as soon as you speak of _time_. Simply speaking, any | model is _good enough_ if the approximation is tolerably | accurate. In that sense, crude nets as well as expert systems | that trigger off clear signals and ample evidence may already | exist. | | In particular, the way the stock markets are distributed, the | function of time is likely relativistic and every participant | is acting under incomplete information even in the infinite | limit. | | Also, you have to be cautious about what any _function_ in this | context really means, as I imagine it means differentiable | functions (after somebody mentioned the Ackermann function, | which is not differentiable anywhere). | Bootvis wrote: | Doesn't seem wrong to me; the tricky part is to find this | network and convince yourself that it indeed predicts | correctly over the period of interest. | karelp wrote: | Your reasoning is not wrong, there is a neural net that | approximates future stock price. The problem is we don't know | which one :) | hedora wrote: | Sure we do. Partition your neurons into a few billion | independent networks, embed them on a spinning globe of | mostly molten rock, and put each one in a leaky bag of mostly | water. | | Wait long enough, and one leaky bag will emit the number | "42". If you get the initial conditions just right (and | quantum nondeterminism isn't really a thing), then you'll | also get a good approximation to the stock market.
| vimacs2 wrote: | So it's somewhat akin to the Library of Babel, but instead of | the set of all possible books, it's the set of all possible | functions :p | hedora wrote: | Those are equivalent as long as you allow for infinite-length | books. | Banana699 wrote: | No need for infinite-length books to encode all possible | functions, as there are notations to express infinity in | finite space (e.g. programs, encoding an infinity of | behaviour in a finite number of instructions). | | The Library of Babel contains every possible finite | description of every possible function. | dzaima wrote: | And knowing which one it is will probably influence which one | it should be, in a halting-problem-esque way. | pawelduda wrote: | It would be the same as predicting the future, which is not | possible using past performance | callesgg wrote: | It is not wrong. | | It is just that the neural network would have to compute a | model of the entire world or even the universe on an atomic | scale. It would be computationally infeasible but not | theoretically impossible. | | It is theoretically possible that the universe we live in is | already being computed on a neural network in some other | external universe. | gowld wrote: | This is a long written argument with some animations, not a | visual proof. | hprotagonist wrote: | any "well behaved" function. | qwerty1793 wrote: | A number of assumptions seem to be missing from this article. | Since the author is using the sigmoid function, which is smooth, | this argument actually only applies to approximating smooth | functions. That is, you don't just need f to be continuous, you | need all of its derivatives to exist and all be continuous. | Also, since we are only able to have finitely many neurons, we | need to be able to approximate f using step functions with | finitely many pieces. So this argument can only be used if f | is constant outside of a compact region.
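The step functions qwerty1793 mentions are exactly what the article assembles: scaling up the weight of a sigmoid neuron σ(wx + b) squeezes its transition region until it is effectively a step at x = -b/w, and one hidden layer of such steps tracks a continuous function on a compact interval. A minimal sketch assuming only the standard sigmoid (the target f(x) = x², the bucket count, and the hand-set weights are illustrative choices, hand-built rather than learned, and not the article's code):

```python
import math

def sigmoid(z):
    # Standard logistic sigmoid, clamped to avoid math.exp overflow
    # for the very large |z| the steep weights below produce.
    if z > 60.0:
        return 1.0
    if z < -60.0:
        return 0.0
    return 1.0 / (1.0 + math.exp(-z))

def one_hidden_layer_net(f, n_steps=100, sharpness=10_000.0):
    # One hidden neuron per bucket edge: neuron i switches on sharply
    # at x = i/n_steps, and its output weight is the increment of f
    # across that bucket. Summing the steps yields a staircase that
    # tracks f to within roughly max|f'| / n_steps.
    edges = [i / n_steps for i in range(1, n_steps + 1)]
    increments = [f(e) - f(e - 1.0 / n_steps) for e in edges]

    def net(x):
        total = f(0.0)  # output bias: the staircase starts at f(0)
        for e, h in zip(edges, increments):
            total += h * sigmoid(sharpness * (x - e))
        return total

    return net

net = one_hidden_layer_net(lambda x: x * x)
for x in [0.123, 0.5, 0.9]:
    print(x * x, net(x))  # the two columns agree to about 1/n_steps
```

Increasing `n_steps` tightens the approximation, which is the universality theorem in miniature: arbitrary precision needs arbitrarily many neurons, consistent with lisper's caveat below that this is approximation on a compact set, not exact computation.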
| hprotagonist wrote: | at which point, a Fourier transform is a hell of a lot | cheaper ;) | joppy wrote: | Why does the function you are approximating need to be | smooth? From the paper cited in the article, all you need is | for f to be continuous on a compact subset of R^n. | lisper wrote: | > a more precise statement of the universality theorem is that | neural networks with a single hidden layer can be used to | approximate any continuous function to any desired precision. | | That is a _very_ different claim than being able to compute any | function. | moffkalast wrote: | I was about to feed P = NP into GPT-3, _sigh_ ___________________________________________________________________ (page generated 2022-03-06 23:00 UTC)