[HN Gopher] A visual proof that neural nets can compute any function
       ___________________________________________________________________
        
       A visual proof that neural nets can compute any function
        
       Author : graderjs
       Score  : 216 points
       Date   : 2022-03-06 11:56 UTC (11 hours ago)
        
 (HTM) web link (neuralnetworksanddeeplearning.com)
 (TXT) w3m dump (neuralnetworksanddeeplearning.com)
        
       | kmod wrote:
        | I think these sorts of arguments are not great because they
        | confuse "limiting behavior" with "behavior at the limit". Yes,
        | if you are able to construct an infinite-sized MLP it can
        | exactly replicate a given function, and you can construct a
        | sequence of MLPs that in some sense converges to this infinite
        | behavior. But by other measures the approximation might be
        | infinitely bad and never get better unless the net is truly
        | infinite.
       | 
        | For example, consider approximating the identity function
        | [f(x) = x] with a sigmoid-activation MLP. For any finite size of
        | the net, the output will have a minimum and maximum value. One
        | can change the parameters of the net to increase the range of
        | the output, but at no point is the output range infinite. So
        | even though you can construct a sequence of MLPs that, in some
        | sense, converges in the limit to the identity function, in
        | another sense it never does.
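        | 
        | A minimal sketch of that boundedness point (assuming a single
        | sigmoid hidden layer; the weights here are arbitrary):
        | 
        |     import numpy as np
        |     from scipy.special import expit  # numerically stable sigmoid
        | 
        |     rng = np.random.default_rng(0)
        |     W1, b1 = rng.normal(size=20), rng.normal(size=20)  # hidden
        |     W2, b2 = rng.normal(size=20), 0.0                  # output
        | 
        |     def net(x):
        |         # each hidden unit lands in (0, 1), so the output is a
        |         # bounded weighted sum
        |         return W2 @ expit(W1 * x + b1) + b2
        | 
        |     # the output never leaves [b2 - sum|W2|, b2 + sum|W2|],
        |     # while f(x) = x is unbounded
        |     bound = np.sum(np.abs(W2))
        |     for x in [1e2, 1e6, 1e12]:
        |         assert abs(net(x)) <= bound
        |     print("output bound:", bound)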
       | 
       | The same kind of thinking that leads to the conclusion "neural
       | nets are universal approximators" would support the existence of
       | perpetual motion machines; check out the "ellipsoid paradox" for
       | more info.
        
       | j16sdiz wrote:
        | The author is using non-linear neurons. This is, IMO, cheating.
        
         | JustFinishedBSG wrote:
         | It's not cheating considering it's not even possible otherwise.
        
           | j16sdiz wrote:
            | This is obviously true for non-linear neurons. Who needs a
            | proof for that?
        
             | civilized wrote:
             | Why do you think it's obvious? Can you spell it out?
             | 
             | I hope it's more than "presumably, non-linear neurons can
             | approximate any non-linear function since they both have
             | non-linear in the name".
        
               | programmer_dude wrote:
                | A very wide class of functions can be approximated via
                | the Fourier transform. Look up the equation to see why
                | it applies here (hint: it is an integral of
                | f(t)e^(-iwt) dt).
        
               | civilized wrote:
               | This is about as far from a proof as we are from the
               | Andromeda Galaxy.
        
               | Dylan16807 wrote:
               | The really simple proof I'd use is:
               | 
               | 1. A function can be approximately implemented as a
               | lookup table.
               | 
               | 2. It's trivial to make a neural network act like a
               | lookup table.
               | 
               | Which seems to resemble the article but it's much
               | simpler.
               | 
               | Point 2 assumes the neurons are normal non-linear ones.
               | I'm not saying that's cheating, but I do agree with it
               | being pretty obvious, at least from the right angle.
        
               | civilized wrote:
               | If you fleshed out the "trivial" point 2 as a proof, I
               | think the result would be essentially the same as the
               | article.
               | 
               | The only way you can make it substantially simpler is if
               | you use a neuron whose nonlinearity makes it essentially
               | a restatement of another result. For example, if the
               | neurons are basically just Haar wavelets.
        
               | Dylan16807 wrote:
               | "Divide x into a bunch of buckets.
               | 
               | Make two neurons tied to Input that activate very sharply
               | at the bottom and top edge of each bucket.
               | 
               | Use those to make a neuron that activates when Input is
               | in the bucket.
               | 
               | Weight it so it adds f(x) to Output."
               | 
               | That's over 100 times shorter than the article. The
               | method isn't as elegant since it needs two internal
               | layers but I think it's pretty clear.
               | 
               | Is it wrong to say that the logical leap from "a neuron
               | can go from 0 to 1 at a specific input value" to "neurons
               | can make a lookup table" is trivial? Oh well.
               | 
               | (With "go from 0 to 1 at a specific input value" being
               | the nonlinear part.)
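                | 
                | A runnable version of this sketch (minimal and
                | illustrative; it assumes steep sigmoids as the step
                | neurons, and all parameter choices are arbitrary):
                | 
                |     import numpy as np
                |     from scipy.special import expit  # stable sigmoid
                | 
                |     def step(z, k=200.0):
                |         # steep sigmoid: roughly a 0/1 step at z = 0
                |         return expit(k * z)
                | 
                |     def lookup_net(f, lo, hi, buckets):
                |         edges = np.linspace(lo, hi, buckets + 1)
                |         centers = (edges[:-1] + edges[1:]) / 2
                |         def net(x):
                |             # two step neurons per bucket, rising at
                |             # its bottom edge and at its top edge
                |             lo_step = step(x - edges[:-1][:, None])
                |             hi_step = step(x - edges[1:][:, None])
                |             # their difference is an "x is in bucket i"
                |             # indicator; weight it by f(center_i)
                |             return f(centers) @ (lo_step - hi_step)
                |         return net
                | 
                |     net = lookup_net(np.sin, 0, 2*np.pi, buckets=200)
                |     xs = np.linspace(0.1, 2*np.pi - 0.1, 1000)
                |     print(np.max(np.abs(net(xs) - np.sin(xs))))
                |     # the error shrinks as the bucket count grows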
        
               | civilized wrote:
               | I encourage you to search for "lookup table" in the
               | article.
        
               | Dylan16807 wrote:
               | Why? Yes, it says that.
               | 
               | It's also 6000 words long.
               | 
               | I'm saying it's not that hard.
               | 
               | I'm not saying the article is wrong or anything, I'm
               | saying you can get to the same result MUCH faster.
               | 
               | "You can turn a neural net into a lookup table" should be
               | easily understood by anyone that knows both of those
               | concepts.
               | 
               | Edit: Like, isn't triggering specific outputs on specific
               | input conditions the first thing that's usually shown
               | about neural nets? If not a full lookup table, that's at
               | least 90% of one and you just need to combine the
               | outputs.
        
           | anothernewdude wrote:
           | My proof that you can compute any function with a single
           | neuron:
           | 
           | 1. Use the function as the activation function.
        
         | joppy wrote:
          | If you take linear neurons, then the whole network is just some
          | linear (or affine) function of its inputs, and hence the
          | universal approximation fails (not every continuous function
          | can be uniformly approximated by linear functions).
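          | 
          | Concretely, stacking two linear layers gives W2(W1 x + b1) +
          | b2 = (W2 W1) x + (W2 b1 + b2), again a single affine map, so
          | extra depth buys nothing without a nonlinearity in between.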
        
         | advisedwang wrote:
         | Why? Non-linear activation functions are common and easy?
        
       | nautilius wrote:
       | I don't understand all of the criticism here. This is Ch. 4 in a
       | basic intro to neural networks. The author provides a well
       | written, intuitive, and concise demonstration of how sigmoids can
       | be pieced together to approximate functions, in the layman's
       | sense of the word. It helped me build intuition when I came
       | across this maybe 5 (?) years ago. There are other good
       | 'Chapters' about vanishing gradient or backprop.
       | 
       | The criticism here is mostly about things that aren't even the
       | topic of the article: 'it cannot compute non-computable
       | functions', 'it cannot predict tomorrow's stock price', 'splines
       | are more efficient', 'it cannot predict how a stick looks bent in
       | water'.
       | 
        | It's like saying 'I read "Zen and the Art of Motorcycle
        | Maintenance" twice, and still don't know how to adjust spark
        | plugs. Stupid book'.
        
         | GTP wrote:
          | The criticism is that it is over-representing neural networks'
          | capabilities: saying that they can compute any function is a
          | very big and incorrect claim. IMO it should at least mention
          | that non-computable functions are a thing, and of course a
          | neural network can't have better capabilities than the machine
          | that is running it. This is especially important because there
          | are people out there thinking that machine learning is able to
          | solve any possible problem; if one of those people comes
          | across this proof, they'll think: "Neural networks really can
          | solve any conceivable problem! Look, there's even a
          | mathematical proof of this!"
        
           | renewiltord wrote:
            | It's funny, but this is how I felt about my son's
            | kindergarten. They taught him about "numbers", but instead
            | they only used Natural numbers. The least they could have
            | done is mention the one-point compactification of that
            | space.
            | 
            | Fortunately, I was able to intervene before he was
            | permanently damaged by the class. He's still struggling with
            | showing that it is homeomorphic to the subspace of R formed
            | by {1/n for all n in N} union {0}, and I blame today's
            | pedagogy for that.
        
             | GTP wrote:
              | The context is different: the problem here is the hype
              | that goes in a very specific direction (i.e., that you can
              | do anything with ML), as I briefly explained in another
              | comment.
        
               | nautilius wrote:
               | I feel "Numbers" are pretty hyped-up!
        
           | nautilius wrote:
            | As I wrote, "function in the layman's sense of the word".
            | And no one actually reading this can, without malice,
            | misunderstand it: the author uses a nice and smooth y=f(x)
            | as an example, and shows that it can be approximated with
            | sigmoids. Nothing more, nothing less. And he does a good job
            | of showing this.
        
             | GTP wrote:
              | You say no one can misunderstand what the author wrote
              | without malice, but there is a saying that goes "you
              | should not attribute to malice what can be attributed to
              | ignorance". Especially among people outside of CS, there
              | is currently a lot of hype regarding ML and AI in general;
              | if one of those people looks around to check whether a
              | neural network can really do anything and finds this, what
              | would they understand?
        
               | nautilius wrote:
                | You're telling me you were confused about what the
                | author could possibly mean with his nice and smooth
                | y=f(x), and that you misunderstood him as really
                | implying that GAI is just around the corner? The whole
                | discussion here is one long list of "Gotcha!", missing
                | the point and the context of the article completely.
                | 
                | You think these non-CS people you evoke will be too dense
                | to understand that the author is just approximating a
                | simple function, and yet will leverage their non-CS
                | background and immediately conclude that ANNs can
                | "solve" Cantor dust, non-computability, and the halting
                | problem? That's a very specific background your non-CS
                | people will have to have to fall for that.
               | 
               | [Edit:]
               | 
               | What do you think they'll make of "Multilayer feedforward
               | networks are universal approximators"?
               | 
               | ("This paper rigorously establishes that standard
               | multilayer feedforward networks with as few as one hidden
               | layer using arbitrary squashing functions are capable of
               | approximating any Borel measurable function from one
               | finite dimensional space to another to any desired degree
               | of accuracy, provided sufficiently many hidden units are
               | available. In this sense, multilayer feedforward networks
               | are a class of universal approximators.")
        
       | charcircuit wrote:
       | What about a function like f(x) = sin(x)? I feel like when x gets
       | big the error will start to increase.
        
         | CJefferson wrote:
         | You are right. The argument is that it can approximate sin(x)
         | over a compact interval, like [0,1].
         | 
          | Your answer to that might be "I could approximate over [0,1]
          | with just a lookup table, where I split the input range into n
          | equal-sized pieces for increasing values of n", and you'd be
          | right -- the "proof" is basically just doing that.
         | 
         | It's one of those things which is nice to show as a basic
         | theory thing (there are approximation methods which can never
         | simulate certain functions), but it's not really of any real
         | value.
        
         | qwerty1793 wrote:
         | This argument only applies to functions on a compact domain. So
         | we should only consider trying to approximate sin(x) when x is
         | in [0, 1], for example.
        
       | kevinventullo wrote:
       | I see a lot of commenters up in arms about Universal
       | Approximation for NN's, and I think the issue is that it's often
       | framed as a _superpower_ rather than _table stakes_ for any kind
       | of general purpose algorithm.
       | 
       | I posit that any modeling technique which does _not_ have the
       | universal approximation property will be guaranteed to fail on
       | large classes of problems no matter how much elbow grease (say
       | feature engineering) one puts into it. That is, UA is a
       | _necessary_ but not _sufficient_ condition for a modeling
       | technique to be fully general (i.e. could form the basis of
       | whatever AGI is).
        
         | DiggyJohnson wrote:
         | Really well said. Is there a term or concept in the AI
         | literature++ that captures this point/conjecture?
        
       | Kalanos wrote:
        | It can't learn exponents; it can only multiply.
        | 
        | There's also a difference between memorizing part of a line and
        | being able to extrapolate it.
        
         | iamcurious wrote:
         | I find this very interesting, can you expand or provide links?
        
       | Jenz wrote:
       | "Any function" is a big wide claim. Can someone fill me in on
       | what's required of these functions? Can a neural nets for example
       | compute non-continuous functions like f(x) = [x is rational]?
        
         | advisedwang wrote:
         | The article says:
         | 
         | > The second caveat is that the class of functions which can be
         | approximated in the way described are the continuous functions.
         | If a function is discontinuous, i.e., makes sudden, sharp
         | jumps, then it won't in general be possible to approximate
         | using a neural net. This is not surprising, since our neural
         | networks compute continuous functions of their input. However,
         | even if the function we'd really like to compute is
         | discontinuous, it's often the case that a continuous
         | approximation is good enough. If that's so, then we can use a
         | neural network. In practice, this is not usually an important
         | limitation.
        
       | FpUser wrote:
       | >"No matter what the function, there is guaranteed to be a neural
       | network so that for every possible input, x, the value f(x) (or
       | some close approximation) is output from the network"
       | 
       | With "or some close approximation" being a key I fail to
       | understand why is it not obvious.
        
       | credit_guy wrote:
        | This often-cited fact is a red herring. Lots of things can
        | compute (or rather approximate) any function. Piecewise-constant
        | functions obviously can approximate anything, but nobody's giddy
        | about using piecewise-constant functions for any numerical
        | purpose (and when people do use them, as they often do, they
        | don't point with pride to their new application of the
        | piecewise-constant functions' "universal approximation
        | theorem"). So can polynomials, trigonometric polynomials,
        | splines (i.e., piecewise polynomials), radial basis functions,
        | and on and on.
       | 
        | Just put neural networks to the test against splines, and see
        | how they fare. Take your favorite function, say sin(x), and try
        | to approximate it with a neural net with 1000 nodes, or with a
        | spline with 20 nodes. You don't stand a chance of matching the
        | quality of the spline approximation.
       | 
        | Edit: here's a short Python snippet to show how much better a
        | spline with 20 nodes is than a neural network with 1000 nodes
        | for approximating the sin function (note: despite the variable
        | names, these are mean squared errors, not their square roots):
        | 
        |     import numpy as np
        |     from sklearn.neural_network import MLPRegressor
        |     from scipy.interpolate import UnivariateSpline
        | 
        |     N = 10000
        |     X_train = 2*np.pi*np.random.uniform(size=N)
        |     Y_train = np.sin(X_train)
        |     sin_NN = MLPRegressor(hidden_layer_sizes=(1000,)).fit(
        |         X_train.reshape(N, 1), Y_train)
        | 
        |     spline_nodes = np.linspace(0, 2*np.pi, 20, endpoint=True)
        |     sin_spl = UnivariateSpline(spline_nodes,
        |                                np.sin(spline_nodes), s=0)
        | 
        |     X_test = np.linspace(0, 2*np.pi, 5000, endpoint=True)
        |     Y_test = np.sin(X_test)
        |     rmse_NN = np.mean(
        |         (Y_test - sin_NN.predict(X_test.reshape(-1, 1)))**2)
        |     rmse_spl = np.mean((Y_test - sin_spl(X_test))**2)
        |     print("RMSE for NN approx: ", rmse_NN)
        |     print("RMSE for spline approx: ", rmse_spl)
        | 
        |     >> RMSE for NN approx:      0.00011776185865537907
        |     >> RMSE for spline approx:  9.540536500968638e-10
        
         | srean wrote:
          | Indeed. I cannot upvote this enough. New fanboys of DNNs seem
          | so enamored by the universal approximation property and cite
          | it at the slightest provocation. There is no dearth of
          | universal approximators; that's not what makes DNNs special.
          | The special thing is how simple training procedures manage to
          | find approximations that generalize well (or don't, as shown
          | by the adversarial examples).
        
         | dahart wrote:
          | This whole example is a red herring. This isn't a splines-
          | versus-NNs issue at all; you're talking about the well-known
          | fact that the choice of basis affects the ability to fit,
          | which has nothing to do with whether you use a network. As a
          | concrete proof: since a spline is (usually) a polynomial
          | function, it can be defined as a linear network with as many
          | layers as the spline's polynomial order; in other words,
          | splines are a strict subset of the functions you can build
          | using neural networks. You can also make a neural network out
          | of spline neurons if you want. And you can cherry-pick lots of
          | different functions that work better for splines than other
          | choices, and you can also cherry-pick functions that perform
          | worse for splines than other bases. Splines perform far worse
          | on a periodic function of arbitrary domain than a Fourier fit.
          | Your example is contrived because you artificially constrained
          | the range to [0, 2pi].
        
           | credit_guy wrote:
           | I'm sorry, but I have to disagree with you here.
           | 
           | The "Universal Approximation Theorem" is not the point of
           | neural networks. People should stop mentioning it, or if they
           | do, they should state at the same time that there's nothing
           | special about NNs, that numerous classes of functions possess
           | the same property.
           | 
            | Here's my own pitch for neural networks: NNs suck. Big time.
            | They suck in low dimensions and they suck in high
            | dimensions. But the curse of dimensionality is so formidable
            | that everything sucks in high dimensions. Neural networks
            | just happen to suck less than all other known methods. And
            | because they suck a bit less, there are applications where
            | they are useful, and there they have no substitute.
        
             | abeppu wrote:
             | > Neural networks just happen to suck less than all other
             | known methods.
             | 
             | Or, perhaps, the best demonstrated performance of NNs
             | exceeds the best demonstrated performance of other known
             | methods for many tasks. But ... the amount of compute,
             | investment in tooling, and attention that have been thrown
             | at deep learning in the past decade is at a scale where ...
             | do we actually know that other methods would perform worse
             | with the same resources? Is there some alternate timeline
             | where in 2012 someone figured out how to run MCMC for
             | bayesian non-parametrics over much larger datasets or
             | something, and the whole field of ML just tilted in a
             | different direction?
        
               | credit_guy wrote:
               | That's a very good observation. However, deep learning
               | didn't just get the share of the first mover. Before DL
               | was popular, Support Vector Machines used to be where all
               | the ML fun research was happening. And just out of
               | nowhere, Random Forests and XGBoost came and took the
               | crown if only for a fleeting moment. Gaussian Processes
               | always showed promise, but I'm not sure they delivered.
               | Deep Learning just delivered. I guess it's because of the
                | composability. But you are absolutely right that there's
                | no proof, and no way of knowing right now, whether DL is
                | the best there can possibly be.
        
             | dahart wrote:
             | It doesn't matter if you disagree (BTW I don't know what
             | you disagree with specifically, and I did not mention the
             | Universal Approximation Theorem. It seems like you're
             | making some assumptions.) A polynomial spline is still a
             | subset of a neural network, so if you're right, all you're
             | demonstrating is that splines also suck at solving the same
             | problems that neural networks solve. The discrepancy
             | between the two here, again, has nothing to do with
             | networks and everything to do with your contrived example.
        
         | leoff wrote:
            | Since you are nitpicking, you could well use a sinusoid
            | activation function in the neural network, and reach an even
            | smaller loss value.
        
           | credit_guy wrote:
           | Not sure I understand your point. Do you want to use a bunch
           | of sine functions to approximate a sine function? What would
           | that show?
           | 
           | Splines don't know anything about the nature of a function.
           | They approximate any function with piecewise polynomials.
           | 
            | Maybe you are trying to say that the default activation
            | function (relu) in sklearn is not smooth. No problem, you
            | can add
            | 
            |     activation='tanh'
            | 
            | inside the definition of the NN, and check the RMSE. It
            | turns out to be worse, for some reason.
        
             | marginalia_nu wrote:
             | I assume they're referring to Fourier expansion.
             | 
              | In general you can use a pretty wide set of functions to
              | approximate an arbitrary function. You can do it with
              | polynomials (Taylor expansions), and with many other
              | families, as long as they form a basis of a suitable
              | function space (e.g., a Hilbert space).
             | 
             | Producing a given function from a linear combination of
             | other functions isn't groundbreaking in the least.
        
         | gowld wrote:
          | Universality shows potential, not optimality. The article
          | covers this.
        
         | Veedrac wrote:
         | > >> RMSE for NN approx: 0.00011776185865537907
         | 
         | Error is approximately 0.0001 because `tol`, the parameter that
         | tells optimization to finish, is 0.0001.
         | 
         | Set tol=0, and then beta_2=1-1e-15, epsilon=1e-30 to maximize
         | stability from the optimizer, and I got RMSE for the neural
         | network to go below 5e-7.
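          | 
          | For reference, a sketch of the adjusted call (same setup as
          | the snippet upthread; the raised max_iter is an illustrative
          | guess):
          | 
          |     sin_NN = MLPRegressor(
          |         hidden_layer_sizes=(1000,),
          |         tol=0,             # never stop early
          |         beta_2=1 - 1e-15,  # adam second-moment decay, near 1
          |         epsilon=1e-30,     # adam stability term
          |         max_iter=10000,    # illustrative; the default is 200
          |     ).fit(X_train.reshape(N, 1), Y_train)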
         | 
         | This is all very academic because stochastic gradient descent
         | is a horrific tool to be using for this purpose. You aren't
         | wrong about that.
        
           | credit_guy wrote:
           | Fair enough, I wrote the snippet in 5 min and didn't check
           | the tolerance parameter.
           | 
           | But with your improved choice of parameters, the NN is still
           | about 1000 times worse than the cubic spline, despite having
           | 50 times as many nodes.
        
             | Veedrac wrote:
             | I don't think these numbers are meaningful. It's not far
             | off from a degree-2 interpolation already, and I got a
             | hidden layer size of 20 to an error of 7e-5 by just letting
             | it optimise for longer and picking a seed that worked well,
             | which is basically the same error as a degree-1
             | interpolation that gets 5e-5.
             | 
             | Like sure the spline is doing better, but that's not why we
             | care, it's not like there's a general sense in which spline
             | interpolations are going to be better than the optimal fit
             | from a larger neural network, they're just a simpler,
             | faster, more numerically stable way of solving simpler
             | problems. An optimiser designed for 1D interpolation of
             | small neural networks, for all I know, might get extremely
             | accurate results.
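              | 
              | For comparison, the degree-1 baseline is one line in the
              | setup from upthread (a sketch; same MSE-style metric as
              | the earlier snippet):
              | 
              |     nodes = np.linspace(0, 2*np.pi, 20)
              |     mse_lin = np.mean(
              |         (np.interp(X_test, nodes, np.sin(nodes))
              |          - np.sin(X_test))**2)
              |     print(mse_lin)  # on the order of 5e-5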
        
       | grafs50 wrote:
        | The top comments (now) are all saying that this universal
        | approximation theorem doesn't really have much impact in the
        | real world. So I wonder: is this interesting outside of theory?
        | Has it motivated any techniques that have created (or may
        | create) real-world, empirical results? Could it even?
        
       | amelius wrote:
       | I want to see it compute the Ackermann function.
        
         | musesum wrote:
         | > Ackermann function
         | 
         | Perhaps quantum neurons?
        
         | cperciva wrote:
         | I want to see it compute the Busy Beaver function.
        
       | anothernewdude wrote:
       | Did you know a big enough hash-table can compute any function?
        
       | alephnan wrote:
       | s/function/computable function
        
       | bmitc wrote:
       | This is called out deep into the article, but shouldn't it be
       | "neural nets can approximate any function"?
       | 
        | Also, how does this relate to the traditional notion of
        | computability of functions?
        
       | t_mann wrote:
        | As someone who has taught this to CS students, just scrolling
        | through, I have to say it looks like this has about 5x more
        | text than it should have. This is a homework problem for
        | second-year students (literally, where I used to teach) that
        | should take them no more than a page to answer.
        
       | mjburgess wrote:
        | There are many caveats to this, esp. that this "fact" has
        | nothing to do with whether training a neural network on a
        | dataset will be useful.
        | 
        | There is often no function to find in solving a problem, i.e.,
        | there is no mapping from ImageSpace -> DogCatSpace. I.e., most
        | things are genuine ambiguities --- a stick in water appears
        | bent, indistinguishably from an actually bent stick in some
        | other transparent fluid.
        | 
        | Animals solve the problem of the "ambiguity of inference" by
        | being in the world and being able to experiment, i.e., by taking
        | the stick out of the water. A neural network, in this sense,
        | cannot "take the stick out of the water" -- it cannot resolve
        | ambiguities. So that it can "approximate functions" is neither
        | necessary nor sufficient for a useful learning system.
       | 
        | More significantly, a NN is a very, very bad approximator of
        | many functions. Consider approximating a trajectory _so that one
        | can then find an acceleration_, i.e., here we need f(x) in order
        | to find d2f/dx2 -- NN approximations are typically "OK" at the
        | f(x) level and really, really crappy at the df/dx level, because
        | the non-linear functions NNs just glue together are only
        | trainable if they're very rough.
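        | 
        | A quick illustration of the derivative point, as a sketch (it
        | reuses the sklearn MLP from the snippets elsewhere in this
        | thread; exact numbers vary by seed):
        | 
        |     import numpy as np
        |     from sklearn.neural_network import MLPRegressor
        | 
        |     X = np.linspace(0, 2*np.pi, 2000)
        |     net = MLPRegressor(hidden_layer_sizes=(100,), max_iter=5000,
        |                        random_state=0).fit(X.reshape(-1, 1),
        |                                            np.sin(X))
        | 
        |     pred = lambda v: net.predict(v.reshape(-1, 1))
        |     xs = np.linspace(0.5, 5.5, 200)
        |     h = 1e-3
        |     # finite-difference slope of the fitted function
        |     dfdx = (pred(xs + h) - pred(xs - h)) / (2*h)
        | 
        |     print("max |f error| :",
        |           np.max(np.abs(pred(xs) - np.sin(xs))))
        |     # the derivative error is typically much larger:
        |     print("max |f' error|:", np.max(np.abs(dfdx - np.cos(xs))))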
       | 
        | For these, and lots of other reasons, this theoretical approach
        | to learning is largely marketing nonsense. If you go out and
        | actually study the only known systems which learn effectively
        | (i.e., animals), one does not find any need for universal
        | approximator theorems in explaining their capacities.
        | 
        | These theorems _only_ show that NNs are, like many computational
        | statistical techniques, _sufficient_ for a wide class of "mere
        | approximation" problems that are only narrowly useful within the
        | whole field of learning-as-such.
        
         | benrbray wrote:
         | What techniques are used to estimate trajectories? Would
         | something like a Gaussian Process perform better?
        
           | mjburgess wrote:
            | Well, NNs, like GPs, are "basically" non-parametric methods,
            | in the sense that one does not start with a known
            | parameterised statistical distribution that comes from
            | domain expertise. These are worst-case techniques for when
            | we don't have the option of starting with "the right
            | answer", e.g., in the case of large datasets where we have
            | no idea how some pixels distribute over cat/dog images.
            | 
            | In the case of a trajectory we would likely already know the
            | answer, in the form of just doing some physics. The role of
            | computational stats here, then, is to start with the known
            | form of the solution and find the specific parameters to
            | fit it.
            | 
            | Since we have physics, we can find the "perfect answer" to
            | the trajectory question with very few data points -- and
            | take as many derivatives as we like.
            | 
            | Brute-force ML is often used when we don't have theories,
            | making it all the more dangerous; and alas, all the more
            | useful. We can get a 5% improvement on click-through rate
            | without having any theory of human behavioural psychology
            | --- god knows, then, what we are doing to human behaviour
            | when we implement this system.
        
           | aqme28 wrote:
            | If you're asking how to solve difficult differential
            | equations in general, we use numerical methods like finite
            | elements or finite differences.
        
             | adgjlsfhk1 wrote:
              | That said, there has been some promising research on using
              | NNs to solve nonlinear PDEs.
        
         | emmelaich wrote:
          | Can't you just show the NN the stick being moved?
         | 
         | Are you just denying information to the NN that is available to
         | the animal?
        
         | sdenton4 wrote:
         | You've got a kind of narrow view of the matter... The need for
         | interactive understanding is addressed in different ways by
         | reinforcement learning, GANs, and autoregressive recurrent
          | neural networks.
         | 
         | In the latter cases, the generative output of the network is a
         | kind of experiment, and back prop from the loss function
         | provides a route to improvement.
         | 
         | I think it's an unfortunate historical accident that the field
         | of machine learning is so transfixed with classifiers. But
         | they're really not the only game in town.
        
         | fxtentacle wrote:
         | The article also conveniently glosses over the fact that all AI
         | calculations are limited in their maximum complexity by the
         | depth of the AI. For example, for x! the best a regular DL AI
         | can do is to memorize some values and interpolate between them.
        
         | curiousgal wrote:
         | > _There is often no function to find in solving a problem_
         | 
          | In finance, NNs can be used to calibrate models, i.e. to
          | generate parameters for a model (function) that replicates
          | existing data (observed prices/volatilities).
        
           | mjburgess wrote:
            | Well, I don't think prices are functions of economic
            | variables.
            | 
            | Recall that a function is `y = f(x)`, not `y1, y2, y3... =
            | f(x)`.
            | 
            | So what you're modelling is something like `E_hope[y] =
            | f(x)`, where `E_hope` is "a hopeful expectation" that the
            | mean of the underlying ambiguous y1,..,yn "reliably"
            | corresponds to a unique `x`.
            | 
            | This "hopeful expectation" is certainly more common than
            | there being any actual function connecting `y` to `x`, but I
            | think it's often quite false too. I.e., even the expectation
            | of prices is genuinely ambiguous.
            | 
            | To handle this we might ensemble models,
            | `E_ensemble[E1_hope[y], ...En_hope[y]]`, but to repeat a
            | famous idiom in finance, this is very much "building
            | sandcastles in the sky".
            | 
            | The idea that you can just "expect" (/statistics) your way
            | out of the need for experimentation is a dangerous
            | superstition which is at the heart of ML. It is impossible
            | to simply "model data"; measurement produces genuine
            | ambiguities which can only be resolved by changing the world
            | and seeing-what-happens. There is no function to find.
        
             | thfuran wrote:
             | >Recall a function is `y = f(x)`, not `y1, y2, y3... =
             | f(x)`.
             | 
             | y1,...,yn is a perfectly reasonable function output.
             | Functions don't have to produce scalars.
        
               | mjburgess wrote:
                | Functions have to resolve to one point in the output
                | domain, even if that point is multi-dimensional.
                | 
                | Here, consider `y_houseprice = price(house data,
                | economic data, etc.)`. There isn't a unique house price
                | in terms of those variables. The real world observes
                | many such prices for the same values of those variables.
                | 
                | An overly mathematical view of the world has obscured
                | the scientific method from our thinking here. Generally,
                | there aren't actually functions from X to Y, and there
                | aren't actually stable XY distributions over time.
                | 
                | The world, as measured, is basically always ambiguous
                | and discontinuous. Data, as measurement, isn't the
                | foundation of our theory-building. We build theories by
                | changing the world; data comes in as a guiding light to
                | our theory-building, not as the basis...
                | 
                | ...which is our direct causal interactions with our
                | environment, i.e., it's the actual stuff of our bodies
                | and the stuff of the world _as we change it_.
        
               | pedrosorio wrote:
               | > There isnt a unique house price in terms of those
               | variables
               | 
               | > An overly mathematical view of the world has obscured
               | the scientific method from our thinking here
               | 
                | Since it is understood that houseprice is not a function
                | of just three variables, the mathematical view
                | (statistical learning theory) commonly used when
                | training models defines house price as a random
                | variable. This takes into account the uncertainty from
                | all the unknown factors that contribute to house prices.
               | 
               | The distribution defining this random variable is a
               | function of the 3 input observations. Commonly, the
               | inputs are used to compute the mean, and the shape of the
               | distribution is fixed - a Gaussian, for example - but not
               | necessarily.
               | 
               | Given observations of the 3 inputs, each observed
               | y_houseprice is just a sample from this random variable.
        
               | mjburgess wrote:
                | Well, a random variable _is_ a function, from event
                | space to the real line. We return to a single measure by
                | taking its expectation. We don't model `Y = f(X)`.
                | 
                | This doesn't play well with the universality theorem in
                | the article. NNs can only be said to model expectations
                | of random variables.
        
               | thfuran wrote:
               | >We return to a single measure by taking its expectation
               | 
               | Only if you want to throw away most of the information in
               | your model.
        
               | mjburgess wrote:
                | Well, indeed.
                | 
                | One then needs to explain how a NN being a "universal fn
                | approximator" helps at all in this context.
                | 
                | One models RVs generatively using distributions (and so
                | on); the actual model (e.g., of house prices) isn't a
                | function, it's often an infinity of them.
        
               | thfuran wrote:
               | >One then needs to explain how a NN being a "universal fn
               | approximator" helps at all in this context.
               | 
               | Given that I can't tell why you don't think it does, I
               | don't think I can explain it to you. From the other
               | contexts you've talked about here, you seem to be
               | implying that the only thing that is potentially useful
               | is an AGI which either carries out or merely designs
               | experiments. But that's patently absurd.
        
               | mjburgess wrote:
               | I'm happy to hear the very narrow case on this. Can a NN
               | learn geometric Brownian motion?
        
         | Der_Einzige wrote:
         | Interpolation in GANs seems a lot like "being able to
         | experiment. Ie., taking the stick out of the water"...
        
           | tsimionescu wrote:
           | They are not taking the stick out of the water, because they
           | don't have hands and are not looking at a real stick. They
           | are being trained on a static data set, and a picture of a
           | stick isn't a stick. They can try to extrapolate all they
           | want, but they are fundamentally not going to be able to get
           | more information out of the data than there exists.
           | 
           | And in a static photo of a bent stick, or a fluffy critter,
           | there simply isn't any information to tell whether this is a
           | bent stick or a stick in water; or whether it's a cat or a
           | dog. The intelligent response is not "I don't know", and it's
           | not "60% it's a cat, 40% it's a dog". It's "here is the set
           | of actions that need to be taken to create more data to be
           | able to settle the question".
           | 
            | And creating that set of actions is completely different
            | from current approaches. No state-of-the-art GAN can say
            | "you need to view the subject from a steeper angle, check
            | for a reflection to see if it's in water" or "poke it with a
            | stick, see if it meows or barks", because they don't have
            | enough information about the world in their training sets to
            | even know that these are possibilities.
        
         | tiborsaas wrote:
          | Did you really compare a single neural network with a real
          | physical agent (a dog) with 5 senses, trillions of cells, and
          | a complex, inter-connected brain operating in the real world?
          | This is close to what I'd call an unfair, straw-man argument.
          | 
          | This is like calling calculus useless because it doesn't help
          | you pick one kind of milk over another in a supermarket.
          | 
          | Universal function approximation is a great tool if used
          | correctly. From what I understand, it comes in handy when our
          | puny human brains can't come up with an algorithm to produce
          | such a function. Let's stick to image recognition/processing.
          | Can you write a function that recognizes a cat or a dog in a
          | 64x64 greyscale image? Or write a function to remove the
          | background from a picture of a person? Can you write a
          | function to generate a 3D depth map for a 2D picture?
         | 
         | > These theorems only show that NNs are, like many
         | computational statistical techniques, sufficient for a wide
         | class of "mere approximation" problems that are only narrowly
         | useful within the whole field of learning-as-such.
         | 
         | That's exactly what people are looking for in most cases, the
         | mere approximation can be good enough to consider the problem
         | solved.
        
           | mjburgess wrote:
            | I've spoken with no end of people who think "universal fn
            | approximation" is some magic token to be played in the AI
            | debate -- people getting PhDs in ML, no less.
            | 
            | These are people, really, in my experience, with no science
            | background -- CS programmers who don't have any conceptual
            | foundations in applied mathematics or science (outside of
            | the discrete math taught in CS) -- and who take these
            | properties as genuinely quite magical.
            | 
            | "Intelligence" is, to them, just some function, and if NNs
            | can approximate any fn, then presumably they'll be
            | intelligent.
            | 
            | They aren't aware that, in a sense, everything is a
            | dynamical function of space and time (say, stuff(x, t)) and
            | that to instantiate it, one requires the entire universe.
            | 
            | In other words, CS people are not used to thinking about
            | applied mathematics in the sense of science (implementation
            | of functions, dynamical functions, and so on). I think it is
            | important for this audience to demystify these properties.
            | 
            | Being a universal fn approximator is, in my view, neither a
            | necessary nor a sufficient property of any system
            | _implementing_ intelligence. It's really a misdirection.
        
             | tiborsaas wrote:
              | I'm one of these persons with no formal training in any
              | scientific field. I don't think people like me believe any
              | of what your stereotype assumes. I know it's not magic,
              | it's brute-force problem solving. Once you are capable of
              | training systems like this, you get to solutions which are
              | really powerful.
              | 
              | All you see is numbers and functions, and all I see is
              | problems being solved by these AI/ML methods, often in
              | quantifiably better ways than intelligent humans can
              | manage.
              | 
              | NNs are the building blocks. Neurons in your brain are
              | dumb as well; it's the quality of the network that
              | matters. So instead of moving goalposts, what do you think
              | is required to implement intelligence?
        
               | mjburgess wrote:
                | Well, I'm not making a stereotype -- I'm offering an
                | explanation of the people I've met. It's very hard to
                | converse with people whose academic background is
                | discrete mathematics in an essentially empirical domain
                | (that of modelling intelligence).
                | 
                | There's a lot which is odd (and dubious) in AI/ML that
                | traces its origins to this peculiar situation: a
                | discipline (CS) in the "early phases" of modelling an
                | empirical phenomenon, without, yet, intra-disciplinary
                | theoretical support for it -- CS people take geometry
                | and classical physics to make video games. They don't
                | yet take the equivalent to make intelligent systems
                | (which would include material on learning in animals and
                | humans, and, in my view, more applied math).
                | 
                | In any case, to answer your question about
                | implementation, see
                | https://news.ycombinator.com/threads?id=mjburgess#30579711
        
               | netizen-936824 wrote:
                | There is far more computational power in a single neuron
                | than in any NN that I've heard of.
                | 
                | A single approximated "neuron" in an AI/ML NN doesn't
                | even operate anywhere close to a real neuron. Real
                | neurons are oscillators which exhibit nonlinear dynamic
                | behaviors.
        
               | beaconstudios wrote:
                | NNs are clever, but what they do is essentially reverse
                | programming. Rather than you writing a function that
                | translates a -> b, you give it a -> b mappings and the
                | training writes the function. What GP was saying is
                | something that most programmers should already know:
                | that writing the function is usually the easy bit, while
                | the hard bit is defining and then modelling the problem;
                | NNs can't do that.
        
               | AlotOfReading wrote:
                | Just because you can brute-force things doesn't mean
                | it's a practical way to solve certain problems. The set
                | of all C++ programs humans will ever write is finite and
                | therefore parseable with a regular language, but no
                | one's out there writing C++ compilers that way, for
                | obvious reasons. That's the essence of what I think GP
                | is getting at.
                | 
                | I don't know if NNs are sufficiently powerful to escape
                | that argument, because my formal understanding of "the
                | world" simply isn't good enough, but it's not obvious to
                | me that they are.
        
               | tiborsaas wrote:
                | I've seen a case where someone implemented an algorithm
                | by hand after an existing AI version was created. I
                | can't remember the details unfortunately, but the
                | hand-made version was better, so it's certainly a
                | possibility. NNs automate human problem solving, as was
                | demonstrated recently by DeepMind's AlphaCode.
                | 
                | It can be quite practical, though: with deep neural
                | networks (and massive increases in hardware
                | capabilities) we reached solutions that were previously
                | out of reach.
        
         | version_five wrote:
         | > There is often no function to find in solving a problem, ie.,
         | there is no mapping from ImageSpace -> DogCatSpace. Ie., most
         | things are genuine ambiguitites --- a stick in a water appears
         | bent, indistinguishably from an actually bent stick in some
         | other transparent fluid.
         | 
          | That's tangential imo. There is a function that maps from
          | image space to dog/cat/don't-know space. A sentient being that
          | gets the "don't know" can get more info to resolve the
          | ambiguity (or rephrase the question). A universal function
          | approximator can still make itself useful even if all it can
          | do is say it doesn't know. This is a question of problem
          | setup.
          | 
          | NNs are bad at extrapolating, including, e.g., trivially to
          | periodic functions (a sketch of this is below). This is a
          | limitation if you thought they could do that, but again a
          | question of understanding what they do.
          | 
          | Hype, as you say, leads some people to believe NNs are magic,
          | leading to mismatched expectations. A universal
          | interpolate-only function approximator is still pretty useful
          | though, just maybe disappointing if you understood it to imply
          | sentience.
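          | 
          | A sketch of the periodic-extrapolation failure (using the
          | same sklearn tooling as elsewhere in the thread; all details
          | are illustrative):
          | 
          |     import numpy as np
          |     from sklearn.neural_network import MLPRegressor
          | 
          |     X_train = np.linspace(0, 2*np.pi, 2000)  # one period
          |     net = MLPRegressor(hidden_layer_sizes=(100,),
          |                        max_iter=5000, random_state=0).fit(
          |         X_train.reshape(-1, 1), np.sin(X_train))
          | 
          |     X_in = np.linspace(0, 2*np.pi, 500)         # interpolation
          |     X_out = np.linspace(2*np.pi, 4*np.pi, 500)  # extrapolation
          | 
          |     err = lambda X: np.max(np.abs(net.predict(X.reshape(-1, 1))
          |                                   - np.sin(X)))
          |     # in range the error is small; out of range it is order 1
          |     # or worse, because the fit is not periodic
          |     print("in-range error:    ", err(X_in))
          |     print("out-of-range error:", err(X_out))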
        
           | sdenton4 wrote:
           | "NNs are bad at extrapolating, including e.g. trivially to
           | periodic functions."
           | 
            | WaveNet and its descendants are the obvious counterexample
            | here. It is excellent at learning to generate periodic and
            | nearly periodic functions...
        
             | version_five wrote:
              | I know I've seen papers where they use sinusoidal
              | nonlinearities to learn periodic functions. I felt like
              | that's a bit of a hack though (not necessarily a bad
              | thing) -- you're bringing domain knowledge in, which, if
              | you allow it, makes it easy to extend to periodic
              | functions. The failure of a vanilla NN to learn
              | periodicity is, I think, a specific failure to
              | extrapolate, which is the bigger problem.
             | 
             | I'll take a look at the architecture you mention.
             | 
              | Edit: looked it up, I see it's an autoregressive model,
              | like PixelCNN in 1D. I do know PixelCNN; I hadn't really
              | considered the connection to periodic functions, but I see
              | what you're saying. In a sense any AR model is
              | extrapolating, but not in the sense I mean: it has seen
              | training examples of the next point predicted from the
              | last points; it's not extending a relationship it learned
              | in training to something new. Anyway, thanks for pointing
              | the model out.
        
             | rileyphone wrote:
             | I think the approach of [0] is closer to what's going on in
             | our brains, basically evolving symbolic equations of
             | partial derivatives until we get something good enough.
             | Really fascinating and succinct paper.
             | 
              | 0. https://cdanfort.w3.uvm.edu/courses/237/schmidt-lipson-2009....
        
           | mjburgess wrote:
            | Sure, we could see `animal = f(image)` as a partial function
            | and lift it into some total function space, `maybe_animal =
            | f(image)`.
            | 
            | Is our reasoning here actually this total function, though?
            | We really do always need to keep in mind that animal
            | reasoning will terminate, that animals will act "guided by
            | other reasoning", and then resume the original reasoning
            | process.
            | 
            | Action really messes up this neat "totalising option" for
            | partial functions. If I'm not sure whether "fluffy" is a dog
            | or a cat, I might wait longer for it to move; I might throw
            | something at it; I might ask its owner.
            | 
            | This isn't as simple as "don't know", since reasoning is
            | kind of time-bound and time-parameterised; my very measuring
            | process is sensitive to my own confidence in
            | what-something-is.
            | 
            | Is this _really_ "f(image) = dont-know"? I don't think so.
            | 
            | I think it's more like, "judgement = body-brain-state(t,
            | reasoning-goal, {action policies}, ...large-number-of-other-
            | things)".
            | 
            | This gets to my issue. In my view, what animals are doing is
            | better described by a dynamical equation of state (like a
            | wavefunction), as the whole system is operating under its
            | own dynamical evolution (including, e.g., what fluffy does
            | in response to your puzzlement).
            | 
            | I don't see Dog|Cat|DontKnow as the answer here. I don't
            | think it's the actual total function which corresponds to
            | our judgement, though it probably is total -- we just end up
            | with "DontKnow" in all the cases where actual intelligence
            | is required... the very thing we're aiming to model.
        
         | shawnz wrote:
         | > If you go out and actually study the only known systems which
         | learn effectively (ie., animals), one does not find the need
         | for universal approximator theorems in explaining their
         | capacities.
         | 
         | What's the explanation for their capacities, then?
        
           | mjburgess wrote:
           | direct, theory-laden, causal contact and pro-active
           | engagement with their environments... eg., taking the stick
           | out of the water.
           | 
           | "data" (even from an animal's pov) is basically, by nature,
           | ambiguous. Measurement is an event in the world (eg., light
           | hitting the eye) which isn't somehow unambiguously
           | informative. To over come this problem, basically, animals
           | move stuff.
           | 
           | The adaptable motor cortex of the most intelligent animals,
           | therefore, isnt something to be tacked-on to "intelligence",
           | it's the precondition for it.
           | 
           | Glibly: tools before thoughts. Reason needs content to
           | operate on, the content of our thoughts is _built_ via our
           | (sensory-)motor systems.
           | 
           | The idea that we need ever more theoretically powerful models
           | of reasoning here is a misdirection -- it misses that the
           | heart of everything we know arrives via some form of repeated
           | experiment.
           | 
           | Every more automated "pure reasoning" either via statistics
           | or symbolically, is always just a means of juicing the data
           | we provide the system. Useful as a technology, but not as a
           | genuine learning system. It will never have the means of
           | resolving the many ambiguities within data itself.
           | 
           | In the case, for example, of NLP -- the structure of 1
           | trillion documents will never enable a machine to answer the
           | question "what do you like about what i'm wearing?" --
            | because (1) the machine isn't here with me; (2) I am asking
           | for its personal judgement, not a summary of a trillion
           | documents; and (3) that summary of those trillion documents
           | has to be unique, but the question has no "right answer".
           | 
            | Whilst computer science is in the driving seat over what
           | "intelligence" is, we will forever be stuck with this
           | incredibly diminished view of our own capacities and the size
            | of the technological challenge. The goal isn't to sift through
           | everything we have already done and "take a mean", the goal
           | is to produce a system which could have done "everything we
           | have done" without us.
        
             | gowld wrote:
              | Experiments are data collection. It's the input to the
             | thinking process.
        
             | bbqbbqbbq wrote:
        
             | shawnz wrote:
             | That clearly doesn't explain how you get from the
             | biological system to intelligence though. NNs can be
             | imagined as a (highly simplistic) form of repeated
             | experiment too.
        
               | mjburgess wrote:
               | Intelligence starts when the internal physical structure
               | of a system is "dynamically reflective" of its external
               | environment in a way which is stable over time
               | (basically, "complex hysteresis"). You get this with a
               | hard drive (+CPU, etc.) sure. I'd call this sort of
               | minimal intelligence merely "reactive".
               | 
               | You get, let's say "adaptive intelligence" when the
               | physiological structure being changed adapts the system
               | so that it is able to interact with its environment more
               | (eg., it can move differently).
               | 
               | To get to more advanced forms we need an explicit
               | reasoning process which can represent this physical state
               | to the system itself to engage in inference. Let's say
               | "cognitive intelligence".
               | 
               | We get typical mammalian intelligence when the broader
               | physiological structure (in particular the sensory-motor
               | structure) of the system is actually guided by this
               | explicit reasoning process. (Eg., a cyclist grows their
               | muscles differently by reasoning-when-cycling). Let's
               | call this "skill intelligence".
               | 
               | You get human intelligence when the explicit reasoning
               | process becomes communicable, ie., when interior
               | representations can be shared without the physical
               | activity of acquiring those representations. Human
               | intelligence is really "outside-in" in a very important
               | way, which AI today also neglects -- it took 100bn dead
               | apes to write a book, not one. Let's call this "socio-
               | symbolic intelligence".
               | 
                | What we have today is really just systems of "reactive
                | intelligence" with weird Frankensteinian organs attached to
                | them ("look at Alexa turn the lights off!!!!"). Alexa isn't
                | turning the lights off in the manner of the "socio-
                | symbolic intelligence" we attribute to Alexa naively (and
                | delusionally!).
               | 
               | Alexa is a reactive system which has something of socio-
                | symbolic significance (_to us_, not it!) glued on.
               | Alexa does not intend to turn the lights off, and we're
                | not communicating our intent to her. She's a hard drive
                | with a SATA cable to a light switch.
        
               | shawnz wrote:
               | It seems to me like a "reactive intelligence" is
               | basically equivalent to an "adaptive intelligence" which
               | just hasn't begun making use of all of its possible
               | outputs yet. Obviously even though we adapt to our
               | environment, there are still ultimate limits to how far
               | we can adapt.
        
               | mjburgess wrote:
                | The reason a dolphin isn't as smart as an ape is that
               | it's a tube of meat in the ocean; and an ape is a pianist
               | up a tree.
               | 
                | I see non-trivial forms of intelligence largely as
               | symptoms of physiology. Even the human brain in a dolphin
                | would be dumb; indeed that's basically just what a
               | dolphin is.
               | 
               | There is something absolutely remarkable in a thought
               | _about_ something, moving my hands to type; and my hands
               | actually typing. Personally, I think 90% of that miracle
                | is organic -- it is in the ability of our cellular
               | microstructure to adapt quickly; and our macro-structure
               | to adapt over time.
               | 
               | Either way, people who intend to build intelligent
               | systems have a task ahead of them. Building a system
                | which can really use tools _it hasn't yet invented_ is, in
               | my view, a problem of materials science more than it is
               | of discrete mathematics.
        
               | yazanobeidi wrote:
               | I like what you wrote here and how you think.
               | 
               | However I have to make one remark.
               | 
                | Schopenhauer would say that Alexa does in fact have a will
               | to turn the lights off. A burning will, the same will
               | within yourself and everything that is not idea. That it
               | is your word that sets off an irreversible causal
               | sequence of events leading to the turning off of the
                | lights. Schopenhauer would invoke his "Principle of
                | Sufficient Reason" as the reason for its happening. It is
                | not that Alexa chooses to obey, but that the causal chain
                | enforced by physics and more leaves the will of the
                | universe no choice but to turn off the lights. Same
               | reason why the ball eventually falls down when thrown up.
               | I believe this is the metaphor Schopenhauer uses in his
               | World as Will and Idea.
        
               | abeppu wrote:
               | I feel like bringing David Marr's "levels of analysis"
               | into the discussion is useful here, at least in very
               | loose terms.
               | 
               | > explicit reasoning process which can represent this
               | physical state to the system itself to engage in
               | inference
               | 
               | > when interior representations can be shared without the
               | physical activity of acquiring those representations
               | 
               | Roughly, you're talking at the 'computational level' or
               | perhaps above. You're describing qualities of the
               | computation that an agent must be doing. You don't
               | descend to the algorithmic level, which is where the
               | 'universality theorem' discussion is taking place. Which
               | is not to say that any of what you've said is wrong, but
               | to someone like the parent asking "_how_ you get from the
               | biological system to intelligence though" (emphasis
               | mine), I think it's basically a non-answer.
        
               | mjburgess wrote:
               | Well, my answer there will alarm many. Broadly, it's
               | whatever algorithm(s) you'd call biochemistry. I think
               | the self-replicating adaptive properties of organic
                | stuff are the heart of how we get beyond what the mere
               | hysteresis of hard-drives can do. We require cells, and
               | their organic depth.
               | 
                | We don't often describe reality with algorithms in the
                | sciences; if reality admits a general algorithmic
                | description, it is surely beyond any measurable level. So
                | I don't think the answer to the problem of intelligence
               | will require computer science till much later in the
               | game; if it is even possible to actually artificially
               | create it.
               | 
               | Whatever "algorithm" would comprehensively describe the
               | relevant properties of cells, even if it could be written
               | down, will never be implemented "from spec". One may as
               | well provide the algorithm for "a neutron star" and
               | expect playing around with sand will make one.
        
               | abeppu wrote:
               | I think this is another non-answer.
               | 
               | Biochemistry is broad and does really diverse things, and
               | if your answer to "how does a mammalian brain allow it to
               | reason through interactions with its physical
               | environment" is "biochemistry", and that's also
               | presumably the answer to "why is that pond scum green?"
               | and "why are fingernails hard?", then it fails to be an
               | explanation of any sort.
               | 
               | Similarly, if someone asks "why is your program dying on
               | a division-by-zero error", it's not an explanation to say
               | "well, that's just a matter of how the program executes
               | when provided those particular inputs".
               | 
               | What _specifically_ about the biochemistry of our nervous
               | systems allows us to solve problems, or communicate, as
               | versus just metabolize sugar?
        
               | [deleted]
        
               | mjburgess wrote:
               | It's about self-replication, adaption and "scale-free
               | properties".
               | 
                | Consider a touch on the surface of my skin, which can
               | be a few atoms of some object brushing a single cell --
               | that somehow "recuses up" the organic structure of my
               | body (cell, tissue, organ, ...) both adapting it and
               | coming to be "symbolically objectified" in my reasoning
               | as "a scratch".
               | 
               | The relevant properties here are those that enable
               | similar kinds of adaption (and physiological response)
               | _at all relevant scales_ from the cell to the organ to
               | the whole-body.
               | 
                | I think cells-organs-bodies are implementations of
               | "scale-free adaption algorithms" (if you want to put it
               | in those terms), which enable implementation of "higher-
               | order intelligence algorithms".
               | 
               | If you want much much more than this, then even if I had
                | the answer, it wouldn't be comment-sized, it'd be a
                | textbook. But of course, no one has that textbook, or
                | else we wouldn't be talking about this.
               | 
               | I think if you see cells as extremely self-reorganizing
               | systems, and bodies as "recursive scale-free"
               | compositions of "self-reorganizing adaptive systems",
               | then you get somewhere towards the kinds of properties
               | i'm talking about.
               | 
                | I think my ability to type _because_ I can think is a
               | matter of that  "organic recursion" from the sub-cellular
               | to the whole-body.
        
         | slibhb wrote:
         | > Animals solve the problem of the "ambiguity of inference" by
         | being in the world and being able to experiment. Ie., taking
         | the stick out of the water. A neural network, in this sense,
         | cannot "take the stick out the water" -- it cannot resolve
         | ambiguities.
         | 
         | Couldn't training data include a video of someone taking a
         | stick out of the water?
        
           | ddingus wrote:
           | Then we have no ambiguity. The data is inclusive.
           | 
            | And the cost is a seriously larger problem space to
            | classify.
        
       | smolder wrote:
       | Yeah, all you need to do is show you can build a NAND gate with a
       | neural network and every other logical network follows from that.
       | I remember doing that project in a machine learning class in
       | 2010.
        
       | kazinator wrote:
        | Really? Can a neural net compute the function halts(N, I): the
        | neural net N will halt on input I, for any neural net N and
        | input I?
        
         | advisedwang wrote:
         | The article says:
         | 
         | > The second caveat is that the class of functions which can be
         | approximated in the way described are the continuous functions.
         | If a function is discontinuous, i.e., makes sudden, sharp
         | jumps, then it won't in general be possible to approximate
         | using a neural net. This is not surprising, since our neural
         | networks compute continuous functions of their input. However,
         | even if the function we'd really like to compute is
         | discontinuous, it's often the case that a continuous
         | approximation is good enough. If that's so, then we can use a
         | neural network. In practice, this is not usually an important
         | limitation.
        
       | kolbe wrote:
        | So can most linear combinations of functions raised to the nth
        | power.
        
       | __MatrixMan__ wrote:
       | I think the author might be overlooking just how weird something
       | can be while still being a function.
       | 
       | Let f(x):R->{1,0} be such that f(x)=1 if x is rational and 0
       | otherwise.
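        | 
        | Amusingly, on the inputs a real network ever sees, this f is
        | just the constant 1, since every IEEE float is a dyadic
        | rational -- a quick check:
        | 
        |     from fractions import Fraction
        | 
        |     # Every IEEE-754 float is exactly p / 2**k, hence rational,
        |     # so f is 1 on all machine numbers.
        |     print(Fraction(0.1))  # 3602879701896397/36028797018963968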
        
         | Banana699 wrote:
         | "Visual proof" pretty much gave away the fact that it was going
         | to be a non-rigorous reasoning process based on drawing an
          | arbitrary plot; once you draw a plot you're already assuming a
         | lot about its underlying function. The title isn't very
         | misleading.
        
         | vecter wrote:
         | He says continuous function
        
           | jjgreen wrote:
           | He also says in the first sentence "One of the most striking
           | facts about neural networks is that they can compute any
           | function at all", that later caveat is incompatible, most
           | function are not continuous.
        
           | laichzeit0 wrote:
           | Ok so what about the Cantor function? Can it learn that?
        
       | aaaaaaaaaaab wrote:
       | That's a pretty useless "proof" which gives no insight into how
       | neural networks learn in practice. The author takes a sigmoid
       | neuron, tweaks its parameters to the extreme so that it looks
       | like a step function, then concludes that since a linear
        | combination of step functions can approximate anything, so can
        | neural networks. Bravo.
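        | 
        | The tweak in question is just inflating the weight -- a quick
        | sketch (assuming numpy):
        | 
        |     import numpy as np
        | 
        |     sigmoid = lambda z: 1 / (1 + np.exp(-z))
        | 
        |     x = np.linspace(-1, 1, 5)
        |     for w in (1, 10, 100):
        |         # as w grows, sigmoid(w*x) squashes into a step at 0
        |         print(w, np.round(sigmoid(w * x), 3))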
        
         | r-zip wrote:
         | Did you ever consider that explaining how deep nets "learn in
         | practice" is not the point?
        
           | anothernewdude wrote:
            | If it's not about learning, this result is useless. A lookup
           | table can compute any function, but who cares?
        
             | mdp2021 wrote:
             | > _who cares_
             | 
              | The people who need to build, through automation, some
              | (sophisticated kind of) lookup table for a function whose
              | details are unavailable.
        
       | voldacar wrote:
       | This "proof" is pretty sketchy. I think he means every
       | _differentiable_ function? Because I fail to see how you can make
       | a neural network evaluate the indicator function of the rationals
        
         | bjourne wrote:
         | "The second caveat is that the class of functions which can be
         | approximated in the way described are the continuous functions.
         | If a function is discontinuous, i.e., makes sudden, sharp
         | jumps, then it won't in general be possible to approximate
         | using a neural net. This is not surprising, since our neural
         | networks compute continuous functions of their input. However,
         | even if the function we'd really like to compute is
         | discontinuous, it's often the case that a continuous
         | approximation is good enough. If that's so, then we can use a
         | neural network. In practice, this is not usually an important
         | limitation."
        
           | voldacar wrote:
           | I guess I didn't see that. But he's still using _continuous_
           | where the correct term would be _differentiable_
        
             | kevinventullo wrote:
             | You can uniformly approximate any continuous function on a
             | compact domain with a differentiable/smooth function.
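              | 
              | E.g. by convolving with a narrow Gaussian -- a
              | sketch (assuming numpy), smoothing the kink of
              | |x| with a shrinking sup-error:
              | 
              |     import numpy as np
              | 
              |     x = np.linspace(-1, 1, 2001)
              |     f = np.abs(x)  # continuous, kinked at 0
              |     for eps in (0.1, 0.03, 0.01):
              |         # row-normalised Gaussian smoothing matrix
              |         k = np.exp(-(x[:, None] - x) ** 2
              |                    / (2 * eps ** 2))
              |         k /= k.sum(axis=1, keepdims=True)
              |         # sup-error of the smoothed version
              |         print(eps, np.max(np.abs(k @ f - f)))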
        
             | bjourne wrote:
             | Continuous functions aren't necessarily differentiable.
        
               | voldacar wrote:
               | yes, that's my point
        
       | W0lf wrote:
       | Given that any stock is a function over time as well, there
       | should theoretically exist a neural net that can approximate the
        | stock price for the future? This reasoning is obviously wrong;
        | what is my exact error of thought, though?
        
         | arghwhat wrote:
         | The error is that stock price is not a function over time, but
         | instantaneous demand which we record over time. That demand is
         | a function over an undefined number of variables.
        
           | ddingus wrote:
           | We can find a function to express past stock prices.
           | 
           | There isn't one for the future, unless said future is somehow
           | predetermined?
           | 
           | Is it, given enough input data?
           | 
           | Does this discussion then distill down to philosophy?
           | 
           | Do living beings have agency, or are they simply very complex
           | NNs?
           | 
            | How one answers that speaks to consciousness as much as it
            | does to the prospect of a predictive stock price model.
        
         | visarga wrote:
         | Your input features are incomplete and mixed with noise.
         | 
          | The value of a die is also a function over time. Can we learn
          | this function with a neural net? No, because our features don't
          | include the nitty-gritty details of each throw, so it's
          | essentially random.
        
         | paskozdilar wrote:
         | > what is my exact error of thought though?
         | 
         | There isn't one, you're just overestimating the value of
         | existence-propositions.
         | 
         | In practice, knowing that something exists is not a very useful
         | result - it is often more useful to know that something does
         | NOT exist (e.g. solution to the halting problem).
        
         | amalcon wrote:
         | Your mistake is that you left off "given the right inputs".
         | There are a lot of inputs to stock prices that are unlikely to
         | be readily available to your function.
        
         | max__d wrote:
         | With the same type of reasoning, we could plot whatever output
         | our brain gives and there will be some type of neural network
         | that can predict what we'll think/see/feel in the future. What
          | you said and the thing I just said were both ideas I had when
          | starting to learn how AI works; sadly it's something we still
          | can't reach, at least for now.
        
           | tluyben2 wrote:
           | If I understand you correctly, we are very far away from that
           | example as that is AGI and then some. You will not see that
           | in your lifetime so 'we can still' seems an interesting
           | (overly optimistic?) take on it.
        
             | laichzeit0 wrote:
             | AGI, since it lacks a technical/mathematical definition,
             | can be anything. It's mere philosophy at this point,
             | actually even vaguer than most philosophical problems.
        
               | posterboy wrote:
               | Although I share your sentiment in general, I would
               | presume that @tluyben's take is fairly true to the
                | broader philosophical view. The critique of this view
                | being at least as weak as the views on intelligence per
                | se is a drop in the ocean, really.
               | 
                | Implying, there is a wealth of thought devoted to
                | intelligence! That fact actually proves the conjecture
                | in a nicely constructive way by itself: that we are
                | thoughtful indeed, if only you believe this
                | axiomatically. The quintessential theorem was distilled
                | by Descartes, of course, wherefore he is remembered.
        
               | tluyben2 wrote:
               | I meant it to mean, indeed in a vague way, what we call
               | human intelligence or beyond; the parent says to make a
               | neural network that can predict what someone will
               | think/feel in the future, which seems the same or at
               | least indistinguishable from the subject's human
               | intelligence as it will result in the same outputs. So to
               | create the network implied by the parent, we would have
               | to a) be able to make networks of that (unknown)
                | complexity and b) 'copy', or rather make it learn from
               | the outputs, the current 'state' of the subject's brain
               | in it. That is incredibly far removed from anything
               | cutting edge we know how to do. If it is at all possible.
               | 
               | So I was just surprised by their use of language as it
               | seems to imply parent thought we would be closer to or
               | there already with our developments of AI tech.
        
         | alexchamberlain wrote:
          | Stock prices are not well modelled as continuous functions --
          | the prices you see are generally trades (discrete function) and the
         | price may or may not have been available to you at the volume
         | you wanted (there are more variables than time).
        
         | nnq wrote:
         | besides karelp's sister comment, there's also the "obvious"
          | fact that _stock price is not a function of time, it's not
          | P(t), it's a function of time and the entire f universe that
          | also evolves through time, more like P(t, U(t, ....))_ ... you
          | can simplify things by _assuming the laws of physics are
          | deterministic and you only need one instance of the state of
          | the universe, U, so you'd have P(t, U)_
         | 
         | ...now if you don't explicitly represent U as a parameter,
         | you'll have it implicit in the function. So your "neural
         | network" contains _the entire state of the freakin universe
         | (!!)_.
         | 
          | Ergo, contingent on your stance on theological immanence vs.
          | transcendence, what you'd call "neural network approximation of
          | the stock's price function" is probably quite close to what
          | others call... God (!).
         | 
          | (Now, if relativity as we know it is right, you might get away
          | with a "smaller slice of U" -- learn about "light cones". And to
         | phrase this in karelp's explanation context: you'd need to know
         | U to know which of the practically infinitely many such neural
         | networks to pick. The core of (artificial) intelligence is not
          | neural networks in themselves, it's _learning_; the NN is a
         | quite boring computational structure, but you can implement
         | tractable learning strategies for it, both in code, and in
         | living cells as evolution has shown...)
        
           | DennisP wrote:
           | And you'd have to know the state of U to infinite precision.
           | Which makes me wonder whether neural nets have any hope with
           | a simple chaotic function. Maybe they do but just in the
           | short term, like predicting the weather.
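            | 
            | The infinite-precision point bites fast -- a sketch with
            | the logistic map (plain Python, nothing else assumed):
            | 
            |     # Two trajectories of the chaotic map x -> 4x(1-x)
            |     # starting 1e-12 apart reach O(1) separation within
            |     # ~40 steps, so fixed precision loses the long game.
            |     x, y = 0.3, 0.3 + 1e-12
            |     for t in range(1, 51):
            |         x, y = 4 * x * (1 - x), 4 * y * (1 - y)
            |         if t % 10 == 0:
            |             print(t, abs(x - y))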
        
         | ComradePhil wrote:
          | Theoretically, there exists a model that predicts all future
         | stock prices EXACTLY at any given time in the future, as long
         | as the results are completely isolated from all market
         | participants (i.e. the knowledge of the result is COMPLETELY
          | isolated from the market). Here is how you can theoretically
         | prove it exists:
         | 
          | Train a model today so that it overfits a given stock. It
          | would predict everything very accurately up to today.
         | The ONLY way to make sure that the results are completely
         | isolated from the market is to not make the result available to
         | ANYONE (how do you know that an isolated human observer is not
         | leaking data with some unknown phenomena... say quantum
         | entanglement with particles in other people's brains, for
         | example). So, the ONLY way to test the models is back-testing.
         | 
         | You can extend that to saying that for any given point in the
         | future (say, this is a reference point), there will be an
          | overtrained model which will backtest perfectly, i.e. the
          | theoretical model that works at any time during the past to
          | predict the exact stock price in the future up to the point of
          | reference.
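          | 
          | A toy version of that "perfect backtest" -- a sketch with a
          | high-degree polynomial standing in for the overtrained net
          | (assuming numpy):
          | 
          |     import numpy as np
          | 
          |     rng = np.random.default_rng(0)
          |     t = np.arange(20)
          |     prices = np.cumsum(rng.normal(size=20))  # "history"
          | 
          |     # degree-19 fit through 20 points: backtests
          |     # essentially perfectly on the past...
          |     model = np.polynomial.Chebyshev.fit(t, prices, deg=19)
          |     print(np.max(np.abs(model(t) - prices)))  # ~0
          | 
          |     # ...and says nothing useful about the future
          |     print(model(20), model(25))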
        
         | fkfkno wrote:
          | Pi was a great movie!
         | 
         | https://www.youtube.com/watch?v=ShdmErv5jvs
        
         | posterboy wrote:
         | Contrary to the recent top comment on this, which fails to show
         | that such a net existing could be no coincidence, I guess, the
         | answer to your problem might be deeply physical and information
         | theoretic, as soon as you speak of _time_. Simply speaking, any
         | model is _good enough_ if the approximation is tolerably
         | accurate. In that sense, crude nets as well as expert systems
         | that trigger off clear signals and ample evidence may already
         | exist.
         | 
          | In particular, given the way the stock markets are distributed,
          | the function of time is likely relativistic and every participant
         | is acting under incomplete information even in the infinite
         | limit.
         | 
          | Also, you have to be cautious about what any _function_ in this
          | context really means, as I imagine it means differentiable
         | functions (after somebody mentioned the Ackermann function,
         | which is not anywhere differentiable).
        
         | Bootvis wrote:
         | Doesn't seem wrong to me, the tricky part is to find this
          | network and convince yourself that it indeed predicts
         | correctly over the period of interest.
        
         | karelp wrote:
          | Your reasoning is not wrong: there is a neural net that
         | approximates future stock price. The problem is we don't know
         | which one :)
        
           | hedora wrote:
           | Sure we do. Partition your neurons into a few billion
           | independent networks, embed them on a spinning globe of
           | mostly molten rock, and put each one in a leaky bag of mostly
           | water.
           | 
           | Wait long enough, and one leaky bag will emit the number
           | "42". If you get the initial conditions just right (and
           | quantum nondeterminism isn't really a thing), then you'll
           | also get a good approximation to the stock market.
        
           | vimacs2 wrote:
           | So it's somewhat akin to the Library of Babel but instead of
           | the set of all possible books, it's the set of all possible
           | functions :p
        
             | hedora wrote:
             | Those are equivalent as long as you allow for infinite
             | length books.
        
               | Banana699 wrote:
               | No need for infinite-length books to encode all possible
               | functions, as there are notations to express infinity in
               | finite space (e.g. programs, encoding an infinity of
               | behaviour in a finite number of instructions).
               | 
                | The Library of Babel contains every possible finite
                | description of every possible function.
        
           | dzaima wrote:
           | And knowing which one it is will probably influence which one
           | it should be, in a halting-problem-esque way.
        
         | pawelduda wrote:
         | It would be the same as predicting the future, which is not
         | possible using past performance
        
         | callesgg wrote:
         | It is not wrong.
         | 
         | It is just that the neural network would have to compute a
         | model of the entire world or even the universe on an atomic
         | scale. It would be computationally unfeasible but not
         | theoretically impossible.
         | 
        | It is theoretically possible that the universe we live in is
         | already being computed on a neural network in some other
         | external universe.
        
       | gowld wrote:
       | This is a long written argument with some animations, not a
       | visual proof.
        
       | hprotagonist wrote:
       | any "well behaved" function.
        
         | qwerty1793 wrote:
         | A number of assumptions seem to be missing from this article.
         | Since the author is using the sigmoid function which is smooth,
         | this argument actually only applies to approximating smooth
          | functions. That is, you don't just need f to be continuous, you
         | need all of its derivatives to exist and all be continuous.
         | Also, since we are only able to have finitely many neurons, we
         | need to be able to approximate f using step functions with
          | finitely many pieces. So this argument can only be used if f
         | is constant outside of a compact region.
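          | 
          | For reference, the finitely-many-steps construction on a
          | compact interval looks like this -- a sketch (assuming
          | numpy):
          | 
          |     import numpy as np
          | 
          |     # clipping avoids overflow warnings for huge |z|
          |     sigmoid = lambda z: 1 / (1 + np.exp(-np.clip(z, -60, 60)))
          | 
          |     def staircase(x, f, a, b, n=100, k=1000.0):
          |         # n steep sigmoids, each adding the jump
          |         # f(t[i+1]) - f(t[i]) as a near-step at t[i]
          |         t = np.linspace(a, b, n + 1)
          |         jumps = np.diff(f(t))
          |         return f(t[0]) + sum(
          |             h * sigmoid(k * (x - ti))
          |             for h, ti in zip(jumps, t[:-1]))
          | 
          |     xs = np.linspace(0, 2 * np.pi, 7)
          |     err = staircase(xs, np.sin, 0, 2 * np.pi) - np.sin(xs)
          |     print(np.round(err, 2))  # small everywhere on [a, b]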
        
           | hprotagonist wrote:
           | at which point, a fourier transform is a hell of a lot
           | cheaper ;)
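            | 
            | e.g. a truncated Fourier series doing the same job with a
            | handful of coefficients -- a sketch (assuming numpy):
            | 
            |     import numpy as np
            | 
            |     # approximate a square wave by keeping only the
            |     # first 8 Fourier coefficients of 256 samples
            |     x = np.linspace(0, 2 * np.pi, 256, endpoint=False)
            |     f = np.sign(np.sin(x))
            |     F = np.fft.rfft(f)
            |     F[8:] = 0  # crude low-pass truncation
            |     print(np.round(np.fft.irfft(F)[:4], 2))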
        
           | joppy wrote:
           | Why does the function you are approximating need to be
           | smooth? From the paper cited in the article, all you need is
           | for f to be continuous on a compact subset of R^n.
        
       | lisper wrote:
       | > a more precise statement of the universality theorem is that
       | neural networks with a single hidden layer can be used to
       | approximate any continuous function to any desired precision.
       | 
       | That is a _very_ different claim than being able to compute any
       | function.
        
         | moffkalast wrote:
         | I was about to feed P = NP into GPT-3, _sigh_
        
       ___________________________________________________________________
       (page generated 2022-03-06 23:00 UTC)