[This is a short research write-up on AI & Neural Networks that I put together]
Abstract: Artificial Intelligence (AI) is a key technology in many of today's novel applications, ranging from banking systems that detect attempted credit card fraud, to telephone systems that understand speech, to software systems that notice when you're having problems and offer appropriate advice. The aim of this paper is to give a broad view of AI, in which Neural Networks (NN) form a subset. Now standing as a major paradigm for data mining applications, Neural Networks have been widely used in many fields due to their ability to capture complex patterns present in the data. The development of Neural Networks has been so rapid that they are now referred to as the sixth generation of computing. This paper explains the distinct mechanisms embodied in Neural Networks, along with their strengths, weaknesses and applications.
Keywords: Artificial Intelligence, Neural Networks, Learning methods, Neuron, Neuroscience.
I. Introduction
Artificial Intelligence (AI):
The branch of computer science concerned with making computers behave like humans. The term was coined in 1956 by John McCarthy at the Dartmouth Conference.
Artificial intelligence includes:
- Game playing: programming computers to play games such as chess and checkers.
- Expert systems: programming computers to make decisions in real-life situations (for example, some expert systems help doctors diagnose diseases based on symptoms).
- Natural language: programming computers to understand natural human languages.
- Neural networks: systems that simulate intelligence by attempting to reproduce the types of physical connections that occur in animal brains.
- Robotics: programming computers to see and hear and react to other sensory stimuli.
Considered a subset of Artificial Intelligence, a Neural Network (NN) is basically a computer program designed to learn in a manner similar to the human brain. Rather than using a digital model, in which all computations manipulate zeros and ones, a neural network works by creating connections between processing elements, the computer equivalent of neurons. The organization and weights of the connections determine the output.
II. Inspiration from Neuroscience
Fig. 1 Schematic diagram of a typical neuron
Artificial Neural Network (ANN), or simply Neural Network (NN), was originally aimed at modeling networks of real neurons in the brain.
Neuron: The brain is composed of about 10¹¹ neurons (nerve cells) of many different types. Fig. 1 depicts a schematic drawing of a neuron and its connections with other neurons. Tree-like networks of nerve fibers called dendrites are connected to the cell body, or soma, where the nucleus is located. Extending from the cell body is a single long fiber called the axon, which eventually branches into strands and sub-strands. At the ends of these are the transmitting ends of the synaptic junctions, or synapses, to other neurons. The receiving ends of these junctions on other cells can be found both on the dendrites and on the cell bodies themselves. The axon of a typical neuron makes a few thousand synapses with other neurons. The transmission of a signal from one cell to another at a synapse is a complex chemical process in which specific transmitter substances are released from the sending side of the junction. The effect is to raise or lower the electrical potential inside the body of the receiving cell. If this potential reaches a threshold, a pulse or action potential of fixed strength and duration is sent down the axon; we then say that the cell has fired.
III. From Biological to Artificial Neuron Model
Fig. 2 Artificial neuron
The neuron model shown in Fig. 2 is the one widely used in artificial neural networks, with some minor modifications. It has N inputs, denoted I₁, I₂, …, Iₙ. Each line connecting these inputs to the neuron is assigned a weight, denoted W₁, W₂, …, Wₙ respectively. The neuron calculates a weighted sum of its inputs and compares it to a threshold T. If the sum is higher than the threshold, the output is set to 1, otherwise to -1. The weight Wⱼ represents the strength of the synapse connecting the previous neuron to this neuron. It can be positive or negative, corresponding to an excitatory or inhibitory synapse respectively, and it is zero if there is no synapse between the two neurons.
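To make this concrete, below is a minimal sketch of the threshold neuron just described (my own illustrative Python, not code from any published source): it forms the weighted sum of the inputs and outputs 1 or -1 depending on the threshold T.

```python
import numpy as np

# Threshold neuron sketch: output is 1 if the weighted sum of the
# inputs exceeds the threshold T, otherwise -1 (as described above).
def neuron_output(inputs, weights, threshold):
    weighted_sum = np.dot(inputs, weights)
    return 1 if weighted_sum > threshold else -1

I = np.array([0.5, -1.0, 2.0])   # inputs I1..I3 (example values)
W = np.array([0.8, 0.2, -0.5])   # weights; a negative weight = inhibitory synapse
T = 0.0                          # threshold
print(neuron_output(I, W, T))    # 0.4 - 0.2 - 1.0 = -0.8 < 0, so -1
```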
IV. Network Architecture
The simplest form of NN consists of only two layers, the input and the output layer (no hidden layer is present). This is sometimes referred to as a skip-layer network, and it basically amounts to conventional linear regression modeling in a NN design: the input layer is connected directly to the output layer, bypassing any hidden layer. Like any other network, this simplest form of NN relies on weights as the connections between the inputs and the output, each weight representing the relative significance of a specific input in the computation of the output. However, because the hidden layer is what confers strong learning ability on a NN, architectures of three or more layers are used in practical applications. Fig. 3 shows the three layers in the network.
Fig. 3 Three layers in the network
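As an aside, the equivalence noted above between the no-hidden-layer network and linear regression can be shown in a few lines (my own sketch, with made-up data): training such a network amounts to fitting a weighted sum of the inputs plus a bias.

```python
import numpy as np

# With no hidden layer, the network output is a weighted sum of the
# inputs plus a bias -- exactly a linear regression model.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))             # 100 samples, 3 inputs
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.3                      # linear "teacher" signal

Xb = np.hstack([X, np.ones((100, 1))])    # append a bias column
w = np.linalg.lstsq(Xb, y, rcond=None)[0] # least-squares fit
print(np.round(w, 2))                     # recovers [1.5, -2.0, 0.5, 0.3]
```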
An infinite number of ways prevail as to the construction of a NN; neurodynamics (which basically spells out the properties of an individual neuron, such as its transfer function and how its inputs are combined) and architecture (which defines the structure of the NN, including the number of neurons in each layer and the number and types of interconnections) are the two terms used to describe the way in which a NN is organized. Any network designer must factor in the following elements when building up a network:
1. Best starting values (weight initialization)
2. Number of hidden layers
3. Number of neurons in each hidden layer
4. Number of input variables or combination of input variables (usually emanating from regression analysis)
5. Learning rate
6. Momentum rate
7. Training time or amount of training (i.e., the number of iterations to employ)
8. Type of activation function to use in the hidden and output layers
9. Data partitioning and evaluation metrics
The design of a network is considered an art rather than a science, which is why it is a time-consuming process. Nevertheless, the main criterion used in the design of a NN is to end up with the specification that minimizes the error, i.e., the optimal network topology. The sketch below shows how these design choices map onto one concrete tool.
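This is a hypothetical configuration using scikit-learn's MLPClassifier; the particular values are arbitrary starting points chosen only to illustrate the list above, not recommendations.

```python
from sklearn.neural_network import MLPClassifier

# Each argument corresponds to a design element from the list above.
net = MLPClassifier(
    hidden_layer_sizes=(10, 5),  # two hidden layers of 10 and 5 neurons (items 2-3)
    activation='tanh',           # activation function for hidden layers (item 8)
    solver='sgd',                # plain stochastic gradient descent
    learning_rate_init=0.01,     # learning rate (item 5)
    momentum=0.9,                # momentum rate (item 6)
    max_iter=500,                # amount of training (item 7)
    random_state=42,             # reproducible starting weights (item 1)
)
# net.fit(X_train, y_train) would then search for the weights that
# minimize the error on the training data.
```

Finding a good combination of these settings usually takes several experiments, which is part of why the process is described as an art.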
V. Learning
Neural networks are configured for a specific application, such as
pattern recognition or data classification, through a learning process. In a
biological system, learning involves adjustments to the synaptic connections
between neurons.
Learning methods:
- Unsupervised
- Reinforcement learning
- Back-propagation
1) Unsupervised Learning: It does not require help from the outside: no training data and no information about the desired output are available. The principle is learning by doing. It is used to pick out structure in the input (a small clustering sketch follows this list):
- Clustering
- Reduction of dimensionality: compression
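As an illustration of learning without labels, here is a small clustering sketch (my own example; it uses k-means rather than a neural method, purely to show structure being picked out of unlabeled input):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two blobs of unlabeled points; no desired outputs are given.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0, 0.5, (50, 2)),   # blob around (0, 0)
                  rng.normal(3, 0.5, (50, 2))])  # blob around (3, 3)

# The algorithm discovers the grouping on its own.
labels = KMeans(n_clusters=2, n_init=10).fit_predict(data)
print(labels[:5], labels[-5:])  # the two blobs receive different labels
```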
2) Reinforcement learning: Reinforcement learning is a form of supervised learning because the network gets some feedback from its environment. It is sometimes called learning with a critic, as opposed to learning with a teacher: if the reinforcement signal says that a particular output is wrong, it gives no hint as to what the right answer should be. It is therefore important for a reinforcement learning network to have some source of randomness, so that the space of possible outputs can be explored until a correct value is found. The network uses the performance score to shuffle its weights randomly, which makes learning relatively slow.
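The following sketch (my own toy example) captures that idea of learning with a critic: weights are perturbed at random, and a perturbation is kept only when the performance score improves.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])          # target rule: y = 2x + 1

def score(w, b):
    # The "critic": a single performance number, no hint of the answer.
    return -np.mean((w * X + b - y) ** 2)

w, b = 0.0, 0.0
best = score(w, b)
for step in range(5000):
    dw, db = rng.normal(scale=0.1, size=2)  # random weight shuffle
    s = score(w + dw, b + db)
    if s > best:                            # keep only improvements
        w, b, best = w + dw, b + db, s
print(round(w, 2), round(b, 2))             # approaches 2.0 and 1.0
```

Note how wasteful this exploration is compared with being told the right answer directly, which is exactly the slowness mentioned above.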
3) Back-propagation: The error is the difference between the actual and the desired output. Each weight is changed in proportion to the size of the error: the output-layer error is calculated first and then propagated back to the previous layer. The advantage of this method is that performance improves steadily as training proceeds.
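Below is a minimal back-propagation sketch (my own illustrative code): a tiny two-layer sigmoid network learning XOR, where the output error is computed and propagated back to adjust the weights, exactly as outlined above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)     # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)     # hidden -> output
lr = 0.5

for epoch in range(5000):
    h = sigmoid(X @ W1 + b1)                # forward pass
    out = sigmoid(h @ W2 + b2)
    err = out - y                           # actual minus desired output
    d_out = err * out * (1 - out)           # output-layer error signal
    d_h = (d_out @ W2.T) * h * (1 - h)      # propagated back one layer
    W2 -= lr * (h.T @ d_out)                # weight changes scale with error
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h)
    b1 -= lr * d_h.sum(axis=0)

print(np.round(out.ravel(), 2))             # typically approaches [0, 1, 1, 0]
```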
VI. Benefits of Neural Networks
1) Its structure is massively parallel and distributed. Information processing takes place through the interaction of a great number of computational neurons, each of which sends excitatory or inhibitory signals to other nodes in the network. Unlike classic AI methods, where information processing can be considered sequential (step by step, even when there is no predetermined order), in Neural Networks this processing is essentially parallel, which is the origin of their flexibility. Because the calculations are divided among many nodes, if any of them strays from the expected behavior, it does not affect the behavior of the network.
2) Its ability to learn and generalize. NNs have the capability to acquire knowledge from their surroundings by adapting their internal parameters in response to external stimuli. The network learns from the examples presented to it and generalizes knowledge from them.
3) Non-linearity: The response of a computational neuron can be linear or non-linear. A neural network formed by the interconnection of non-linear neurons is itself non-linear, a trait that is distributed across the entire network. Non-linearity is important above all where the task at hand behaves in a way far removed from linearity, which is the case in most real situations.
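A quick check of what non-linearity means here (my own one-liner): a tanh neuron violates superposition, so its response to a sum of inputs is not the sum of its responses.

```python
import numpy as np

# For a non-linear activation f, f(a + b) != f(a) + f(b) in general.
f = np.tanh
a, b = 0.5, 1.0
print(f(a + b), f(a) + f(b))   # about 0.905 vs 1.224 -- not equal
```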
4) Adaptive Learning: The NN is capable of determining the relationship between the different examples presented to it, or of identifying the class to which they belong, without requiring a prior model.
5) Self-Organization: This property allows the NN to distribute knowledge across the entire network structure; no single element holds specific stored information.
6) Fault tolerance: This characteristic shows itself in two senses. The first relates to the samples shown to the network: it answers correctly even when the examples exhibit variability or noise. The second appears when one of the elements of the network fails; this does not prevent the network from functioning, owing to the way in which it stores its information.
VII. Limitations of Artificial Neural Networks
An artificial neural network is undoubtedly a powerful tool for decision making, but there are several weaknesses in its use.
1) ANN is not a general-purpose problem solver: It is good at complex numerical computation for purposes such as solving systems of linear or non-linear equations, organizing data into equivalence classes, and adapting the solution model to environmental changes. However, it is not good at such mundane tasks as calculating payroll, balancing checkbooks, and generating invoices. Neither is it good at logical inference, a job better suited to expert systems. Therefore, users must know when a problem can be solved with an ANN.
2) There is no structured methodology available for choosing, developing, training, and verifying an ANN: The solution quality of an ANN is known to be affected by the number of layers, the number of neurons in each layer, the transfer function of each neuron, and the size of the training set. One would think that the more data in the training set, the better the accuracy of the output, but this is not so. While too small a training set will prevent the network from developing generalized patterns of the inputs, too large a one will break down those generalized patterns and make the network sensitive to input noise. In any case, the selection of these parameters is more of an art than a science. Users of ANNs must conduct experiments (or sensitivity analyses) to identify the best possible configuration of the network. This calls for easy-to-use and easy-to-modify ANN development tools, which are gradually appearing on the market.
3) There is no single standardized paradigm for ANN development: Because of its interdisciplinary nature, there has been duplicated effort in ANN research. For example, the back-propagation learning algorithm was developed independently, at different times, by Werbos; by Parker; and by Rumelhart, Hinton, and Williams. To resolve this problem, the ANN community should establish a repository of available paradigms to facilitate knowledge transfer between researchers. Moreover, to make an ANN work, it must be tailored specifically to the problem it is intended to solve, so users of ANN must select a particular paradigm as the starting prototype. However, there are many possible paradigms, and without proper training users may easily get lost among them. Fortunately, most of the ANN development tools commercially available today provide scores of sample paradigms that work on various classes of problems; a user may take one of these as a starting point and tailor it to his or her own needs.
4) The output quality of an ANN may be unpredictable, regardless of how well it was designed and implemented: This is not the case for problems with linear constraints, where the solution, if found, is guaranteed to be the global optimum. However, many problems have a non-linear region of feasible solutions, and a solution to a non-linear problem reached by the ANN may not be the global optimum. Moreover, there is no way to verify that an ANN is correct unless every possible input is tried, and such exhaustive testing is impractical, if not impossible. In a mission-critical application, one should develop ANN solutions in parallel with conventional ones for direct comparison. Both types of systems should be run for a period of time long enough to make sure that the ANN systems are error-free before they are used in real situations.
5) Most ANN systems are not able to explain how they solve problems: Current ANN implementations are based primarily on random connectivity between processing elements (the individual "neurons"). As a result, the user may be able to verify a network's output but not trace the system's flow of control. S.I. Gallant has demonstrated that an explanation ability can be incorporated into an ANN; further development along these lines is bound to attract more prospective users onto the ANN bandwagon.
VIII. Applications of Neural Networks
1) Prediction: learning from past experience
- Pick the best stocks in the market
- Predict weather
- Identify people with cancer risk
2) Classification
- Image processing
- Predict bankruptcy for credit card companies
- Risk assessment
3) Recognition
- Pattern recognition: SNOOPE (bomb detector in U.S. airports)
- Character recognition
- Handwriting: processing checks
4) Data association
In self-association problems, complete information is recovered from partial information. Hetero-association consists in recovering an element of a group B, given an element of a group A.
e.g. not only identify the characters that were scanned, but also identify when the scanner is not working properly
5) Data Conceptualization
Infer grouping relationships, e.g. extract from a database the names of those most likely to buy a particular product
6) Data Filtering
e.g. take the noise out of a telephone signal, signal smoothing (a brief smoothing sketch follows)
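As a stand-in for the kind of noise removal mentioned, here is a simple moving-average smoothing sketch (my own illustration; it is a plain filter, not a neural network):

```python
import numpy as np

# Noisy sine wave, smoothed with a 9-point moving average.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 200)
noisy = np.sin(t) + rng.normal(scale=0.3, size=t.size)
window = np.ones(9) / 9.0
smooth = np.convolve(noisy, window, mode='same')
# The deviation from the clean signal drops roughly threefold.
print(np.std(noisy - np.sin(t)), np.std(smooth - np.sin(t)))
```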
IX. Future Scope
In the next 10 years, technologies in narrow fields such as speech recognition will continue to improve and will reach human levels. In 10 years, AI and NN will be able to communicate with humans in unstructured English using text or voice, navigate in unprepared environments, and have some rudimentary common sense and domain-specific intelligence.
X. Conclusion
This paper provided a simplified approach to Artificial Intelligence and Neural Networks, along with explanations of their different components. Neural Networks have gained so much ground that they are now termed the sixth generation of computing. As a matter of fact, Neural Networks have been applied in many fields such as science, finance, credit risk, economics and econometrics. The predictive power of NN cannot be denied, and it remains one of the best forecasting tools among practitioners and researchers alike.