[This is a short research write-up on AI & Neural Networks that I put together]
Abstract: Artificial Intelligence (AI) is a key technology in many of today's novel applications, ranging from banking systems that detect attempted credit card fraud, to telephone systems that understand speech, to software systems that notice when you're having problems and offer appropriate advice. The aim of this paper is to give a broad view of AI, in which Neural Networks (NN) form a subset. Now standing as a major paradigm for data mining applications, Neural Networks have been widely used in many fields due to their ability to capture complex patterns present in the data. The development of Neural Networks has been so rapid that they are now referred to as the sixth generation of computing. This paper explains the distinct mechanisms embodied in Neural Networks, along with their strengths, weaknesses and applications.
Keywords: Artificial Intelligence, Neural Networks, Learning methods, Neuron, Neuroscience.
I. Introduction
Artificial Intelligence (AI):
The branch of computer science concerned with making computers behave like humans. The term was coined in 1956 by John McCarthy at the Dartmouth Conference.
Artificial intelligence includes:
- Game playing: programming computers to play games such as chess and checkers.
- Expert systems: programming computers to make decisions in real-life situations (for example, some expert systems help doctors diagnose diseases based on symptoms).
- Natural language: programming computers to understand natural human languages.
- Neural networks: systems that simulate intelligence by attempting to reproduce the types of physical connections that occur in animal brains.
- Robotics: programming computers to see and hear and react to other sensory stimuli.
Considered a subset of Artificial Intelligence, a Neural Network (NN) is basically a computer program designed to learn in a manner similar to the human brain. Rather than using a digital model, in which all computations manipulate zeros and ones, a neural network works by creating connections between processing elements, the computer equivalent of neurons. The organization and weights of the connections determine the output.
II. Inspiration from Neuroscience
Fig. 1 Schematic diagram of a typical neuron
Artificial Neural Network (ANN), or simply Neural Network (NN), was originally aimed at modeling networks of real neurons in the brain.
Neuron: The brain is composed of about 10¹¹ neurons (nerve cells) of many different types. Fig. 1 depicts a schematic drawing of a neuron and its connections with other neurons. Tree-like networks of nerve fibers called dendrites are connected to the cell body, or soma, where the nucleus is located. Extending from the cell body is a single long fiber called the axon, which eventually branches into strands and sub-strands. At the ends of these are the transmitting ends of the synaptic junctions, or synapses, to other neurons. The receiving ends of these junctions on other cells can be found both on the dendrites and on the cell bodies themselves. The axon of a typical neuron makes a few thousand synapses with other neurons. The transmission of a signal from one cell to another at a synapse is a complex chemical process in which specific transmitter substances are released from the sending side of the junction. The effect is to raise or lower the electrical potential inside the body of the receiving cell. If this potential reaches a threshold, a pulse or action potential of fixed strength and duration is sent down the axon; we then say that the cell has fired.
III. From Biological to Artificial Neuron Model
Fig. 2 Artificial neuron
The neuron model shown in Fig. 2 is the one widely used in artificial neural networks, with some minor modifications. It has N inputs, denoted I₁, I₂, …, Iₙ. Each line connecting these inputs to the neuron is assigned a weight, denoted W₁, W₂, …, Wₙ respectively. The neuron calculates a weighted sum of its inputs and compares it to a threshold T. If the sum is higher than the threshold, the output is set to 1, otherwise to -1. The weight Wⱼ represents the strength of the synapse connecting the previous neuron to this neuron. It can be positive or negative, corresponding to an excitatory or inhibitory synapse respectively, and it is zero if there is no synapse between the two neurons.
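To make this concrete, below is a minimal sketch of the threshold neuron just described (my own illustrative Python, not code from any published source): it forms the weighted sum of the inputs and outputs 1 or -1 depending on the threshold T.

```python
import numpy as np

# Threshold neuron sketch: output is 1 if the weighted sum of the
# inputs exceeds the threshold T, otherwise -1 (as described above).
def neuron_output(inputs, weights, threshold):
    weighted_sum = np.dot(inputs, weights)
    return 1 if weighted_sum > threshold else -1

I = np.array([0.5, -1.0, 2.0])   # inputs I1..I3 (example values)
W = np.array([0.8, 0.2, -0.5])   # weights; a negative weight = inhibitory synapse
T = 0.0                          # threshold
print(neuron_output(I, W, T))    # 0.4 - 0.2 - 1.0 = -0.8 < 0, so -1
```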
IV. Network Architecture
The simplest form of NN consists of only two layers, the input and the output layer (no hidden layer is present). This is sometimes referred to as a skip-layer network, and it basically amounts to conventional linear regression modeling in a NN design: the input layer is connected directly to the output layer, bypassing any hidden layer. Like any other network, this simplest form of NN relies on weights as the connections between the inputs and the output, each weight representing the relative significance of a specific input in the computation of the output. However, because the hidden layer is what confers strong learning ability on a NN, architectures of three or more layers are used in practical applications. Fig. 3 shows the three layers in the network.
Fig. 3 Three layers in the network
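As an aside, the equivalence noted above between the no-hidden-layer network and linear regression can be shown in a few lines (my own sketch, with made-up data): training such a network amounts to fitting a weighted sum of the inputs plus a bias.

```python
import numpy as np

# With no hidden layer, the network output is a weighted sum of the
# inputs plus a bias -- exactly a linear regression model.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))             # 100 samples, 3 inputs
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.3                      # linear "teacher" signal

Xb = np.hstack([X, np.ones((100, 1))])    # append a bias column
w = np.linalg.lstsq(Xb, y, rcond=None)[0] # least-squares fit
print(np.round(w, 2))                     # recovers [1.5, -2.0, 0.5, 0.3]
```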
An infinite number of ways prevail as to the construction of a NN; neurodynamics (which basically spells out the properties of an individual neuron, such as its transfer function and how its inputs are combined) and architecture (which defines the structure of the NN, including the number of neurons in each layer and the number and types of interconnections) are the two terms used to describe the way in which a NN is organized. Any network designer must factor in the following elements when building up a network:
1. Best starting values (weight initialization)
2. Number of hidden layers
3. Number of neurons in each hidden layer
4. Number of input variables or combination of input variables (usually emanating from regression analysis)
5. Learning rate
6. Momentum rate
7. Training time or amount of training (i.e., the number of iterations to employ)
8. Type of activation function to use in the hidden and output layers
9. Data partitioning and evaluation metrics
The design of a network is considered an art rather than a science, which is why it is a time-consuming process. Nevertheless, the main criterion used in the design of a NN is to end up with the specification that minimizes the error, i.e., the optimal network topology. The sketch below shows how these design choices map onto one concrete tool.
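This is a hypothetical configuration using scikit-learn's MLPClassifier; the particular values are arbitrary starting points chosen only to illustrate the list above, not recommendations.

```python
from sklearn.neural_network import MLPClassifier

# Each argument corresponds to a design element from the list above.
net = MLPClassifier(
    hidden_layer_sizes=(10, 5),  # two hidden layers of 10 and 5 neurons (items 2-3)
    activation='tanh',           # activation function for hidden layers (item 8)
    solver='sgd',                # plain stochastic gradient descent
    learning_rate_init=0.01,     # learning rate (item 5)
    momentum=0.9,                # momentum rate (item 6)
    max_iter=500,                # amount of training (item 7)
    random_state=42,             # reproducible starting weights (item 1)
)
# net.fit(X_train, y_train) would then search for the weights that
# minimize the error on the training data.
```

Finding a good combination of these settings usually takes several experiments, which is part of why the process is described as an art.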
V. Learning
Neural networks are configured for a specific application, such as
pattern recognition or data classification, through a learning process. In a
biological system, learning involves adjustments to the synaptic connections
between neurons.
Learning methods:
- Unsupervised
- Reinforcement learning
- Back-propagation
1) Unsupervised Learning: It does not require help from the outside: no training data and no information about the desired output are available. The principle is learning by doing. It is used to pick out structure in the input (a small clustering sketch follows this list):
- Clustering
- Reduction of dimensionality: compression
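As an illustration of learning without labels, here is a small clustering sketch (my own example; it uses k-means rather than a neural method, purely to show structure being picked out of unlabeled input):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two blobs of unlabeled points; no desired outputs are given.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0, 0.5, (50, 2)),   # blob around (0, 0)
                  rng.normal(3, 0.5, (50, 2))])  # blob around (3, 3)

# The algorithm discovers the grouping on its own.
labels = KMeans(n_clusters=2, n_init=10).fit_predict(data)
print(labels[:5], labels[-5:])  # the two blobs receive different labels
```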
2) Reinforcement learning: Reinforcement learning is a form of supervised learning because the network gets some feedback from its environment. It is sometimes called learning with a critic, as opposed to learning with a teacher: if the reinforcement signal says that a particular output is wrong, it gives no hint as to what the right answer should be. It is therefore important for a reinforcement learning network to have some source of randomness, so that the space of possible outputs can be explored until a correct value is found. The network uses the performance score to shuffle its weights randomly, which makes learning relatively slow.
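The following sketch (my own toy example) captures that idea of learning with a critic: weights are perturbed at random, and a perturbation is kept only when the performance score improves.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])          # target rule: y = 2x + 1

def score(w, b):
    # The "critic": a single performance number, no hint of the answer.
    return -np.mean((w * X + b - y) ** 2)

w, b = 0.0, 0.0
best = score(w, b)
for step in range(5000):
    dw, db = rng.normal(scale=0.1, size=2)  # random weight shuffle
    s = score(w + dw, b + db)
    if s > best:                            # keep only improvements
        w, b, best = w + dw, b + db, s
print(round(w, 2), round(b, 2))             # approaches 2.0 and 1.0
```

Note how wasteful this exploration is compared with being told the right answer directly, which is exactly the slowness mentioned above.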
3) Back-propagation: The error is the difference between the actual and the desired output. Each weight is changed in proportion to the size of the error: the output-layer error is calculated first and then propagated back to the previous layer. The advantage of this method is that performance improves steadily as training proceeds.
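Below is a minimal back-propagation sketch (my own illustrative code): a tiny two-layer sigmoid network learning XOR, where the output error is computed and propagated back to adjust the weights, exactly as outlined above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)     # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)     # hidden -> output
lr = 0.5

for epoch in range(5000):
    h = sigmoid(X @ W1 + b1)                # forward pass
    out = sigmoid(h @ W2 + b2)
    err = out - y                           # actual minus desired output
    d_out = err * out * (1 - out)           # output-layer error signal
    d_h = (d_out @ W2.T) * h * (1 - h)      # propagated back one layer
    W2 -= lr * (h.T @ d_out)                # weight changes scale with error
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h)
    b1 -= lr * d_h.sum(axis=0)

print(np.round(out.ravel(), 2))             # typically approaches [0, 1, 1, 0]
```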
VI. Benefits of Neural Networks
1) Its structure is massively parallel and distributed. Information processing takes place through the interaction of a great number of computational neurons, each of which sends excitatory or inhibitory signals to other nodes in the network. Unlike classic AI methods, where information processing can be considered sequential (step by step, even when there is no predetermined order), in Neural Networks this processing is essentially parallel, which is the origin of their flexibility. Because the calculations are divided among many nodes, if any of them strays from the expected behavior, it does not affect the behavior of the network.
2) Its ability to learn and generalize. NNs have the capability to acquire knowledge from their surroundings by adapting their internal parameters in response to external stimuli. The network learns from the examples presented to it and generalizes knowledge from them.
3) Non-linearity: The response of a computational neuron can be linear or non-linear. A neural network formed by the interconnection of non-linear neurons is itself non-linear, a trait that is distributed across the entire network. Non-linearity is important above all where the task at hand behaves in a way far removed from linearity, which is the case in most real situations.
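A quick check of what non-linearity means here (my own one-liner): a tanh neuron violates superposition, so its response to a sum of inputs is not the sum of its responses.

```python
import numpy as np

# For a non-linear activation f, f(a + b) != f(a) + f(b) in general.
f = np.tanh
a, b = 0.5, 1.0
print(f(a + b), f(a) + f(b))   # about 0.905 vs 1.224 -- not equal
```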
4) Adaptive Learning: The NN is capable of determining the relationship between the different examples presented to it, or of identifying the class to which they belong, without requiring a prior model.
5) Self-Organization: This property allows the NN to distribute knowledge across the entire network structure; no single element holds specific stored information.
6) Fault tolerance: This characteristic shows itself in two senses. The first relates to the samples shown to the network: it answers correctly even when the examples exhibit variability or noise. The second appears when one of the elements of the network fails; this does not prevent the network from functioning, owing to the way in which it stores its information.
VII. Limitations of Artificial Neural Networks
An artificial neural network is undoubtedly a powerful tool for decision making, but there are several weaknesses in its use.
1) ANN is not a general-purpose problem solver: It is good at complex numerical computation for purposes such as solving systems of linear or non-linear equations, organizing data into equivalence classes, and adapting the solution model to environmental changes. However, it is not good at such mundane tasks as calculating payroll, balancing checkbooks, and generating invoices. Neither is it good at logical inference, a job better suited to expert systems. Therefore, users must know when a problem can be solved with an ANN.
2) There is no structured methodology available for choosing, developing, training, and verifying an ANN: The solution quality of an ANN is known to be affected by the number of layers, the number of neurons in each layer, the transfer function of each neuron, and the size of the training set. One would think that the more data in the training set, the better the accuracy of the output, but this is not so. While too small a training set will prevent the network from developing generalized patterns of the inputs, too large a one will break down those generalized patterns and make the network sensitive to input noise. In any case, the selection of these parameters is more of an art than a science. Users of ANNs must conduct experiments (or sensitivity analyses) to identify the best possible configuration of the network. This calls for easy-to-use and easy-to-modify ANN development tools, which are gradually appearing on the market.
3) There is no single standardized paradigm for ANN development: Because of its interdisciplinary nature, there has been duplicated effort in ANN research. For example, the back-propagation learning algorithm was developed independently, at different times, by Werbos; by Parker; and by Rumelhart, Hinton, and Williams. To resolve this problem, the ANN community should establish a repository of available paradigms to facilitate knowledge transfer between researchers. Moreover, to make an ANN work, it must be tailored specifically to the problem it is intended to solve, so users of ANN must select a particular paradigm as the starting prototype. However, there are many possible paradigms, and without proper training users may easily get lost among them. Fortunately, most of the ANN development tools commercially available today provide scores of sample paradigms that work on various classes of problems; a user may take one of these as a starting point and tailor it to his or her own needs.
4) The output quality of an ANN may be unpredictable, regardless of how well it was designed and implemented: This is not the case for problems with linear constraints, where the solution, if found, is guaranteed to be the global optimum. However, many problems have a non-linear region of feasible solutions, and a solution to a non-linear problem reached by the ANN may not be the global optimum. Moreover, there is no way to verify that an ANN is correct unless every possible input is tried, and such exhaustive testing is impractical, if not impossible. In a mission-critical application, one should develop ANN solutions in parallel with conventional ones for direct comparison. Both types of systems should be run for a period of time long enough to make sure that the ANN systems are error-free before they are used in real situations.
5) Most ANN systems are not able to explain how they solve problems: Current ANN implementations are based primarily on random connectivity between processing elements (the individual "neurons"). As a result, the user may be able to verify a network's output but not trace the system's flow of control. S.I. Gallant has demonstrated that an explanation ability can be incorporated into an ANN; further development along these lines is bound to attract more prospective users onto the ANN bandwagon.
VIII. Applications of Neural Networks
1) Prediction: learning from past experience
- Pick the best stocks in the market
- Predict weather
- Identify people with cancer risk
2) Classification
- Image processing
- Predict bankruptcy for credit card companies
- Risk assessment
3) Recognition
- Pattern recognition: SNOOPE (bomb detector in U.S. airports)
- Character recognition
- Handwriting: processing checks
4) Data association
In self-association problems, complete information is recovered from partial information. Hetero-association consists in recovering an element of a group B, given an element of a group A.
e.g. not only identify the characters that were scanned, but also identify when the scanner is not working properly
5) Data Conceptualization
Infer grouping relationships, e.g. extract from a database the names of those most likely to buy a particular product
6) Data Filtering
e.g. take the noise out of a telephone signal, signal smoothing (a brief smoothing sketch follows)
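As a stand-in for the kind of noise removal mentioned, here is a simple moving-average smoothing sketch (my own illustration; it is a plain filter, not a neural network):

```python
import numpy as np

# Noisy sine wave, smoothed with a 9-point moving average.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 200)
noisy = np.sin(t) + rng.normal(scale=0.3, size=t.size)
window = np.ones(9) / 9.0
smooth = np.convolve(noisy, window, mode='same')
# The deviation from the clean signal drops roughly threefold.
print(np.std(noisy - np.sin(t)), np.std(smooth - np.sin(t)))
```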
IX. Future Scope
In the next 10 years, technologies in narrow fields such as speech recognition will continue to improve and will reach human levels. In 10 years, AI and NN will be able to communicate with humans in unstructured English using text or voice, navigate in unprepared environments, and have some rudimentary common sense and domain-specific intelligence.
X. Conclusion
This paper provided a simplified approach to Artificial Intelligence and Neural Networks, along with explanations of their different components. Neural Networks have gained so much ground that they are now termed the sixth generation of computing. As a matter of fact, Neural Networks have been applied in many fields such as science, finance, credit risk, economics and econometrics. The predictive power of NN cannot be denied, and it remains one of the best forecasting tools among practitioners and researchers alike.