Literature Review of Deep Machine Learning for Feature Extraction

Abstract

Feature extraction is a crucial part of many machine learning tasks. In recent years, deep learning approaches have gained significant importance as a way of constructing hierarchical representations from unlabeled data. Understanding how to recognize complex, high-dimensional data is one of the great challenges of our time. Deep Machine Learning has shown that data can be recognized and classified efficiently and accurately in both supervised and unsupervised learning settings. This literature review presents the latest Deep Machine Learning architectures and a number of problems that they have been used to solve.

Keywords: Deep Machine Learning, Neural Networks, Deep Learning, Convolutional Networks, Feature Extraction

Literature Review of Deep Machine Learning for Feature Extraction

Understanding the essence of data is the key to building good representations. In domains such as natural images, the data come from highly complex distributions that are difficult to capture, as noted by Lee, Grosse, Ranganath, and Ng (2009). Feature learning aims to discover and approximate these underlying distributions and to use them to filter out irrelevant information while preserving the relevant information. Mimicking the efficiency with which the human brain represents information has been a central challenge in artificial intelligence research.

Humans are exposed to an enormous amount of sensory data every moment and are somehow able to capture its critical aspects in a way that allows for future use. More than 50 years ago, Richard Bellman, who developed dynamic programming theory and pioneered the field of optimal control, observed that the high dimensionality of data is a fundamental barrier in many applications. The main difficulty, especially in pattern classification applications, is that learning complexity grows exponentially with a linear increase in the dimensionality of the data (Bellman, 1957), a phenomenon known as the “curse of dimensionality.”
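Bellman's observation can be made concrete with a small experiment. The sketch below is an illustration, not taken from Bellman (1957): it partitions the unit hypercube into a regular grid with a fixed number of bins per axis and measures how many cells a fixed budget of uniformly drawn samples actually reaches. The function names and parameters (grid_cells, occupied_fraction, bins_per_axis, n_samples) are hypothetical choices for this example.

```python
import random


def grid_cells(bins_per_axis: int, dims: int) -> int:
    """Number of cells in a regular grid over the unit hypercube [0, 1)^dims."""
    return bins_per_axis ** dims


def occupied_fraction(n_samples: int, bins_per_axis: int, dims: int, seed: int = 0) -> float:
    """Fraction of grid cells that contain at least one of n_samples
    points drawn uniformly at random from [0, 1)^dims."""
    rng = random.Random(seed)
    occupied = set()
    for _ in range(n_samples):
        # Map each coordinate in [0, 1) to a bin index in 0 .. bins_per_axis - 1.
        cell = tuple(int(rng.random() * bins_per_axis) for _ in range(dims))
        occupied.add(cell)
    return len(occupied) / grid_cells(bins_per_axis, dims)


if __name__ == "__main__":
    # Same sample budget and per-axis resolution, growing dimensionality.
    for d in (1, 2, 3, 6):
        print(f"d={d}: {grid_cells(10, d)} cells, "
              f"{occupied_fraction(10_000, 10, d):.4f} occupied")
```

With 10 bins per axis, the number of cells grows from 10 in one dimension to 1,000,000 in six. A budget of 10,000 samples reaches every cell of the 1-D grid, but can reach at most 1% of the 6-D grid, because there are fewer samples than cells.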

Deep Learning is a relatively new area of Machine Learning research, introduced with the objective of moving Machine Learning closer to one of its original goals: Artificial Intelligence. Various deep learning architectures (DLAs), such as deep neural networks (DNNs), convolutional deep belief networks (CDBNs), and deep belief networks (DBNs), have been applied to many fields, where they have been shown to produce state-of-the-art results on a variety of tasks.

Deep Belief Networks

Deep belief networks are probabilistic generative models composed of multiple layers of stochastic, latent variables. The latent variables typically have binary values and are often called hidden units (Hinton, Osindero, & Teh, 2006). The top two layers have undirected, symmetric connections between them and form an associative memory. The lower layers receive top-down, directed connections from the layer above. The states of the units in the lowest layer represent a data vector. This probabilistic generative model stands in contrast to the discriminative nature of traditional neural networks. DBNs are formed by stacking several Restricted Boltzmann Machines (RBMs), a type of stochastic neural network.

These networks are “restricted” in that they consist of a single visible layer and a single hidden layer, with connections between the two layers but none within a layer. The hidden units are trained to capture higher-order correlations in the data observed at the visible units. Apart from the top two layers, which form an associative memory, the layers of a DBN are connected by directed, top-down generative weights. Hinton and Salakhutdinov (2006) showed that networks pretrained in this way usually perform much better than those trained solely with back-propagation (traditional neural networks). This is explained by the fact that, after pretraining, back-propagation in a DBN only needs to perform a local search in the weight (parameter) space, which speeds up training and shortens convergence time compared with traditional feed-forward neural networks.
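The RBM building block and the greedy, layer-wise pretraining described above can be sketched in NumPy. This is a minimal illustration, not the implementation of the cited papers: it assumes binary (Bernoulli) units and one-step contrastive divergence (CD-1), an approximate learning rule commonly used for RBMs. The names RBM and train_dbn, the learning rate, and the epoch counts are hypothetical, and the back-propagation fine-tuning stage is omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class RBM:
    """Bernoulli-Bernoulli RBM: one visible and one hidden layer, with weights
    only between the layers (no visible-visible or hidden-hidden connections)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)  # visible biases
        self.c = np.zeros(n_hidden)   # hidden biases

    def hidden_probs(self, v):
        # p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i W_ij)
        return sigmoid(v @ self.W + self.c)

    def visible_probs(self, h):
        # p(v_i = 1 | h) = sigmoid(b_i + sum_j W_ij h_j)
        return sigmoid(h @ self.W.T + self.b)

    def cd1_update(self, v0, lr=0.1):
        """One CD-1 step on a batch of visible vectors v0 (binary, or
        probabilities in [0, 1]). Returns the mean squared reconstruction
        error measured before the update."""
        ph0 = self.hidden_probs(v0)                            # positive phase
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)  # sample hidden states
        pv1 = self.visible_probs(h0)                           # reconstruction
        ph1 = self.hidden_probs(pv1)                           # negative phase
        n = v0.shape[0]
        self.W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
        self.b += lr * (v0 - pv1).mean(axis=0)
        self.c += lr * (ph0 - ph1).mean(axis=0)
        return float(np.mean((v0 - pv1) ** 2))


def train_dbn(data, layer_sizes, epochs=50, lr=0.1, seed=0):
    """Greedy layer-wise pretraining: each RBM is trained on the hidden
    probabilities of the RBM below it, with the lower layers kept fixed."""
    rng = np.random.default_rng(seed)
    rbms, x = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(x.shape[1], n_hidden, rng)
        for _ in range(epochs):
            rbm.cd1_update(x, lr)
        rbms.append(rbm)
        x = rbm.hidden_probs(x)  # this layer's features feed the next RBM
    return rbms, x
```

For example, train_dbn(data, [8, 4]) pretrains a two-layer stack and returns the top-layer feature probabilities, which in a full DBN would then initialize a network fine-tuned with back-propagation.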

Convolutional Neural Networks

A convolutional neural network (CNN) is a type of feed-forward neural network in which individual neurons are tiled so that they respond to overlapping regions of the visual field (Sohn, Jung, Lee, & Hero, 2011). CNNs are a class of multi-layer neural networks designed for two-dimensional data, such as images and videos. They are inspired by earlier work on time-delay neural networks (TDNNs), which reduce computational requirements by sharing weights across time. CNNs are a successful deep learning approach in which many layers of a hierarchy can be trained robustly. The convolutional topology exploits spatial relationships to reduce the number of parameters that must be learned, and thus improves upon general feed-forward back-propagation training (Huang & LeCun, 2006).
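The parameter savings from weight sharing can be made concrete with a short NumPy sketch, written for illustration rather than taken from the cited papers. It computes a single-channel “valid” convolution in which one kernel is reused at every position, and compares its weight count with a fully connected layer producing an output of the same size. The 28 × 28 input matches the size of MNIST images; the 5 × 5 kernel and the function names are assumptions.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Single-channel 'valid' 2-D cross-correlation, the operation most deep
    learning libraries implement under the name convolution. The same kernel
    weights are reused at every spatial position (weight sharing)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Each output unit sees only a local kh x kw patch (its receptive field).
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def param_counts(height, width, k):
    """Weights (biases ignored) needed to map a height x width input to a
    (height - k + 1) x (width - k + 1) output: fully connected vs. convolutional."""
    n_out = (height - k + 1) * (width - k + 1)
    fully_connected = height * width * n_out  # one weight per input-output pair
    convolutional = k * k                     # one shared k x k kernel
    return fully_connected, convolutional


if __name__ == "__main__":
    image = np.arange(16, dtype=float).reshape(4, 4)
    print(conv2d_valid(image, np.ones((2, 2))))
    print(param_counts(28, 28, 5))
```

Here param_counts(28, 28, 5) returns (451584, 25): the fully connected mapping needs 451,584 weights, while the convolutional one reuses 25. Because the kernel is shared, shifting the input shifts the feature map by the same amount, which is the spatial structure the convolutional topology exploits.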

Convolutional Deep Belief Networks

Lee, Grosse, Ranganath, and Ng (2009) extended the applicability of DBNs by proposing Convolutional Deep Belief Networks (CDBNs). DBNs do not natively embed information about the 2D structure of the input; inputs are simply vectorized versions of an image matrix (LeCun, Bottou, Bengio, & Haffner, 1998). In contrast, CDBNs exploit the spatial relationships between neighboring inputs by introducing convolutional RBMs, which yield a translation-invariant generative model that scales well to high-dimensional data. A CDBN is a type of deep artificial neural network composed of multiple layers of convolutional restricted Boltzmann machines stacked together (Yuanfang & Yan, 2014). CDBNs use probabilistic max-pooling to reduce the dimensionality of the representations in the higher layers of the network. Such a network is trained in a greedy, layer-wise manner. CDBNs were proposed as a deep learning architecture motivated by the goal of minimal data preprocessing (Hinton & Salakhutdinov, 2006).
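Probabilistic max-pooling, as described by Lee, Grosse, Ranganath, and Ng (2009), treats each non-overlapping block of detection units together with its pooling unit as a single multinomial variable: at most one detection unit in the block can be on, and the pooling unit is on exactly when one of them is. The NumPy sketch below computes the resulting conditional probabilities for a single feature map. It assumes the bottom-up inputs I (the filter response to the visible layer plus a bias) have already been computed; the function name prob_max_pool and the block size C are illustrative.

```python
import numpy as np


def prob_max_pool(I, C):
    """Probabilistic max-pooling for one feature map with non-overlapping
    C x C pooling blocks.

    I: bottom-up input to each detection unit, shape (H, W); H, W divisible by C.
    Each block is a softmax over C*C "exactly this unit is on" states plus one
    "all off" state whose input is 0, so for a unit h in block B:
        P(h = 1 | v)    = exp(I_h) / (1 + sum_{h' in B} exp(I_h'))
        P(pool off | v) = 1 / (1 + sum_{h' in B} exp(I_h'))
    Returns (p_hidden with shape (H, W), p_pool_on with shape (H // C, W // C)).
    """
    H, W = I.shape
    if H % C or W % C:
        raise ValueError("H and W must be divisible by the block size C")
    # (H, W) -> (H//C, W//C, C*C): one row of C*C inputs per pooling block.
    blocks = I.reshape(H // C, C, W // C, C).transpose(0, 2, 1, 3).reshape(H // C, W // C, C * C)
    # Append the "off" state (input 0) and take a numerically stable softmax.
    logits = np.concatenate([blocks, np.zeros(blocks.shape[:2] + (1,))], axis=-1)
    logits -= logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)
    p_pool_on = 1.0 - probs[..., -1]
    # Undo the block reshaping for the detection-unit probabilities.
    p_hidden = (probs[..., :-1].reshape(H // C, W // C, C, C)
                .transpose(0, 2, 1, 3).reshape(H, W))
    return p_hidden, p_pool_on
```

Because the pooling unit is on exactly when one detection unit in its block is on, p_pool_on equals the sum of p_hidden over each block, and the pooled map is C times smaller along each axis. Sampling, which is not shown, would draw one of the C*C + 1 states per block.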

Deep Learning Applications

There have been several studies demonstrating the effectiveness of deep learning methods in a variety of application domains (Huang & LeCun, 2006). Deep Machine Learning methods are often used in image recognition systems, where they have achieved very low error rates on various image databases, such as the MNIST database of handwritten digits (Fukushima, 2003; The MNIST database of handwritten digits, 2014). There are also reports that the learning process of Deep Machine Learning methods is “surprisingly fast” (Hinton, Osindero, & Teh, 2006). However, image recognition is not the only field in which deep machine learning methods (DMLM) have been used. In EEG signal processing, Yuanfang and Yan (2014) report promising results for these methods compared with conventional algorithms. Moreover, in audio processing tasks such as speech detection (Sukittanon, Surendran, Platt, & Burges, 2004), DMLM provide better performance and accuracy.

Consequently, interest in deep machine learning has not been limited to academic research. Recently, the Defense Advanced Research Projects Agency (DARPA) announced a research program exclusively focused on deep learning. Several private organizations have focused their attention on commercializing deep learning technologies with applications to broad domains. Google Brain, a project started at Google in 2011, created a neural network trained with deep learning algorithms that proved capable of recognizing high-level concepts after watching only YouTube videos. In 2013, Facebook recruited computer scientist Yann LeCun to apply his deep learning expertise to help create solutions that identify faces and objects in the 350 million photos and videos uploaded to Facebook each day. Another example of deep learning in action is voice recognition, as in Google Now. Much of this work built on the paper by Dahl, Yu, Deng, and Acero (2012), which represented a major breakthrough in speech recognition with deep learning.

Future of Deep Machine Learning and Feature Extraction

Deep machine learning is an active area of research, but a great deal of work remains to improve the learning process. Current efforts focus on borrowing fruitful ideas from other areas of machine learning, particularly dimensionality reduction. Some of the core questions that require immediate attention are the following (Sohn, Jung, Lee, & Hero, 2011): How well does a particular scheme scale with respect to the dimensionality of the input? What is an efficient framework for capturing both short- and long-term temporal dependencies? How can multimodal sensory information be most naturally fused within a given architectural framework? Which attention mechanisms can be used to augment a given deep learning technology so as to improve robustness and invariance to distorted or missing data? How well do the various solutions map to parallel processing platforms that facilitate processing speed-up?

While deep learning has been successfully applied to challenging pattern inference tasks, the goals of the field extend far beyond task-specific applications. This breadth may make the comparison of different methodologies increasingly complex and will likely require a collaborative effort by the research community. It should be noted that, despite the great promise of deep learning technologies, some domain-specific tasks may not be directly improved by such schemes. Despite the numerous open research issues and the fact that the field is still young, it is clear that advances in deep machine learning systems will shape the future of machine learning and artificial intelligence systems in general.

References

Bellman, R. (1957). Dynamic programming. Princeton, NJ: Princeton University Press.

Dahl, G. E., Yu, D., Deng, L., & Acero, A. (2012). Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, 20(1), 30-42.

Fukushima, K. (2003). Neocognitron for handwritten digit recognition. Neurocomputing, 51, 161-180.

Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504-507.

Hinton, G. E., Osindero, S., & Teh, Y.-W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527-1554.

Huang, F. J., & LeCun, Y. (2006). Large-scale learning with SVM and convolutional nets for generic object categorization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.

Lee, H., Grosse, R., Ranganath, R., & Ng, A. Y. (2009). Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th International Conference on Machine Learning (pp. 609-616).

Sohn, K., Jung, D. Y., Lee, H., & Hero, A. O. (2011). Efficient learning of sparse, distributed, convolutional feature representations for object recognition. In Proceedings of the 13th International Conference on Computer Vision (ICCV) (pp. 2643-2650).

Sukittanon, S., Surendran, A. C., Platt, J. C., & Burges, C. J. C. (2004). Convolutional networks for speech detection. In Proceedings of Interspeech (pp. 1077-1080).

The MNIST database of handwritten digits. (2014, November 20). Retrieved from http://yann.lecun.com/exdb/mnist/

Yuanfang, R., & Yan, W. (2014). Convolutional deep belief networks for feature extraction of EEG signal. In Proceedings of the International Joint Conference on Neural Networks (IJCNN). IEEE.