Introduction
In simple terms, a machine learning model is a mathematical function that maps input data to predicted outputs. More specifically, it learns from training data, adjusting its parameters to minimize the error between its predictions and the true labels.
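As a minimal illustration of this idea, the following sketch fits a one-parameter linear model by gradient descent on synthetic data (the data, learning rate, and iteration count are invented purely for the example):
import numpy as np
# Synthetic data: y is roughly 3 * x plus noise
rng = np.random.default_rng(0)
x = rng.random(100)
y = 3.0 * x + 0.1 * rng.standard_normal(100)
w = 0.0   # single model parameter, initialized arbitrarily
lr = 0.1  # learning rate
for _ in range(200):
    y_pred = w * x                         # map inputs to predicted outputs
    grad = 2 * np.mean((y_pred - y) * x)   # gradient of the mean squared error w.r.t. w
    w -= lr * grad                         # adjust the parameter to reduce the error
print(w)  # should end up close to 3.0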
There are many types of machine learning models, such as logistic regression, decision trees, and support vector machines. Each is suited to particular data types and problem domains, yet different models also share many commonalities and can be linked through transformations or extensions.
For example, starting from the connectionist perceptron, adding hidden layers transforms it into a deep neural network. Introducing a kernel function to a perceptron-like structure leads to SVM. These transformations illustrate intrinsic relationships among models. For convenience, the models are roughly categorized into the following six classes to highlight basic commonalities and analyze each class in depth.
1. Neural Network (Connectionist) Models
Connectionist models simulate brain neural network structures and functions. Their basic unit is the neuron, which receives inputs from other neurons and adjusts weights to change input influence. Neural networks are often treated as black boxes; with multiple nonlinear hidden layers they can approximate a wide range of functions.
Representative models include DNN, SVM, Transformer, and LSTM. In some cases, the final layer of a deep neural network can be viewed as a logistic regression model used for classification. A support vector machine can loosely be regarded as a shallow architecture with essentially two layers, input and output, in which the kernel function plays a role analogous to the nonlinear transformations of deep networks.
Deep neural networks (DNN) consist of multiple layers of neurons. Through forward propagation, input data is passed through each layer and transformed to produce outputs. Each layer receives outputs from the previous layer as input and forwards its output to the next layer. DNN training uses the backpropagation algorithm: compute the error between the output layer and true labels, propagate the error backward through the layers, and update weights and biases via gradient descent. Iterating this process optimizes the network parameters to minimize prediction error.
DNN advantages include strong feature learning capability and high model expressiveness without manual feature design. Disadvantages include large parameter counts that can lead to overfitting, high computational cost, long training times, and limited interpretability. Example Python code using Keras to build a simple DNN:
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from keras.losses import BinaryCrossentropy
import numpy as np
# Build model
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(10,))) # First hidden layer with 64 neurons; input has 10 features
model.add(Dense(64, activation='relu')) # Hidden layer has 64 neurons
model.add(Dense(1, activation='sigmoid')) # Output layer has 1 neuron, using sigmoid activation for binary classification
# Compile model
model.compile(optimizer=Adam(learning_rate=0.001), loss=BinaryCrossentropy(), metrics=['accuracy'])
# Generate synthetic dataset
x_train = np.random.rand(1000, 10) # 1000 samples, each with 10 features
y_train = np.random.randint(2, size=1000) # 1000 labels for binary classification
# Train model
model.fit(x_train, y_train, epochs=10, batch_size=32) # Train for 10 epochs with batch size 32
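The Keras call above hides the training loop itself. As a rough NumPy-only sketch of what forward and backward propagation do for a single sigmoid output unit (essentially the logistic-regression view mentioned earlier; the learning rate and iteration count are arbitrary choices for illustration):
import numpy as np
rng = np.random.default_rng(42)
x = rng.random((1000, 10))               # 1000 samples, 10 features
y = rng.integers(0, 2, size=(1000, 1))   # binary labels
W = rng.standard_normal((10, 1)) * 0.01  # weights of a single sigmoid unit
b = np.zeros((1, 1))                     # bias
lr = 0.1
for _ in range(100):
    # Forward propagation
    z = x @ W + b
    p = 1.0 / (1.0 + np.exp(-z))         # sigmoid activation
    # Backward propagation: for binary cross-entropy with sigmoid, dL/dz = p - y
    dz = (p - y) / len(x)
    dW = x.T @ dz
    db = dz.sum(axis=0, keepdims=True)
    # Gradient descent update
    W -= lr * dW
    b -= lr * db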
2. Symbolic Models
Symbolic models are based on logical reasoning and treat cognition as symbol manipulation. They encode information as identifiable symbols and apply explicit rules to manipulate those symbols using rule bases and inference engines. Representative systems include expert systems, knowledge bases, and knowledge graphs. A simple expert system example:
# Define rule base
rules = [ {"name": "rule1", "condition": "sym1 == 'A' and sym2 == 'B'", "action": "result = 'C'"},
{"name": "rule2", "condition": "sym1 == 'B' and sym2 == 'C'", "action": "result = 'D'"},
{"name": "rule3", "condition": "sym1 == 'A' or sym2 == 'B'", "action": "result = 'E'"},]
# Define inference engine
def infer(rules, sym1, sym2):
    for rule in rules:
        # Evaluate the rule's condition string against the current symbols
        if eval(rule["condition"], {}, {"sym1": sym1, "sym2": sym2}):
            # Execute the rule's action to bind the result (eval/exec are acceptable for this toy example)
            scope = {"sym1": sym1, "sym2": sym2}
            exec(rule["action"], {}, scope)
            return scope["result"]
    return None  # Return None when no rule's condition is satisfied
# Test expert system
print(infer(rules, 'A', 'B')) # Output: C
print(infer(rules, 'B', 'C')) # Output: D
print(infer(rules, 'A', 'C')) # Output: E
print(infer(rules, 'B', 'B')) # Output: E
3. Decision Tree Models
Decision trees are nonparametric methods for classification and regression, representing decisions via a tree structure. Mathematically, tree models can be seen as piecewise functions. They use entropy and related measures from information theory to select optimal split attributes and construct trees with good classification performance.
Decision trees recursively split the dataset into subsets until each subset belongs to a single class or meets stopping criteria. Splits are evaluated using measures such as information gain, gain ratio, or Gini index to choose the best attribute. Classic algorithms include ID3, C4.5, and CART. ID3 uses information gain; C4.5 improves on ID3 with the gain ratio and pruning; CART uses the Gini index, handles continuous as well as categorical attributes, and supports regression in addition to classification.
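To make the split-selection step concrete, here is a small sketch that scores a hypothetical binary split with the Gini index (the label counts are made up for the example):
import numpy as np
def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)
# Hypothetical parent node and one candidate split into two children
parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])
left   = np.array([0, 0, 0, 1])
right  = np.array([0, 1, 1, 1])
# Weighted impurity after the split; lower is better
weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(parent)
print(gini(parent), weighted)  # 0.5 before the split, 0.375 after, so the split reduces impurity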
Example using scikit-learn to build a CART decision tree:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
# Load dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Build decision tree model
clf = DecisionTreeClassifier(criterion='gini')
clf.fit(X_train, y_train)
# Predict test set
y_pred = clf.predict(X_test)
# Visualize decision tree
import matplotlib.pyplot as plt
plot_tree(clf)
plt.show()
4. Probabilistic Models
Probabilistic models use probability theory to describe distributions of random phenomena and their relationships. They employ probability distributions for random variables and conditional probability rules to model dependencies. Probabilistic models support quantitative analysis and prediction of stochastic events.
Representative models include naive Bayes classifiers, Bayesian networks, and hidden Markov models. Naive Bayes and logistic regression both rely on probabilistic reasoning; hidden Markov models and Bayesian networks model sequences and dependencies among variables.
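To make the conditional-probability machinery concrete, here is a tiny Bayes' rule computation for a hypothetical spam-filter feature (all probabilities are invented for illustration):
# P(spam | word) = P(word | spam) * P(spam) / P(word)
p_spam = 0.2                 # prior probability of the "spam" class (assumed)
p_word_given_spam = 0.6      # likelihood of seeing the word in spam (assumed)
p_word_given_ham = 0.05      # likelihood of seeing the word in non-spam (assumed)
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(p_spam_given_word)     # posterior probability = 0.75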
Example using scikit-learn to implement a Gaussian naive Bayes classifier:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
# Load dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Build Gaussian Naive Bayes classifier
clf = GaussianNB()
clf.fit(X_train, y_train)
# Predict test set
y_pred = clf.predict(X_test)
5. Nearest Neighbor Models
Nearest neighbor models are nonparametric, instance-based methods for classification and regression. They determine similarity between data points by measuring distances in feature space and require no explicit training phase in the conventional sense.
The K-nearest neighbors (KNN) algorithm classifies a sample based on the majority label among its k nearest training samples in feature space. Variants and related techniques include weighted KNN, radius-based neighbor search, and approximate nearest neighbor (ANN) algorithms. (K-means, despite the similar name, is a clustering algorithm rather than a KNN variant.)
ANN methods trade a small amount of accuracy for much better time and space efficiency, returning approximate rather than exact neighbors when searching large datasets.
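Before the scikit-learn example, a minimal from-scratch sketch of the majority-vote rule described above (assuming Euclidean distance, NumPy arrays, and k = 3):
import numpy as np
from collections import Counter
def knn_predict(X_train, y_train, x_query, k=3):
    # Distances from the query point to every training sample
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k nearest training samples
    nearest = np.argsort(dists)[:k]
    # Majority vote among their labels
    return Counter(y_train[nearest]).most_common(1)[0][0]
# Example: knn_predict(np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]), np.array([0, 0, 1]), np.array([0.5, 0.5])) returns 0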
Example using scikit-learn to implement KNN:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
# Load dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Build KNN classifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
# Predict test set
y_pred = knn.predict(X_test)
6. Ensemble Learning Models
Ensemble learning is a methodology that combines multiple learners to improve overall prediction accuracy and stability by aggregating their outputs. The core idea is that combining multiple base learners can reduce variance and bias, improving generalization. Introducing diversity among base learners—via different algorithms, data, or parameters—often improves ensemble performance.
Common ensemble methods include bagging, boosting, and stacking. Bagging trains base learners on bootstrap samples of the data, which introduces diversity and reduces variance, improving stability and generalization. Boosting trains base learners sequentially, with each new learner focusing on the mistakes of its predecessors, and combines them with weighted voting to reduce bias. Stacking arranges base learners into a layered structure and uses a meta-learner to integrate their predictions.
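As a quick illustration of bagging versus boosting, the sketch below fits scikit-learn's BaggingClassifier and AdaBoostClassifier on the same toy data (the hyperparameters are arbitrary choices for the example):
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score
X, y = load_iris(return_X_y=True)
# Bagging: train shallow trees on bootstrap samples and aggregate their votes
bagging = BaggingClassifier(DecisionTreeClassifier(max_depth=2), n_estimators=50, random_state=42)
# Boosting: train shallow trees sequentially, reweighting misclassified samples
boosting = AdaBoostClassifier(n_estimators=50, random_state=42)
print(cross_val_score(bagging, X, y, cv=5).mean())
print(cross_val_score(boosting, X, y, cv=5).mean())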
Representative ensemble models include random forest, isolation forest, GBDT, AdaBoost, and XGBoost. Example using scikit-learn to implement a random forest classifier:
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# Load dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Build random forest classifier
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
# Predict test set
y_pred = clf.predict(X_test)
Conclusion
By grouping models with similar principles into categories, we can systematically explore their mechanisms and interconnections to gain a more comprehensive understanding of model behavior and relationships.