XGBoost in Python

Here’s a basic example of how to use XGBoost in Python for a classification task, using the popular Iris dataset.

Install XGBoost

If you don’t have XGBoost installed, you can install it using pip:

pip install xgboost
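
To verify the installation, you can print the installed version from the command line (any reasonably recent release should run the example below):

python -c "import xgboost; print(xgboost.__version__)"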

Sample Code

import xgboost as xgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Convert the dataset into DMatrix, which is XGBoost's internal data structure
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

# Set parameters for XGBoost
params = {
    'objective': 'multi:softmax',  # Specify the loss function
    'num_class': 3,                # Number of classes in the dataset
    'max_depth': 3,                # Maximum depth of a tree
    'eta': 0.3,                    # Step size shrinkage (the learning rate)
    'eval_metric': 'mlogloss'      # Evaluation metric (multiclass log loss)
}

# Train the model
num_rounds = 50 # Number of boosting rounds
bst = xgb.train(params, dtrain, num_rounds)

# Make predictions on the test set
# (with multi:softmax, predict returns the class labels directly, as floats)
y_pred = bst.predict(dtest)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy * 100:.2f}%')

Explanation

  1. Loading the Dataset: We use the Iris dataset, which is a common dataset for classification tasks.
    It contains three classes of flowers.

  2. Splitting the Data: The dataset is split into training and testing sets using train_test_split.
    For classification you can also pass stratify=y to keep the class proportions the same in both splits.

  3. DMatrix: XGBoost uses its own data structure called DMatrix for training.
    It is more efficient and optimized for XGBoost operations, and it accepts pandas DataFrames
    as well as NumPy arrays (a sketch of this appears after this list).

  4. Setting Parameters:

    • objective: Defines the learning task and the corresponding objective function.
      In this case, multi:softmax is used for multiclass classification; the related multi:softprob
      objective returns class probabilities instead (see the second sketch after this list).
    • num_class: Specifies the number of classes.
    • max_depth: The maximum depth of the trees.
    • eta: The learning rate.
    • eval_metric: The metric reported during training; mlogloss is the multiclass log loss.
  5. Training: The model is trained using the train function with the specified parameters and number of boosting rounds.

  6. Prediction: The trained model makes predictions on the test set, and the accuracy is calculated using accuracy_score.
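
As mentioned in point 3, DMatrix is not limited to NumPy arrays. Here is a minimal sketch, assuming pandas is installed: load_iris(as_frame=True) returns the features as a DataFrame, and XGBoost picks up the column names as feature names automatically.

import xgboost as xgb
from sklearn.datasets import load_iris

# Load Iris as a pandas DataFrame; column names become feature names
iris = load_iris(as_frame=True)
dmat = xgb.DMatrix(iris.data, label=iris.target)
print(dmat.feature_names)              # ['sepal length (cm)', ...]
print(dmat.num_row(), dmat.num_col())  # 150 4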
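
And as mentioned in point 4, switching the objective to multi:softprob yields per-class probabilities rather than hard labels. A short sketch reusing params, dtrain, dtest, and num_rounds from the sample code above (note: recent XGBoost versions return a 2D array from predict here; some very old versions returned a flat array that had to be reshaped to (n_samples, num_class)):

import numpy as np

# Same setup, but ask for class probabilities instead of hard labels
params_prob = dict(params, objective='multi:softprob')
bst_prob = xgb.train(params_prob, dtrain, num_rounds)
proba = bst_prob.predict(dtest)    # one row of class probabilities per sample
y_pred = np.argmax(proba, axis=1)  # pick the most likely class per row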

This is a basic example, but XGBoost offers a wide range of parameters and options that can be fine-tuned for different types of data and tasks.
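
One commonly used option, for example, is early stopping: pass an evals watchlist to xgb.train, and training halts once the evaluation metric stops improving. A sketch under the same setup as above (in practice, the set used for early stopping should be separate from the final test set):

# Monitor mlogloss on a held-out set; stop if it fails to improve for 10 rounds
evals = [(dtrain, 'train'), (dtest, 'eval')]
bst = xgb.train(params, dtrain, num_boost_round=200,
                evals=evals, early_stopping_rounds=10)
print(bst.best_iteration)  # the round that scored best on the 'eval' set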

Output

Accuracy: 100.00%

Iris is a small, easily separable dataset, so perfect accuracy on the 30-sample test split is not unusual; expect lower numbers on harder, real-world data.