gusdelact/penguins-species-curated
How to use gusdelact/penguins-decision-tree with Scikit-learn:
from huggingface_hub import hf_hub_download
import joblib
model = joblib.load(
hf_hub_download("gusdelact/penguins-decision-tree", "sklearn_model.joblib")
)
# only load pickle files from sources you trust
# read more about it here https://skops.readthedocs.io/en/stable/persistence.html

Multiclass classification model that identifies penguin species (Adelie, Gentoo, Chinstrap) from physical measurements observable in the field.
Algorithm: DecisionTreeClassifier (scikit-learn)
Target: Species (3 classes: Adelie Penguin (Pygoscelis adeliae), Chinstrap penguin (Pygoscelis antarctica), Gentoo penguin (Pygoscelis papua)).
Hyperparameters:
{
"criterion": "gini",
"max_depth": 5,
"min_samples_leaf": 1,
"min_samples_split": 5
}
Hyperparameter search with GridSearchCV(cv=StratifiedKFold(5, shuffle=True, random_state=42), scoring="f1_weighted") over the following grid:
{
"max_depth": [
3,
4,
5,
6,
8,
10,
null
],
"min_samples_split": [
2,
5,
10
],
"min_samples_leaf": [
1,
2,
5
],
"criterion": [
"gini",
"entropy"
]
}
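The search described above can be sketched as follows. This is a minimal illustration, not the author's training script: the synthetic data from `make_classification` stands in for the preprocessed penguin features, and the estimator's `random_state=42` is an assumption. `class_weight="balanced"` follows the design decisions listed in this card.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class stand-in for the preprocessed penguin features (assumption).
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, random_state=42,
)

# Same grid as in the card; JSON null corresponds to Python None (unlimited depth).
param_grid = {
    "max_depth": [3, 4, 5, 6, 8, 10, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 5],
    "criterion": ["gini", "entropy"],
}

search = GridSearchCV(
    # random_state on the estimator is an assumption, added for reproducibility.
    estimator=DecisionTreeClassifier(class_weight="balanced", random_state=42),
    param_grid=param_grid,
    cv=StratifiedKFold(5, shuffle=True, random_state=42),
    scoring="f1_weighted",
)
search.fit(X, y)

# 7 * 3 * 3 * 2 = 126 candidate settings, each evaluated on 5 folds.
n_candidates = len(search.cv_results_["params"])
print(n_candidates)
print(search.best_params_)
```

On the synthetic data the selected parameters will generally differ from the ones reported in this card; only the search procedure is the same.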
| Metric | Value |
|---|---|
| accuracy | 0.9884 |
| f1_weighted | 0.9883 |
| f1_macro | 0.9856 |
| precision_weighted | 0.9887 |
| recall_weighted | 0.9884 |
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| Adelie Penguin (Pygoscelis adeliae) | 0.974 | 1.000 | 0.987 | 38 |
| Chinstrap penguin (Pygoscelis antarctica) | 1.000 | 0.941 | 0.970 | 17 |
| Gentoo penguin (Pygoscelis papua) | 1.000 | 1.000 | 1.000 | 31 |
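As a consistency check, the aggregate metrics can be recomputed from the per-class table. The confusion matrix below is not published in the card; it is inferred from the reported precision, recall, and support values, which imply a single Chinstrap sample predicted as Adelie. A minimal sketch in plain Python:

```python
labels = ["Adelie", "Chinstrap", "Gentoo"]

# Rows are true classes, columns are predicted classes.
# Inferred from the per-class table (assumption), not taken from the card.
cm = [
    [38, 0, 0],
    [1, 16, 0],
    [0, 0, 31],
]

n = len(labels)
support = [sum(row) for row in cm]                               # true count per class
predicted = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # predicted count per class
total = sum(support)

precision = [cm[k][k] / predicted[k] for k in range(n)]
recall = [cm[k][k] / support[k] for k in range(n)]
f1 = [2 * p * r / (p + r) for p, r in zip(precision, recall)]

accuracy = sum(cm[k][k] for k in range(n)) / total
f1_macro = sum(f1) / n
f1_weighted = sum(f * s for f, s in zip(f1, support)) / total
precision_weighted = sum(p * s for p, s in zip(precision, support)) / total
recall_weighted = sum(r * s for r, s in zip(recall, support)) / total

for name, value in [
    ("accuracy", accuracy),
    ("f1_weighted", f1_weighted),
    ("f1_macro", f1_macro),
    ("precision_weighted", precision_weighted),
    ("recall_weighted", recall_weighted),
]:
    print(f"{name}: {value:.4f}")
```

Rounded to four decimals, the results match the metrics table: 0.9884, 0.9883, 0.9856, 0.9887, and 0.9884. Weighted recall equals accuracy by construction, which is why those two rows coincide.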
import joblib
import pandas as pd
from huggingface_hub import hf_hub_download
REPO = "gusdelact/penguins-decision-tree"
model = joblib.load(hf_hub_download(REPO, "model.joblib"))
preprocessor = joblib.load(hf_hub_download(REPO, "preprocessor.joblib"))
label_encoder = joblib.load(hf_hub_download(REPO, "label_encoder.joblib"))
# Data for a single individual (field measurements)
sample = pd.DataFrame([{
"Culmen Length (mm)": 50.0,
"Culmen Depth (mm)": 16.3,
"Flipper Length (mm)": 220.0,
"Body Mass (g)": 5700.0,
"Island": "Biscoe",
"Sex": "MALE",
"Clutch Completion": "Yes",
}])
X = preprocessor.transform(sample)
pred = model.predict(X)
print(label_encoder.inverse_transform(pred)) # -> ['Gentoo penguin (Pygoscelis papua)']
Design documented in notes/01_design_fe.md, notes/02_design_modeling.md, and notes/03_design_validation.md. Key decisions:
- class_weight="balanced" to compensate for the mild class imbalance (Chinstrap ≈ 20%), operating within the Gini criterion [ESL §9.2 Multiclass loss].
- criterion ∈ {gini, entropy}, because the two are numerically similar but not identical under class imbalance [ISLP §8.1; ESL §9.2.4].
- RandomForestClassifier.

@model{penguins-decision-tree,
author = {gusdelact},
title = {penguins-species-classifier — DecisionTreeClassifier},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/gusdelact/penguins-decision-tree}
}
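To illustrate the class_weight="balanced" and criterion decisions listed under the key decisions, the sketch below applies scikit-learn's balanced-weight heuristic, n_samples / (n_classes * class_count), and compares Gini impurity with entropy before and after weighting. The class counts reuse the test-set supports from the per-class table purely as illustrative proportions; the training-set counts are not given in this card.

```python
import math

# Test-set supports reused as illustrative class counts (assumption).
counts = {"Adelie": 38, "Chinstrap": 17, "Gentoo": 31}
n_samples = sum(counts.values())
n_classes = len(counts)

# Balanced heuristic: rarer classes receive larger weights.
weights = {c: n_samples / (n_classes * k) for c, k in counts.items()}


def gini(p):
    """Gini impurity of a class-probability vector."""
    return 1.0 - sum(q * q for q in p)


def entropy(p):
    """Shannon entropy in bits of a class-probability vector."""
    return -sum(q * math.log2(q) for q in p if q > 0)


# Unweighted class proportions at a node holding all samples.
raw = [k / n_samples for k in counts.values()]

# Weighted proportions: each class contributes weight * count to the node mass.
mass = [weights[c] * counts[c] for c in counts]
balanced = [m / sum(mass) for m in mass]

print({c: round(w, 3) for c, w in weights.items()})
print("gini:   ", round(gini(raw), 4), "->", round(gini(balanced), 4))
print("entropy:", round(entropy(raw), 4), "->", round(entropy(balanced), 4))
```

With balanced weights every class carries the same total mass, so the weighted proportions at the root become uniform and both impurity measures reach their three-class maximum (2/3 for Gini, log2(3) bits for entropy). The two criteria rank most splits the same way but are not identical functions, which is consistent with searching over both.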