---
library_name: transformers
license: apache-2.0
base_model: albert/albert-xlarge-v2
tags:
- generated_from_trainer
model-index:
- name: ec8e82b2b44eaa30abdf045a6f91b52d
  results: []
---


# ec8e82b2b44eaa30abdf045a6f91b52d

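This model is a fine-tuned version of [albert/albert-xlarge-v2](https://huggingface.co/albert/albert-xlarge-v2) on the nyu-mll/glue dataset. The card does not record which GLUE task was used; the reported metrics (MSE, MAE, R2) are regression metrics.
At the last logged evaluation (epoch 18, step 3222), it achieves the following results on the evaluation set:
- Loss: 2.4541
- Data Size: 1.0
- Epoch Runtime: 13.4551
- MSE: 2.4549
- MAE: 1.3095
- R2: -0.0982

Data Size and Epoch Runtime are logged training statistics rather than quality metrics. The negative R2 means that, on the evaluation set, the predictions fit the targets worse than a constant prediction of the target mean would.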
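For reference, the three regression metrics reported above can be computed as in the following minimal sketch. It uses plain Python and toy values (`y_true` and `y_pred` are illustrative and not taken from this model); the evaluation code actually used to produce this card may differ. The last lines apply the identity R2 = 1 - MSE / Var(targets) to the numbers reported in this card.

```python
def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, and R2 for two equal-length sequences of floats."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    sq_err = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    abs_err = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    # Total sum of squares around the mean of the targets.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {"mse": sq_err / n, "mae": abs_err / n, "r2": 1.0 - sq_err / ss_tot}


# Toy data (illustrative only): a constant prediction equal to the target mean.
y_true = [0.0, 1.0, 2.0, 3.0]
y_pred = [1.5, 1.5, 1.5, 1.5]
metrics = regression_metrics(y_true, y_pred)
print(metrics)  # mse=1.25, mae=1.0, r2=0.0

# Consistency check on the reported numbers: since R2 = 1 - MSE / Var(targets),
# the implied variance of the evaluation targets is MSE / (1 - R2).
implied_variance = 2.4549 / (1.0 - (-0.0982))
print(round(implied_variance, 2))  # 2.24
```

The toy case shows the baseline: predicting the target mean gives an R2 of exactly 0. The implied target variance of about 2.24 is also recovered from the epoch 0 row of the training results (7.0610 / 3.1586), so the MSE and R2 columns are mutually consistent.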

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: constant
- num_epochs: 50
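The list above maps roughly onto keyword arguments of `transformers.TrainingArguments`. The sketch below is an approximation under two assumptions: that `train_batch_size` and `eval_batch_size` are per-device values (consistent with 8 × 4 = 32), and that no gradient accumulation was used, since the card lists none. The `output_dir` in the final comment is a hypothetical path, and the original training script is not part of this card.

```python
# Values copied from the hyperparameter list in this card.
num_devices = 4
training_kwargs = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "constant",
    "num_train_epochs": 50,
}

# Effective batch size = per-device batch size x number of devices
# (assumes gradient_accumulation_steps == 1, which the card does not state).
total_train_batch_size = training_kwargs["per_device_train_batch_size"] * num_devices
total_eval_batch_size = training_kwargs["per_device_eval_batch_size"] * num_devices
print(total_train_batch_size, total_eval_batch_size)  # 32 32

# To build the actual config object (requires `transformers` to be installed;
# the output_dir below is a hypothetical path):
# from transformers import TrainingArguments
# args = TrainingArguments(output_dir="albert-glue-regression", **training_kwargs)
```

Launching across 4 GPUs is handled outside these arguments, for example with `torchrun` or `accelerate launch`.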

### Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Mse    | Mae    | R2      |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:------:|:------:|:-------:|
| No log        | 0.0   | 0    | 7.0598          | 0         | 1.5330        | 7.0610 | 2.2267 | -2.1586 |
| No log        | 1.0   | 179  | 3.4001          | 0.0078    | 1.9432        | 3.4011 | 1.5444 | -0.5215 |
| No log        | 2.0   | 358  | 2.5721          | 0.0156    | 1.9087        | 2.5728 | 1.3266 | -0.1509 |
| No log        | 3.0   | 537  | 2.5621          | 0.0312    | 2.0639        | 2.5628 | 1.3228 | -0.1465 |
| No log        | 4.0   | 716  | 3.3451          | 0.0625    | 2.4991        | 3.3457 | 1.4775 | -0.4967 |
| No log        | 5.0   | 895  | 2.7295          | 0.125     | 3.2172        | 2.7302 | 1.3552 | -0.2213 |
| 0.1514        | 6.0   | 1074 | 2.3139          | 0.25      | 4.6207        | 2.3147 | 1.2901 | -0.0354 |
| 2.1915        | 7.0   | 1253 | 2.3530          | 0.5       | 7.6602        | 2.3538 | 1.2955 | -0.0529 |
| 2.1701        | 8.0   | 1432 | 2.5341          | 1.0       | 13.6003       | 2.5348 | 1.3225 | -0.1339 |
| 2.2039        | 9.0   | 1611 | 2.3139          | 1.0       | 13.3889       | 2.3147 | 1.2901 | -0.0354 |
| 2.238         | 10.0  | 1790 | 2.2843          | 1.0       | 13.5145       | 2.2851 | 1.2865 | -0.0222 |
| 2.1154        | 11.0  | 1969 | 2.2954          | 1.0       | 13.3767       | 2.2962 | 1.2876 | -0.0272 |
| 2.0953        | 12.0  | 2148 | 2.4696          | 1.0       | 13.4115       | 2.4703 | 1.3121 | -0.1051 |
| 2.1473        | 13.0  | 2327 | 2.2897          | 1.0       | 13.2635       | 2.2905 | 1.2867 | -0.0246 |
| 2.2298        | 14.0  | 2506 | 2.2760          | 1.0       | 13.3458       | 2.2768 | 1.2866 | -0.0185 |
| 2.1942        | 15.0  | 2685 | 2.6864          | 1.0       | 13.3188       | 2.6871 | 1.3466 | -0.2020 |
| 2.2101        | 16.0  | 2864 | 2.3054          | 1.0       | 13.3623       | 2.3061 | 1.2888 | -0.0316 |
| 2.1232        | 17.0  | 3043 | 2.3875          | 1.0       | 13.3129       | 2.3882 | 1.2996 | -0.0683 |
| 2.1782        | 18.0  | 3222 | 2.4541          | 1.0       | 13.4551       | 2.4549 | 1.3095 | -0.0982 |
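
The validation loss in the table does not decrease monotonically, so the last row is not the best one. As a small illustration, the sketch below copies the (epoch, validation loss) pairs from the table and picks the epoch with the lowest loss. The variable names are illustrative, and the card does not state which checkpoint the published weights correspond to.

```python
# (epoch, validation_loss) pairs copied from the training results table.
eval_history = [
    (0, 7.0598), (1, 3.4001), (2, 2.5721), (3, 2.5621), (4, 3.3451),
    (5, 2.7295), (6, 2.3139), (7, 2.3530), (8, 2.5341), (9, 2.3139),
    (10, 2.2843), (11, 2.2954), (12, 2.4696), (13, 2.2897), (14, 2.2760),
    (15, 2.6864), (16, 2.3054), (17, 2.3875), (18, 2.4541),
]

# Epoch with the lowest validation loss versus the last logged epoch.
best_epoch, best_loss = min(eval_history, key=lambda row: row[1])
final_epoch, final_loss = eval_history[-1]

print(best_epoch, best_loss)              # 14 2.276
print(round(final_loss - best_loss, 4))   # 0.1781
```

The lowest validation loss (2.2760) occurs at epoch 14, while the headline metrics in this card come from epoch 18. The log also ends at epoch 18 although `num_epochs` was set to 50; the card does not say why training stopped there.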


### Framework versions

- Transformers 4.57.0
- PyTorch 2.8.0+cu128
- Datasets 4.0.0
- Tokenizers 0.22.1