dixisouls committed
Commit 64d1f43 · 1 Parent(s): 1565a27

Updated model card

Files changed (1)
  1. README.md +103 -15

README.md CHANGED
@@ -5,37 +5,125 @@ tags:
  - pytorch
  - transformer
  - custom-model
  language:
  - en
  pipeline_tag: text-generation
  ---

- # VelocityLM - 2B Parameter Language Model

- A custom transformer model with 2B parameters trained for text generation.

- ## Model Details

- - **Parameters:** ~2 billion
- - **Architecture:** Custom Transformer with RoPE, RMSNorm, SwiGLU
- - **Context Length:** 2,048 tokens
- - **Tokenizer:** GPT-2 compatible
- - **Training:** Falcon RefinedWeb dataset

- ## Usage

  ```python
  from transformers import AutoTokenizer
  import torch

- # Load tokenizer
  tokenizer = AutoTokenizer.from_pretrained("gpt2")

- # Load model (you'll need custom loading code)
- # See the Space implementation for details
  ```

- ## Files

- - config.json - Model configuration
- - pytorch_model.bin - Model weights
  - pytorch
  - transformer
  - custom-model
+ - rope
+ - rmsnorm
+ - swiglu
+ - from-scratch
  language:
  - en
  pipeline_tag: text-generation
+ library_name: pytorch
  ---

+ # VelocityLM 🚀

+ A custom transformer language model trained from scratch with modern architectural components. VelocityLM combines RMSNorm, SwiGLU activation, and Rotary Position Embeddings (RoPE) to deliver efficient and scalable language modeling.

+ ## 🎯 Quick Links

+ - **🚀 Try the Model**: [Interactive Demo Space](https://huggingface.co/spaces/dixisouls/VelocityLM)
+ - **💻 Source Code**: [GitHub Repository](https://github.com/dixisouls/VelocityLM)

+ ## 🏗️ Model Architecture
+
+ VelocityLM uses a custom transformer architecture tuned for performance and efficiency. The key settings are listed below, followed by an illustrative configuration sketch:
+
+ ### Model Specifications
+ - **Parameters**: ~2B
+ - **Architecture**: Decoder-only transformer with causal attention
+ - **Hidden Size**: 2,048
+ - **Layers**: 24 transformer layers
+ - **Attention Heads**: 32 heads per layer
+ - **Vocabulary**: 50,257 tokens (GPT-2 tokenizer compatible)
+ - **Context Length**: 2,048 tokens
+ - **Intermediate Size**: 8,192 (4x hidden size)
+
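+ For orientation only, the specifications above could be expressed as a small configuration object. This is an illustrative sketch; the class and field names are hypothetical and not the repository's actual API:
+
+ ```python
+ from dataclasses import dataclass
+
+ @dataclass
+ class VelocityLMConfig:             # hypothetical name, for illustration only
+     vocab_size: int = 50_257        # GPT-2 tokenizer vocabulary
+     hidden_size: int = 2_048        # model (embedding) dimension
+     num_layers: int = 24            # transformer blocks
+     num_heads: int = 32             # attention heads per layer
+     intermediate_size: int = 8_192  # feed-forward width (4x hidden size)
+     max_seq_len: int = 2_048        # context length
+ ```
+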
+ ### 🔬 Key Innovations
+
+ #### RMSNorm (Root Mean Square Normalization)
+ - Replaces LayerNorm for improved training stability and efficiency
+ - Better gradient flow compared to traditional normalization
+
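+ A minimal PyTorch sketch of the RMSNorm layer described above (a common formulation, not necessarily the repository's exact code):
+
+ ```python
+ import torch
+ import torch.nn as nn
+
+ class RMSNorm(nn.Module):
+     """Normalizes by the root-mean-square of the features; no mean subtraction, no bias."""
+     def __init__(self, dim: int, eps: float = 1e-6):
+         super().__init__()
+         self.eps = eps
+         self.weight = nn.Parameter(torch.ones(dim))
+
+     def forward(self, x: torch.Tensor) -> torch.Tensor:
+         rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
+         return self.weight * (x * rms)
+ ```
+
+ Skipping LayerNorm's mean subtraction and bias is what makes RMSNorm slightly cheaper while behaving similarly in practice.
+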
+ #### SwiGLU Activation Function
+ - Gated Linear Unit with Swish activation
+ - Superior performance compared to standard ReLU/GELU for language modeling
+ - Enhanced expressivity and gradient flow
+
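+ An illustrative SwiGLU feed-forward block (standard gated formulation; the layer names and bias choices here are assumptions, not taken from the repository):
+
+ ```python
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+
+ class SwiGLUFeedForward(nn.Module):
+     """Feed-forward block using a SiLU (Swish)-gated linear unit."""
+     def __init__(self, hidden_size: int = 2048, intermediate_size: int = 8192):
+         super().__init__()
+         self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
+         self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
+         self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)
+
+     def forward(self, x: torch.Tensor) -> torch.Tensor:
+         # SwiGLU: silu(W_gate x) * (W_up x), projected back to the model dimension
+         return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))
+ ```
+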
+ #### Rotary Position Embeddings (RoPE)
+ - Relative position encoding applied by rotating query/key vectors
+ - Better extrapolation capabilities to longer sequences
+ - More efficient than learned absolute position embeddings
+
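+ A compact sketch of how rotary embeddings are typically applied to query/key tensors (illustrative; the tensor layout of shape (batch, heads, seq_len, head_dim) is an assumption):
+
+ ```python
+ import torch
+
+ def build_rope_cache(seq_len: int, head_dim: int, base: float = 10000.0):
+     """Precompute cos/sin tables for rotary position embeddings."""
+     inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
+     angles = torch.outer(torch.arange(seq_len).float(), inv_freq)  # (seq_len, head_dim/2)
+     return angles.cos(), angles.sin()
+
+ def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
+     """Rotate consecutive channel pairs of x (batch, heads, seq_len, head_dim)."""
+     x1, x2 = x[..., 0::2], x[..., 1::2]
+     # cos/sin broadcast over the batch and head dimensions
+     rotated = torch.stack([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
+     return rotated.flatten(-2)
+ ```
+
+ Because the rotation angle depends only on a token's position, the dot product between a rotated query and key depends only on their relative offset, which is what gives RoPE its relative-position behavior.
+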
+ ## 🎯 Training Details
+
+ - **Dataset**: [Falcon RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) - high-quality web text
+ - **Training Steps**: 5,000+ completed
+ - **Optimization**: AdamW with a cosine annealing schedule
+ - **Hardware**: 4x NVIDIA A100 (80GB) GPUs
+ - **Features**: Mixed precision (FP16), gradient checkpointing, distributed training
+
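+ A minimal single-GPU training-step skeleton matching the recipe above (AdamW, cosine annealing, FP16 with loss scaling); the tiny stand-in model, data, and hyperparameter values are placeholders rather than the actual training configuration:
+
+ ```python
+ import torch
+ import torch.nn as nn
+ from torch.cuda.amp import GradScaler, autocast
+
+ model = nn.Linear(16, 16).cuda()                  # stand-in for the language model (requires CUDA)
+ optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
+ scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5_000)
+ scaler = GradScaler()                             # automatic loss scaling for FP16
+
+ for step in range(10):                            # stands in for iterating a data loader
+     x = torch.randn(8, 16, device="cuda")
+     optimizer.zero_grad(set_to_none=True)
+     with autocast(dtype=torch.float16):           # mixed-precision forward pass
+         loss = model(x).pow(2).mean()             # placeholder loss
+     scaler.scale(loss).backward()
+     scaler.step(optimizer)
+     scaler.update()
+     scheduler.step()                              # cosine-annealed learning rate
+ ```
+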
+ ## 🚀 Usage
+
+ ### Basic Text Generation

  ```python
+ # Note: This model requires custom loading code
+ # See the GitHub repository for complete implementation
+
  from transformers import AutoTokenizer
  import torch

+ # Load tokenizer (GPT-2 compatible)
  tokenizer = AutoTokenizer.from_pretrained("gpt2")

+ # For complete usage examples and model loading:
+ # Visit: https://github.com/dixisouls/VelocityLM
  ```

+ ### Interactive Demo
+ Try the model immediately in our [Hugging Face Space](https://huggingface.co/spaces/dixisouls/VelocityLM) - no setup required!
+
+ ## 📊 Performance Features
+
+ ### Generation Strategies
+ - Greedy decoding for deterministic output
+ - Top-k and top-p (nucleus) sampling
+ - Temperature control for creativity adjustment
+ - Repetition penalty to reduce repetitive text
+
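+ The strategies above are usually combined in a single logits-processing step before sampling; the function below is a generic illustration (not the repository's sampler):
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+
+ def sample_next_token(logits, generated, temperature=0.8, top_k=50, top_p=0.95,
+                       repetition_penalty=1.2):
+     """Pick the next token id from raw logits of shape (vocab_size,)."""
+     logits = logits.clone()
+     # Repetition penalty: down-weight tokens that were already generated
+     for token_id in set(generated):
+         if logits[token_id] > 0:
+             logits[token_id] /= repetition_penalty
+         else:
+             logits[token_id] *= repetition_penalty
+     # Temperature: lower values sharpen the distribution (more deterministic)
+     logits = logits / temperature
+     # Top-k: keep only the k highest-scoring tokens
+     kth_best = torch.topk(logits, top_k).values[-1]
+     logits[logits < kth_best] = float("-inf")
+     # Top-p (nucleus): keep the smallest set of tokens whose probability mass exceeds top_p
+     sorted_logits, sorted_idx = torch.sort(logits, descending=True)
+     cumulative = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)
+     remove = cumulative > top_p
+     remove[1:] = remove[:-1].clone()   # shift right so the first token over the threshold is kept
+     remove[0] = False
+     logits[sorted_idx[remove]] = float("-inf")
+     return torch.multinomial(F.softmax(logits, dim=-1), num_samples=1).item()
+ ```
+
+ Greedy decoding is the special case of taking `logits.argmax()` instead of sampling.
+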
+ ### Memory Optimizations
+ - Gradient checkpointing (40% memory reduction)
+ - Efficient causal attention implementation
+ - Streaming data processing
+
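+ Gradient checkpointing trades compute for memory by discarding intermediate activations and recomputing them during the backward pass. A generic PyTorch example with stand-in blocks (the real model wraps its own transformer layers):
+
+ ```python
+ import torch
+ import torch.nn as nn
+ from torch.utils.checkpoint import checkpoint
+
+ blocks = nn.ModuleList(nn.Sequential(nn.Linear(64, 64), nn.GELU()) for _ in range(4))
+
+ x = torch.randn(2, 64, requires_grad=True)
+ for block in blocks:
+     # Activations inside `block` are not stored; they are recomputed in backward()
+     x = checkpoint(block, x, use_reentrant=False)
+ x.sum().backward()
+ ```
+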
+ ## 🔧 Technical Implementation
+
+ The implementation includes several performance-oriented techniques (a minimal DDP setup sketch follows the list):
+
+ - **Distributed Training**: Multi-GPU support with PyTorch DDP
+ - **Mixed Precision**: FP16 training with automatic loss scaling
+ - **Advanced Scheduling**: Cosine annealing with warm restarts
+ - **Memory Efficiency**: Gradient checkpointing and parameter grouping
+
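+ As a rough sketch of the distributed setup (generic PyTorch DDP boilerplate, not the repository's training script), typically launched with `torchrun --nproc_per_node=4 train.py`:
+
+ ```python
+ import os
+ import torch
+ import torch.distributed as dist
+ import torch.nn as nn
+ from torch.nn.parallel import DistributedDataParallel as DDP
+
+ dist.init_process_group(backend="nccl")           # one process per GPU
+ local_rank = int(os.environ["LOCAL_RANK"])        # set by torchrun
+ torch.cuda.set_device(local_rank)
+
+ model = nn.Linear(16, 16).cuda()                  # stand-in for the language model
+ model = DDP(model, device_ids=[local_rank])       # gradients are all-reduced across GPUs
+ ```
+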
+ ## 🛠️ Installation & Setup
+
+ For detailed installation instructions, training scripts, and advanced usage:
+
+ **👉 Visit the [GitHub Repository](https://github.com/dixisouls/VelocityLM)**
+
+ The repository includes:
+ - Complete training pipeline
+ - Inference utilities
+ - Configuration management
+ - Multi-GPU training support
+ - Comprehensive documentation
+
+ ## 📈 Roadmap

+ Future enhancements planned:
+ - Flash Attention 2.0 integration
+ - Extended context length support (4K+)
+ - Model quantization for efficient deployment
+ - Fine-tuning capabilities for downstream tasks
+ - ONNX export for production inference