Legesher Native Code Submission
Submit native-language Python code and verify its output
Fine-tuning a multilingual model on native-language code (Spanish, Chinese, Urdu).
Submit native-language Python code and verify its output
Note Submit native-language code (Spanish, Chinese, Urdu, other lower-resourced langs) to be added to our human-authored dataset — tests signal that machine translation misses.
Note Training-data tiers (5K / 20K / 103K) from bigcode/the-stack-v2-dedup. Raw + Legesher-transpiled + LLM-translated variants share source files; cond-3 differs by design.
Note Source-of-truth. Per-cell tables (cells, vs_baseline, significance), framework views, full Phase 3 writeup. Cite the refined-extractor figures.
Note QLoRA adapters on Tiny Aya, one per (condition × volume × seed). Cond-1 (en), Cond-2 (Legesher), Cond-3 (real-world), Cond-5 (Aya-translated). 5K = 3 seeds; rest = 1.
Note Repo for community-contributed native-language code.