Maxwell Yao

MaxwellJryao

AI & ML interests

None yet

Recent Activity

upvoted a paper about 12 hours ago

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

upvoted a paper 3 months ago

GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving

upvoted a paper 7 months ago

Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models

Papers 2

arxiv:2504.11343

arxiv:2502.13131

Models 35

MaxwellJryao/sft_loraMoE_-lora-sft_Qwen2-1.5B_lr-1e-3

Updated Sep 6, 2024

MaxwellJryao/sft_loraMoE_wiki_hop_original_choose_best_object_affirmative_1-lora-sft_Qwen2-1.5B_lr-1e-3

Updated Sep 5, 2024

MaxwellJryao/sft_wiki_hop_original_choose_best_object_affirmative_1-lora-sft_Qwen2-1.5B_lr-1e-3

Updated Aug 12, 2024

MaxwellJryao/sft_race_high_Select_the_best_answer-lora-sft_Qwen2-1.5B_lr-1e-3

Updated Aug 12, 2024

MaxwellJryao/sft_openbookqa_main_pick_using_id-lora-sft_Qwen2-1.5B_lr-1e-3

Updated Aug 12, 2024

MaxwellJryao/sft_imdb_Reviewer_Opinion_bad_good_choices-lora-sft_Qwen2-1.5B_lr-1e-3

Updated Aug 12, 2024

MaxwellJryao/sft_super_glue_boolq_yes_no_question-lora-sft_Qwen2-1.5B_lr-1e-3

Updated Aug 12, 2024 • 1

MaxwellJryao/sft_piqa_pick_correct_choice_index-lora-sft_Qwen2-1.5B_lr-1e-3

Updated Aug 12, 2024 • 2

MaxwellJryao/sft_sciq_Multiple_Choice-lora-sft_Qwen2-1.5B_lr-1e-3

Updated Aug 12, 2024

MaxwellJryao/sft_ai2_arc_ARC_Challenge_pick_the_most_correct_option-lora-sft_Qwen2-1.5B_lr-1e-3

Updated Aug 12, 2024

Datasets 1

MaxwellJryao/choices_3

Updated Jul 4, 2024 • 99.8k • 29