Spaces: Traders-Lab / README

maddes8cht committed 15 days ago

Commit baf3b14

1 Parent(s): 5274b2e

Add .gitattributes entry for .jpg files and create history.md for project background

Files changed (2)

.gitattributes +1 -0
history.md +60 -0

.gitattributes CHANGED

@@ -7,6 +7,7 @@
 *.gz filter=lfs diff=lfs merge=lfs -text
 *.h5 filter=lfs diff=lfs merge=lfs -text
 *.joblib filter=lfs diff=lfs merge=lfs -text
+*.jpg filter=lfs diff=lfs merge=lfs -text
 *.lfs.* filter=lfs diff=lfs merge=lfs -text
 *.mlmodel filter=lfs diff=lfs merge=lfs -text
 *.model filter=lfs diff=lfs merge=lfs -text
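
The added `*.jpg` rule means JPEG files, such as the image referenced from history.md, are stored through Git LFS. As a rough illustration of which paths the rules in this hunk would cover, here is a minimal sketch. It assumes plain basename glob matching, which is how slash-free gitattributes patterns behave, but it ignores the rest of the real matching rules (negation, directory-anchored patterns, later lines overriding earlier ones). The `is_lfs_tracked` helper and the `LFS_PATTERNS` list are hypothetical names for illustration and are not part of the repository.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns copied from the hunk above; the full .gitattributes has more entries.
LFS_PATTERNS = [
    "*.gz", "*.h5", "*.joblib", "*.jpg", "*.lfs.*", "*.mlmodel", "*.model",
]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: match the file's basename against each glob.

    This is a simplification of git's attribute matching, not a
    reimplementation of it.
    """
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


if __name__ == "__main__":
    for candidate in ["media/HappyNewYear.jpg", "history.md"]:
        print(candidate, is_lfs_tracked(candidate))
```

For an authoritative answer inside a checkout, `git check-attr filter -- <path>` reports the attribute git itself resolves for a path.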

history.md ADDED

@@ -0,0 +1,60 @@
+---
+title: README
+emoji: 📉
+colorFrom: yellow
+colorTo: red
+sdk: static
+pinned: false
+---
+[README](README.md) • **A brief history** • [TroveLedger](/datasets/Traders-Lab/TroveLedger)
+# How TroveLedger came to be
+### A brief history
+Over the past year, TroveLedger has gradually taken shape — not as a sudden idea, but as the result of a series of practical needs becoming increasingly clear.
+After spending more time engaging seriously with financial markets, the idea emerged around the turn of the previous year to experiment with training custom AI trading models. Very quickly, one fundamental requirement became apparent: reliable, well-structured historical market data.
+Finding such data turned out to be the first real challenge.
+Existing datasets on Hugging Face were explored early on, but it became clear that many of them did not meet the requirements needed for long-term model training — whether due to gaps, inconsistent formats, limited history, or missing intraday resolution. Paid datasets were considered, but ultimately the decision was made to experiment with collecting data independently using yfinance.
+Some of the earliest datasets — particularly for commodities — went through multiple transformations and revisions. A few of them have not yet found their way into the current TroveLedger dataset, though they may still be integrated over time.
+By the end of May, a stable and consistent data format based on Parquet files had been established. From that point onward, the first stocks began accumulating continuously, gap-free, and in a uniform structure. What started as a small list of equities quickly grew to several hundred within a matter of weeks.
+At that stage, publishing the data as a public Hugging Face dataset became a natural step:
+* the data contained nothing private or proprietary,
+* it allowed easy access from different locations,
+* and it made collaboration possible, as the project had grown beyond a strictly solo effort.
+The dataset was initially released under a *preliminary* status. Somewhat unexpectedly to me, it began to attract a steady stream of external downloads, indicating genuine interest. After refinements to the data pipeline, this evolved into *Preliminary v2*, accompanied by a promise that this transitional phase would eventually be replaced by a stable, long-term dataset.
+That transition, however, took longer than anticipated. While the pipeline itself matured significantly during this time, the more substantial challenge turned out to be organizational rather than technical: curating which stocks should be included.
+One thing was always clear — and remains so today:
+TroveLedger does **not** aim to include “all stocks”.
+It is not a comprehensive finance database and not a competitor to yfinance. Instead, the goal is to provide long-accumulated minute-, hourly-, and daily-level time series for *interesting*, liquid, and relevant assets — data that is genuinely suitable for training AI models. Illiquid penny stocks, for example, offer little value in this context.
+This naturally led to the realization that such assets are already professionally curated elsewhere: in indices.
+The missing step, therefore, was extending the data pipeline to work with index component lists — collecting them, maintaining them, and systematically integrating their constituents. Once this was implemented, the size of the dataset grew rapidly. Adding everything at once proved impractical, both technically and logistically, which led to the decision to introduce new assets gradually, index by index.
+The first non-preliminary release of the dataset finally went live on December 17, 2025. By that point, *Preliminary v2* had already reached a stable level of well over 1,500 downloads per month.
+The current approach builds on that foundation: new stocks are added regularly in structured batches, each corresponding to a complete index and accompanied by a simple, readable component list.
+As the project gained more weight and consistency, it also gained a name: **TroveLedger**.
+Along with it came the idea of visually representing the project through a goblin figure — inspired by the meticulous vault-keepers of Gringotts — who serves as the fictional keeper of the ledger itself. Each newly added index is now accompanied by its own image, a small creative detail that reflects the personal nature of the project and helps give it a distinct identity.
+<img src="media/HappyNewYear.jpg"/>
+TroveLedger has since continued to grow through regular updates and the steady addition of further indices. With the turn of the year, a new phase begins. Over the past days, download numbers have increased noticeably and are on track to surpass those of *Preliminary v2*.
+The ledger remains open.
+---
+[README](README.md) • **A brief history** • [TroveLedger](/datasets/Traders-Lab/TroveLedger)