Case study blog of DeepFabric using spin #163
As discussed, a blog providing an overview of how we use spin to allow real tool executions within dataset traces.
Signed-off-by: Luke Hinds <lukehinds@gmail.com>
itowlson left a comment
Thanks for this @lukehinds - a good read and an intriguing use case! I noted a couple of possible typos or things to check, but otherwise looks good! If the things I've flagged are non-concerns then let me know and we can get this out there!
| author = "Luke Hinds" | ||
| --- | ||
|
|
||
| As the world exhausts its supply of original training data, synthetic data has become not just useful but necessary for continued AI training. Yet this shift to synthetics brings it's challenges - particularly for training models to effectively use Tools and conform with structured schema output. When both Tool calls and their responses are generated by an LLM, the resulting models consistently underperform against real systems. They struggle with error recovery, mishandle state dependencies, and often exhibit what we call "time travel" errors: acting on information they haven't actually received yet (e.g., skipping verification steps because they "know" a file exists). |
typo fix
| As the world exhausts its supply of original training data, synthetic data has become not just useful but necessary for continued AI training. Yet this shift to synthetics brings it's challenges - particularly for training models to effectively use Tools and conform with structured schema output. When both Tool calls and their responses are generated by an LLM, the resulting models consistently underperform against real systems. They struggle with error recovery, mishandle state dependencies, and often exhibit what we call "time travel" errors: acting on information they haven't actually received yet (e.g., skipping verification steps because they "know" a file exists). | |
| As the world exhausts its supply of original training data, synthetic data has become not just useful but necessary for continued AI training. Yet this shift to synthetics brings its challenges - particularly for training models to effectively use tools and conform with structured schema output. When both tool calls and their responses are generated by an LLM, the resulting models consistently underperform against real systems. They struggle with error recovery, mishandle state dependencies, and often exhibit what we call "time travel" errors: acting on information they haven't actually received yet (e.g., skipping verification steps because they "know" a file exists). |
(I wasn't sure about the "tool" ones in case the capital was a reference to a specific role/model. But the rest of the post lower-cases it so I bet on typo grin - if that's wrong then please ignore.)
|
|
||
| | Property | Docker | WebAssembly | | ||
| |----------|--------|-------------| | ||
| | Filesystem access | Must explicitly restrict | Denied by default | |
Are the Docker filesystem and network lines in this correct? My vague impression of Docker is that guests do not have access to the host filesystem or network by default. But maybe I'm wrong? And/or misunderstanding what the table is saying?
|
|
||
| The sandbox rejection isn't just a safety feature - it's valuable training data. Models learn that certain paths are off-limits and how to recover when access is denied and find a more appropriate method to achieve their goals. | ||
|
|
||
| This becomes more pressing as we start to witness SOTA frontier models exhibiting dangerous behaviors, such as attempting to delete a users entire home directory when given file system access. WebAssembly's default-deny posture ensures that any such attempts are safely blocked, while also providing informative error feedback for training. |
SOTA = state-of-the-art? Consider spelling it out?
|
|
||
| ### Polyglot Language Support | ||
|
|
||
| And of course, this being webassembly, you can build components in any language that compiles to Wasm, including Javascript, Go, and Python. The Spin framework handles the HTTP interface and capability restrictions uniformly across languages. |
| And of course, this being webassembly, you can build components in any language that compiles to Wasm, including Javascript, Go, and Python. The Spin framework handles the HTTP interface and capability restrictions uniformly across languages. | |
| And of course, this being WebAssembly, you can build components in any language that compiles to Wasm, including Javascript, Go, and Python. The Spin framework handles the HTTP interface and capability restrictions uniformly across languages. |
| @@ -0,0 +1,410 @@ | |||
| title = "DeepFabric and Spin: A Case Study in Building Better Agentic Training Data" | |||
| date = "2025-12-27T14:23:15Z" | |||
As we've taken a while to review this (sorry - holidays), do you want to bump the date forward? (This will show as the article date now.)