-
-
Notifications
You must be signed in to change notification settings - Fork 34.5k
Virtual File System for Node.js #61478
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Review requested:
|
|
The
notable-change
Please suggest a text for the release notes if you'd like to include a more detailed summary, then proceed to update the PR description with the text or a link to the notable change suggested text comment. Otherwise, the commit will be placed in the Other Notable Changes section. |
|
Nice! This is a great addition. Since it's such a large PR, this will take me some time to review. Will try to tackle it over the next week. |
| */ | ||
| existsSync(path) { | ||
| // Prepend prefix to path for VFS lookup | ||
| const fullPath = this.#prefix + (StringPrototypeStartsWith(path, '/') ? path : '/' + path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use path.join?
| /** | ||
| * Gets the underlying VirtualFileSystem instance. | ||
| * @returns {VirtualFileSystem} | ||
| */ | ||
| get vfs() { | ||
| return this.#vfs; | ||
| } | ||
|
|
||
| /** | ||
| * Gets the mount prefix for the mock file system. | ||
| * @returns {string} | ||
| */ | ||
| get prefix() { | ||
| return this.#prefix; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these need to be getters? Why can't we expose the actual values.
If a user overwrites them, they can if they wish?
| validateObject(files, 'options.files'); | ||
| } | ||
|
|
||
| const { VirtualFileSystem } = require('internal/vfs/virtual_fs'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we import this at the top level / lazy load it at the top level?
| ArrayPrototypePush(this.#mocks, { | ||
| __proto__: null, | ||
| ctx, | ||
| restore: restoreFS, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| restore: restoreFS, | |
| restore: ctx.restore, |
nit
lib/internal/vfs/entries.js
Outdated
| * @param {object} [options] Optional configuration | ||
| */ | ||
| addFile(name, content, options) { | ||
| const path = this._directory.path + '/' + name; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use path.join?
| let entry = current.getEntry(segment); | ||
| if (!entry) { | ||
| // Auto-create parent directory | ||
| const dirPath = '/' + segments.slice(0, i + 1).join('/'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use path.join
| let entry = current.getEntry(segment); | ||
| if (!entry) { | ||
| // Auto-create parent directory | ||
| const parentPath = '/' + segments.slice(0, i + 1).join('/'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
path.join?
| } | ||
| } | ||
| callback(null, content); | ||
| }).catch((err) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| }).catch((err) => { | |
| }, (err) => { |
lib/internal/vfs/virtual_fs.js
Outdated
| const bytesToRead = Math.min(length, available); | ||
| content.copy(buffer, offset, readPos, readPos + bytesToRead); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Primordials?
| } | ||
|
|
||
| callback(null, bytesToRead, buffer); | ||
| }).catch((err) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| }).catch((err) => { | |
| }, (err) => { |
|
Left an initial review, but like @Ethan-Arrowood said, it'll take time for a more in depth look |
|
It's nice to see some momentum in this area, though from a first glance it seems the design has largely overlooked the feedback from real world use cases collected 4 years ago: https://github.com/nodejs/single-executable/blob/main/docs/virtual-file-system-requirements.md - I think it's worth checking that the API satisfies the constraints that users of this feature have provided, to not waste the work that have been done by prior contributors to gather them, or having to reinvent it later (possibly in a breaking manner) to satisfy these requirements from real world use cases. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #61478 +/- ##
==========================================
- Coverage 89.80% 89.79% -0.02%
==========================================
Files 672 681 +9
Lines 203907 207293 +3386
Branches 39203 39692 +489
==========================================
+ Hits 183121 186139 +3018
- Misses 13113 13475 +362
- Partials 7673 7679 +6
🚀 New features to boost your workflow:
|
|
And why not something like OPFS aka whatwg/fs? const rootHandle = await navigator.storage.getDirectory()
await rootHandle.getFileHandle('config.json', { create: true })
fs.mount('/app', rootHandle) // to make it work with fs
fs.readFileSync('/app/config.json')OR const rootHandle = await navigator.storage.getDirectory()
await rootHandle.getFileHandle('config.json', { create: true })
fs.readFileSync('sandbox:/config.json')fs.createVirtual seems like something like a competing specification |
5e317de to
977cc3d
Compare
I generally prefer not to interleave with WHATWG specs as much as possible for core functionality (e.g., SEA). In my experience, they tend to perform poorly on our codebase and remove a few degrees of flexibility. (I also don't find much fun in working on them, and I'm way less interested in contributing to that.) On an implementation side, the core functionality of this feature will be identical (technically, it's missing writes that OPFS supports), as we would need to impact all our internal fs methods anyway. If this lands, we can certainly iterate on a WHATWG-compatible API for this, but I would not add this to this PR. |
|
Small prior art: https://github.com/juliangruber/subfs |
8d711c1 to
73c18cd
Compare
|
I also worked on this a bit on the side recently: Qard@73b8fc6 That is very much in chaotic ideation stage with a bunch of LLM assistance to try some different ideas, but the broader concept I was aiming for was to have a module.exports = new VirtualFileSystem(new LocalProvider())I intended for it to be extensible for a bunch of different interesting scenarios, so there's also an S3 provider and a zip file provider there, mainly just to validate that the model can be applied to other varieties of storage systems effectively. Keep in mind, like I said, the current state is very much just ideation in a branch I pushed up just now to share, but I think there are concepts for extensibility in there that we could consider to enable a whole ecosystem of flexible storage providers. 🙂 Personally, I would hope for something which could provide both read and write access through an abstraction with swappable backends of some variety, this way we could pass around these virtualized file systems like objects and let an ecosystem grow around accepting any generalized virtual file system for its storage backing. I think it'd be very nice for a lot of use cases like file uploads or archive management to be able to just treat them like any other readable and writable file system. |
just a bit off topic... but this reminds me of why i created this feature request: Would not lie, it would be cool if NodeJS also provided some type of static example that would only work in NodeJS (based on how it works internally) const size = 26
const blobPart = BlobFrom({
size,
stream (start, end) {
// can either be sync or async (that resolves to a ReadableStream)
// return new Response('abcdefghijklmnopqrstuvwxyz'.slice(start, end)).body
// return new Blob(['abcdefghijklmnopqrstuvwxyz'.slice(start, end)]).stream()
return fetch('https://httpbin.dev/range/' + size, {
headers: {
range: `bytes=${start}-${end - 1}`
}
}).then(r => r.body)
}
})
blobPart.text().then(text => {
console.log('a-z', text)
})
blobPart.slice(-3).text().then(text => {
console.log('x-z', text)
})
const a = blobPart.slice(0, 6)
a.text().then(text => {
console.log('a-f', text)
})
const b = a.slice(2, 4)
b.text().then(text => {
console.log('c-d', text)
})An actual working PoC (I would not rely on this unless it became officially supported by nodejs core - this is a hack) const blob = new Blob()
const symbols = Object.getOwnPropertySymbols(blob)
const blobSymbol = symbols.map(s => [s.description, s])
const symbolMap = Object.fromEntries(blobSymbol)
const {
kHandle,
kLength,
} = symbolMap
function BlobFrom ({ size, stream }) {
const blob = new Blob()
if (size === 0) return blob
blob[kLength] = size
blob[kHandle] = {
span: [0, size],
getReader () {
const [start, end] = this.span
if (start === end) {
return { pull: cb => cb(0) }
}
let reader
return {
async pull (cb) {
reader ??= (await stream(start, end)).getReader()
const {done, value} = await reader.read()
cb(done ^ 1, value)
}
}
},
slice (start, end) {
const [baseStart] = this.span
return {
span: [baseStart + start, baseStart + end],
getReader: this.getReader,
slice: this.slice,
}
}
}
return blob
}currently problematic to do: also need to handle properly clone, serialize & deserialize, if this where to be sent of to another worker - then i would transfer a MessageChannel where the worker thread asks main frame to hand back a transferable ReadableStream when it needs to read something. but there are probably better ways to handle this internally in core with piping data directly to and from different destinations without having to touch the js runtime? - if only getReader could return the reader directly instead of needing to read from the ReadableStream using js? |
| const kPath = Symbol('kPath'); | ||
|
|
||
| // FD range: 10000+ to avoid conflicts with real fds | ||
| const VFS_FD_BASE = 10_000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively here... it might be interesting to use negative fds for vfs nodes. Not sure if practically possible but it would avoid the potential for collision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Negative fds aren't valid per unix rules and other third-party tools sometimes have issues with that. See yarnpkg/berry#4737.
| --> | ||
| * `path` {string} The virtual path for the file. | ||
| * `content` {string|Buffer|Function} The file content, or a function that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest not limiting this to Buffer... also support Uint8Array explicitly.
For "dynamic content", why not also accept sync/async iterables here so that the content can be provided by a stream or generator.
We might also want to support Blob and File here.
| Adds a virtual file. The `content` can be: | ||
| * A `string` or `Buffer` for static content |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a string is provided, is there a default encoding?
| * `prefix` {string} The path prefix where the VFS will be mounted. | ||
| Mounts the VFS at a specific path prefix. All paths in the VFS become accessible | ||
| under this prefix. Only one mount point can be active at a time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this last sentence mean this will throw if it is called twice?
Having vfs.mount(...) return a disposable object that will unmount on disposal would be nice.
const myMount = vfs.mount(prefix);
myMount.unmount();
// or
{
using myMount = vfs.mount(prefix);
// ...
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, having vfs itself support Symbol.dispose would be nice (if it is already and I haven't read far enough thru then ignore this ;-) ...)
|
|
||
| const vfs = fs.createVirtual(); | ||
| vfs.addFile('/module.js', 'module.exports = "hello"'); | ||
| vfs.mount('/virtual'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good for the docs here to explain a bit about what happens if there's already an existing real file system path /virtual. Is it shadowed? Does mount error? etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it automatically shadows, then I can see a potential security hole in this. For instance, if I can mount a path that shadows any real file system path, then a module could obfuscate mounting over a critical system path and trick users into reading malicious data or writing data to something that intercepts the data and forwards it somewhere. Not a critical concern given our security model but something that would be worth documenting
| Enables overlay mode, where the VFS is checked first for all file system | ||
| operations. If a path exists in the VFS, it is used; otherwise, the operation | ||
| falls through to the real file system (if `fallthrough` is enabled). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add warnings to the docs here about how this could be abused to secretly shadow critical file system paths.
| * `path` {string} The path to check. | ||
| * Returns: {boolean} | ||
| Returns `true` if the VFS contains a file or directory at the given path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should document the behavior if "overlay" mode is enabled.
| added: REPLACEME | ||
| --> | ||
| * `path` {string} The path to remove. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should document the behavior when "overlay" mode is enabled.
| const mod = require('/virtual/module.js'); | ||
| ``` | ||
| #### `vfs.overlay()` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd actually prefer this to be an option passed to createVirtual
| When the VFS is unmounted, `process.chdir()` and `process.cwd()` are restored | ||
| to their original implementations. | ||
| > **Note:** VFS hooks are not automatically shared with worker threads. Each |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| > **Note:** VFS hooks are not automatically shared with worker threads. Each | |
| > VFS hooks are not automatically shared with worker threads. Each |
| console.log(myVfs.provider.readonly); // true | ||
| ``` | ||
|
|
||
| ### `vfs.mount(prefix)` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like it's duplicating the docs in fs.md a bit too much
|
|
||
| * {boolean} | ||
|
|
||
| Returns `true` if the provider supports symbolic links. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Symbolic link support needs more explanation. For instance, in "overlay" mode... would it be possible for a real file system symbolic link to target a vfs symbolic link and vis versa? I assume yes but that gets tricky especially when worker threads are involved... sometimes the link works other times it doesn't, etc.
lib/vfs.js
Outdated
| const { RealFSProvider } = require('internal/vfs/providers/real'); | ||
|
|
||
| // SEAProvider is lazy-loaded to avoid loading SEA bindings when not needed | ||
| let SEAProvider = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| let SEAProvider = null; | |
| let SEAProvider; |
lib/vfs.js
Outdated
| let SEAProvider = null; | ||
|
|
||
| function getSEAProvider() { | ||
| if (SEAProvider === null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (SEAProvider === null) { | |
| if (SEAProvider === undefined) { |
| SEAProvider = class SEAProviderUnavailable { | ||
| constructor() { | ||
| throw new ERR_INVALID_STATE('SEAProvider can only be used in a Single Executable Application'); | ||
| } | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not objecting or suggesting that this be change but it something about this pattern just feels weird to me... Don't have a better suggestion tho
| this[kMounted] = false; | ||
| this[kOverlay] = false; | ||
| this[kFallthrough] = options.fallthrough !== false; | ||
| this[kModuleHooks] = options.moduleHooks !== false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit... would prefer strict type checking for these (e.g. validateBoolean).
| * Returns true if VFS falls through to real fs on miss. | ||
| * @returns {boolean} | ||
| */ | ||
| get fallthrough() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels a bit off that some of the bool properties are is* and others are not.
|
|
||
| // If virtual cwd is enabled and set, resolve relative to it | ||
| if (this[kVirtualCwdEnabled] && this[kVirtualCwd] !== null) { | ||
| const resolved = this[kVirtualCwd] + '/' + inputPath; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| const resolved = this[kVirtualCwd] + '/' + inputPath; | |
| const resolved = `${this[kVirtualCwd]}/${inputPath}`; |
lib/internal/vfs/streams.js
Outdated
| * asynchronously via process.nextTick - matching real fs behavior. | ||
| * @private | ||
| */ | ||
| _openFile() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why _ prefix? Why not make this private or a top-level module local function?
| } | ||
| const { getAssetKeys } = internalBinding('sea'); | ||
| const keys = getAssetKeys() || []; | ||
| return keys.length > 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need a separate method for this? getSeaVfs could likely just return null if SEA VFS is not available.
| ### Glob support | ||
| The VFS integrates with `fs.glob()`, `fs.globSync()`, and `fs/promises.glob()` | ||
| when mounted or in overlay mode: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, am I just missing it or is overlay mode not covered in the tests? I'm given how the file paths are being normalized and processed in the implementation, It's not clear if overlay works correctly on windows when there are drive letters in the path or win-style namespace prefixes.
|
|
||
| // readlinkSync should work (returns target path) | ||
| const target = vfs.readlinkSync('/virtual/broken'); | ||
| assert.strictEqual(target, '/nonexistent'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably also test that the target can be created after the symlink and, once it is created, the symlink starts working correctly.
| // VFS uses forward slashes internally regardless of platform. | ||
| if (sep === '\\') { | ||
| normalized = StringPrototypeReplaceAll(normalized, '\\', '/'); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason not to use the path module?
lib/internal/vfs/providers/memory.js
Outdated
| this.mtime = DateNow(); | ||
| this.ctime = DateNow(); | ||
| this.birthtime = DateNow(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| this.mtime = DateNow(); | |
| this.ctime = DateNow(); | |
| this.birthtime = DateNow(); | |
| const now = DateNow(); | |
| this.mtime = now; | |
| this.ctime = now; | |
| this.birthtime = now; |
|
Left an initial round of feedback comments but to keep from it being too overwhelming I'll stop here and do a second pass after the comments are addressed/responded to. They don't all require changes if there are reasons, of course. |
- Add security considerations section warning about path shadowing risks - Document that VFS shadows real paths when mounted - Add symlink documentation explaining VFS-internal-only behavior - Clarify that only mount mode exists (no overlay mode) - Reorder synchronous methods alphabetically per doc conventions Addresses review comments from @jasnell regarding security documentation, overlay mode clarification, alphabetical ordering, and symlink behavior.
- Use undefined instead of null for lazy-loaded SEAProvider - Add validateBoolean for moduleHooks and virtualCwd options - Use template literal for path concatenation - Convert VirtualReadStream to use private class fields - Cache DateNow() result in MemoryEntry constructor Addresses review comments nodejs#18, nodejs#19, nodejs#21, nodejs#23, nodejs#24, nodejs#29.
Add overlay mode to VirtualFileSystem that only intercepts paths that exist in the VFS, allowing real filesystem access to fall through for non-mocked files. This enables use cases like the test runner mock fs where specific files are mocked while allowing access to real files. Changes: - Add `overlay` option to VirtualFileSystem constructor - Modify shouldHandle() to check file existence in overlay mode - Add `overlay` property getter - Update test runner mock to use VFS with overlay mode - Use path.dirname() instead of string manipulation in mock.js - Rename isMounted to mounted for consistency - Add security documentation for overlay mode risks - Add comprehensive tests for overlay mode including worker threads
A first-class virtual file system module (
node:vfs) with a provider-based architecture that integrates with Node.js's fs module and module loader.Key Features
Provider Architecture - Extensible design with pluggable providers:
MemoryProvider- In-memory file system with full read/write supportSEAProvider- Read-only access to Single Executable Application assetsVirtualProvider- Base class for creating custom providersStandard fs API - Uses familiar
writeFileSync,readFileSync,mkdirSyncinstead of custom methodsMount Mode - VFS mounts at a specific path prefix (e.g.,
/virtual), clear separation from real filesystemModule Loading -
require()andimportwork seamlessly from virtual filesSEA Integration - Assets automatically mounted at
/seawhen running as a Single Executable ApplicationFull fs Support - readFile, stat, readdir, exists, streams, promises, glob, symlinks
Example
SEA Usage
When running as a Single Executable Application, bundled assets are automatically available:
Public API
Disclaimer: I've used a significant amount of Claude Code tokens to create this PR. I've reviewed all changes myself.