Serialized Function (since 2.0)
Introduction
You often need to provide a custom function when working with DCF. Some of these functions are serialized, transferred to the master or a worker, and run in a different process. This is fine as long as the function uses no upvalues (variables or constants from enclosing scopes), but if you need a closure, read this article first.
Functions are serialized as source strings, then deserialized with new Function.
const min = 5;
// ERROR: This will fail to run:
console.log(await rdd.filter(
v => v >= min, // Function that uses upvalues.
).collect());
This raises a problem: how does the worker know the value of the upvalue min?
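The failure can be reproduced without DCF. The following is a minimal sketch, assuming that serialization amounts to taking the function's source text and rebuilding it with new Function. The helper names serialize and deserialize are hypothetical and are not part of @dcfjs/common.

```javascript
// Hypothetical helpers that mimic the round trip described above.
// They are illustrative only and are not the DCF implementation.
function serialize(fn) {
  return fn.toString(); // only the source text survives
}

function deserialize(source) {
  // new Function evaluates the source in the global scope,
  // so the enclosing scope of the original function is gone.
  return new Function('return (' + source + ');')();
}

// A function without upvalues survives the round trip.
const double = deserialize(serialize(v => v * 2));
const doubled = double(21);

// A function that closes over `min` does not.
function makeFilter() {
  const min = 5;
  return v => v >= min;
}
const brokenFilter = deserialize(serialize(makeFilter()));

let failure = null;
try {
  brokenFilter(10);
} catch (err) {
  failure = err; // `min` is not defined on the receiving side
}
console.log(doubled, failure && failure.name);
```

The rebuilt function fails only when it is called, because the name min is looked up at call time and no longer exists in any enclosing scope.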
Manually capture env
One solution is to capture upvalues manually by calling the captureEnv API:
const { captureEnv } = require('@dcfjs/common');
const min = 5;
console.log(await rdd.filter(captureEnv(
v => v >= min, // Function that uses upvalues.
{ min } // Function Env that contains upvalues with same name.
)).collect());
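To see why passing an env object solves the problem, here is a minimal sketch of one way such a mechanism could work. It assumes the env is shipped as plain data next to the source text and re-bound on the receiving side. The helpers packWithEnv and unpackWithEnv are hypothetical, and the real captureEnv may be implemented differently.

```javascript
// Hypothetical helpers; the real captureEnv may work differently.
function packWithEnv(fn, env) {
  // Ship the source text together with a plain-data copy of the env.
  return JSON.stringify({ source: fn.toString(), env });
}

function unpackWithEnv(payload) {
  const { source, env } = JSON.parse(payload);
  const names = Object.keys(env);
  const values = names.map(name => env[name]);
  // The env names become parameters of a factory function, so the
  // rebuilt function sees them as ordinary variables in scope.
  const factory = new Function(...names, 'return (' + source + ');');
  return factory(...values);
}

function buildPayload() {
  const min = 5;
  // The env key must match the upvalue name used in the function body.
  return packWithEnv(v => v >= min, { min });
}

const atLeastMin = unpackWithEnv(buildPayload());
const kept = [3, 5, 8].filter(atLeastMin);
console.log(kept);
```

This also shows why the env keys must use the same names as the upvalues: the function body is unchanged source text, so it can only find a value under the name it already uses.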
If you need an API from another library, use requireModule to require it for the serialized function:
const { captureEnv, requireModule } = require('@dcfjs/common');
const isPrime = require('is-prime');
console.log(await rdd.filter(captureEnv(
v => isPrime(v),
{ isPrime: requireModule('is-prime') }
)).collect());
Make sure the module is also available on the master and on every worker.
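A module object cannot be shipped as plain data, which is why it needs separate treatment. The sketch below assumes the module is sent as a small marker and required again on the receiving side. The helpers moduleRef, packFn, and unpackFn are hypothetical, the real requireModule may work differently, and the built-in path module stands in for a third-party package so that the example runs without installing anything.

```javascript
// Hypothetical helpers; the real requireModule may work differently.
function moduleRef(name) {
  // A plain-data marker that survives JSON serialization.
  return { __module: name };
}

function packFn(fn, env) {
  return JSON.stringify({ source: fn.toString(), env });
}

function unpackFn(payload) {
  const { source, env } = JSON.parse(payload);
  const names = Object.keys(env);
  const values = names.map(name => {
    const value = env[name];
    // Markers are resolved by requiring the module on the receiving
    // side, which is why it must be installed there as well.
    return value && value.__module ? require(value.__module) : value;
  });
  return new Function(...names, 'return (' + source + ');')(...values);
}

function buildBasenamePayload() {
  const path = require('path');
  return packFn(p => path.basename(p), { path: moduleRef('path') });
}

const baseName = unpackFn(buildBasenamePayload());
const fileName = baseName('/data/prime/part-0.gz');
console.log(fileName);
```

Under this assumption, a module that is missing on the receiving side would make the require call fail there, even though the same code works on the machine that built the payload.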
If you serialize a function inside a serialized function (which should be rare unless you use the DCF HTTP/2 API), be careful with captureEnv: the module that provides it must itself be passed in through requireModule:
const dcfCommon = require('@dcfjs/common');
const i = 1;
await rdd.client.request(dcfCommon.captureEnv(
async dispatchWorker => {
await dispatchWorker(
dcfCommon.captureEnv(() => {
console.log(i);
}, {
i
})
);
},
{
i,
dcfCommon: dcfCommon.requireModule('@dcfjs/common')
}
));
Register auto capturing
If capturing upvalues manually is too complex, you can require the helper module registerCaptureEnv, which automatically captures all upvalues for every function at very low cost and has no effect if you never serialize functions.
Hint: You should require @dcfjs/common/registerCaptureEnv before any of your own modules are loaded. You can either write a loader module or require it from the command line:
node -r @dcfjs/common/registerCaptureEnv main.js
mocha -r @dcfjs/common/registerCaptureEnv test/**/*.js
const min = 5;
console.log(await rdd.filter(
v => v >= min, // Function that uses upvalues.
).collect());
You should not destructure third-party modules before using them in a serialized function:
// Do not destructure before use like this:
// const { gzipSync } = require('zlib');
const zlib = require('zlib');
await numbers.filter(isPrime).saveAsTextFile('./prime', {
extension: 'gz',
compressor: (buf: Buffer) => {
return zlib.gzipSync(buf);
},
});
But if you use ECMAScript 6 modules via TypeScript or Babel, it is safe to import a function and use it directly:
import { gzipSync } from 'zlib';
await numbers.filter(isPrime).saveAsTextFile('./prime', {
extension: 'gz',
compressor: (buf: Buffer) => {
return gzipSync(buf);
},
});