Serialized Function (since 2.0)

Introduction

You often need to provide custom functions when working with DCF. Some of them are serialized, transferred to the master or a worker, and run in a different process. This works fine as long as the function uses no upvalues (variables or constants from enclosing scopes), but if you need closures, read this article first.

Functions are serialized as source strings and then deserialized with new Function.

const min = 5;
// ERROR: This will fail to run:
console.log(await rdd.filter(
  v => v >= min,    // Function that uses upvalues.
).collect());

This raises a problem: how does the worker know the value of the upvalue min?
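
The failure can be reproduced without DCF. The following is a minimal sketch of a source-string round trip in plain Node.js; roundTrip and demo are hypothetical helpers written for illustration and are not DCF's actual serializer.

```javascript
// Minimal sketch of a source-string round trip (an assumption about the
// general technique, not DCF's real code).
function roundTrip(fn) {
  const source = fn.toString();                 // "serialize" to a string
  // "Deserialize": new Function evaluates the source in the global scope,
  // so the original enclosing scope is no longer reachable.
  return new Function(`return (${source});`)();
}

function demo() {
  const min = 5; // upvalue: exists only inside demo()

  const pure = roundTrip(v => v >= 5);      // no upvalues, survives the trip
  const closure = roundTrip(v => v >= min); // refers to min, which is lost

  let error = null;
  try {
    closure(7);
  } catch (e) {
    error = e; // min cannot be resolved in the rebuilt function
  }
  return { pureResult: pure(7), errorName: error && error.name };
}

const result = demo();
console.log(result);
```

The function without upvalues behaves the same after the round trip, while the closure fails as soon as it tries to read min.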

Manually capture env

One solution is to capture upvalues manually by calling the captureEnv API:

const { captureEnv } = require('@dcfjs/common');

const min = 5;
console.log(await rdd.filter(captureEnv(
  v => v >= min,    // Function that uses upvalues.
  { min }           // Function Env that contains upvalues with same name.
)).collect());
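
To make the idea concrete, here is one way an environment object could travel with a function and be restored on the other side. This is a sketch under assumptions: serializeWithEnv and deserializeWithEnv are hypothetical names, and the real captureEnv in @dcfjs/common may work differently.

```javascript
// Hypothetical helpers for illustration only; not the @dcfjs/common API.
function serializeWithEnv(fn, env) {
  // Ship the function source together with its (JSON-serializable) env.
  return JSON.stringify({ source: fn.toString(), env });
}

function deserializeWithEnv(payload) {
  const { source, env } = JSON.parse(payload);
  const names = Object.keys(env);
  const values = names.map(name => env[name]);
  // Each env entry becomes a parameter of a wrapper function, so the
  // rebuilt function can see it under the same name as the original upvalue.
  return new Function(...names, `return (${source});`)(...values);
}

function buildPayload() {
  const min = 5; // upvalue used by the filter function
  return serializeWithEnv(v => v >= min, { min });
}

// Pretend this part runs in a different process.
const rebuilt = deserializeWithEnv(buildPayload());
const kept = [3, 5, 8].filter(rebuilt);
console.log(kept);
```

This also shows why the env keys must match the upvalue names exactly: the rebuilt function looks them up by name.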

If you need an API from another library, use requireModule so that the serialized function requires the module in the process where it runs:

const { captureEnv, requireModule } = require('@dcfjs/common');
const isPrime = require('is-prime');

console.log(await rdd.filter(captureEnv(
  v => isPrime(v),
  { isPrime: requireModule('is-prime') }
)).collect());

Make sure that the module is also available on the master and on every worker.
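
A module cannot be shipped as a value, so the env has to carry a reference that the receiving process resolves itself. The sketch below illustrates that idea with a hypothetical marker object; moduleRef and resolveEnv are assumed names, and this is not how requireModule is necessarily implemented. It uses Node's built-in path module so it runs without extra dependencies.

```javascript
// Hypothetical marker and resolver; not the @dcfjs/common implementation.
function moduleRef(name) {
  // Only the module name crosses the process boundary.
  return { __moduleRef: name };
}

function resolveEnv(env) {
  const resolved = {};
  for (const [key, value] of Object.entries(env)) {
    const isRef =
      value !== null && typeof value === 'object' && '__moduleRef' in value;
    // Module references are required locally; plain values pass through.
    resolved[key] = isRef ? require(value.__moduleRef) : value;
  }
  return resolved;
}

// Simulate the wire format: the env is serialized, then parsed elsewhere.
const wireEnv = JSON.parse(
  JSON.stringify({ sep: '-', path: moduleRef('path') })
);
const env = resolveEnv(wireEnv);
const joined = env.path.posix.join('a', 'b');
console.log(joined);
```

Because the receiving side calls require on its own, the module has to be installed there as well.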

If you serialize a function inside another serialized function (you will rarely need this unless you use the DCF HTTP/2 API), be careful when calling captureEnv in the inner function: captureEnv itself is an upvalue there, so @dcfjs/common must be passed to the outer function through requireModule:

const dcfCommon = require('@dcfjs/common');

const i = 1;
await rdd.client.request(dcfCommon.captureEnv(
  async dispatchWorker => {
    await dispatchWorker(
      dcfCommon.captureEnv(() => {
        console.log(i);
      }, {
        i
      })
    );
  },
  {
    i,
    dcfCommon: dcfCommon.requireModule('@dcfjs/common')
  }
));

Register auto capturing

If capturing upvalues manually is too cumbersome, you can require the helper module registerCaptureEnv, which captures all upvalues automatically for every function. It has very low overhead and does not affect code that never serializes functions.

Hint: Require @dcfjs/common/registerCaptureEnv before any of your own modules are loaded. You can either write a loader module or require it from the command line:

node -r @dcfjs/common/registerCaptureEnv main.js
mocha -r @dcfjs/common/registerCaptureEnv test/**/*.js

With the helper loaded, the earlier example runs without calling captureEnv:

const min = 5;
console.log(await rdd.filter(
  v => v >= min,    // Function that uses upvalues.
).collect());

Do not destructure third-party modules before using them in a serialized function:

// Do not destructure before use like this:
// const { gzipSync } = require('zlib');
const zlib = require('zlib');

await numbers.filter(isPrime).saveAsTextFile('./prime', {
  extension: 'gz',
  compressor: (buf: Buffer) => {
    return zlib.gzipSync(buf);
  },
});

However, if you use ECMAScript 6 modules via TypeScript or Babel, it is safe to import a function and use it directly:

import { gzipSync } from 'zlib';

await numbers.filter(isPrime).saveAsTextFile('./prime', {
  extension: 'gz',
  compressor: (buf: Buffer) => {
    return gzipSync(buf);
  },
});
