DCF.js
  • DCF.js Overview
  • Programming Guide
    • Quick Start
    • Data Serializing
    • Serialized Function(since 2.0)
  • API Refernce
    • Context
    • SerializedFunction(until 1.x)
    • RDD<T>
  • Resources
    • Source @GitHub
    • Samples @GitHub
Powered by GitBook
On this page
  • V8.serialize()/V8.deserialize()
  • Key Serialize
  1. Programming Guide

Data Serializing

PreviousQuick StartNextSerialized Function(since 2.0)

Last updated 6 years ago

Data in RDD may be serialized in serveral cases:

  1. To persist data on disk or off-heap memory.

  2. To transfer data for repartition

  3. To transfer result to master & client

  4. To get a hash value for combineByKey/distinct

Serialize will comes with two method:

V8.serialize()/V8.deserialize()

For case 1-3, data will be serialized and deserialized later. is the best function for this. It can deal many type of data:

  1. primitive types: undefined, null, number, boolean, string, Symbol

  2. array, object literal

  3. Date, RegExp, Set, Map

  4. Buffer/typed arrays

So in generally speaking you could use any type except custom classes.

Key Serialize

But there's a problem for V8.serialize: sometimes it gives different result for same input, especially for splitted string. So when we need another serialize function for combineByKey/distinct.

The function may change in future, but these types are guaranteed safe as key types:

  1. primitive types except Symbol: null, number, boolean, string

  2. array, object literal

Current version of dcf use a custom version of for key serialize, which support undefined value.

V8.serialize()
msgpack5