Context
emptyRDD(): RDD<never>
Create an RDD that has no partitions or elements.
range(from: number, to?: number, step?: number, numPartitions?: number): RDD<number>
Create a new RDD of numbers containing the elements from from to to (exclusive), increased by step for each element. If called with a single argument, that argument is interpreted as to, and from is set to 0.
Parameters:
from – the start value
to – the end value (exclusive)
step – the incremental step (default: 1)
numPartitions – the number of partitions of the new RDD
Returns:
An RDD of numbers
Difference from Spark:
The step parameter may be a non-integer number, but this can cause floating-point precision loss and lead to unexpected results.
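The single-argument form and the floating-point pitfall can be sketched with a plain local helper. rangeValues below is illustrative only (it is not part of the DCF API); it mirrors the documented semantics of range() for a single partition:

```typescript
// Illustrative helper (NOT part of the DCF API): mirrors the documented
// semantics of range() — from..to exclusive, increased by step.
function rangeValues(from: number, to?: number, step: number = 1): number[] {
  // Single-argument form: the argument is the end, and the start is 0.
  if (to === undefined) {
    to = from;
    from = 0;
  }
  const out: number[] = [];
  for (let v = from; v < to; v += step) {
    out.push(v);
  }
  return out;
}

console.log(rangeValues(5));           // [ 0, 1, 2, 3, 4 ]

// A fractional step accumulates floating-point error: 0 + 0.3 + 0.3 + 0.3
// is 0.8999999999999999, which is still < 0.9, so an extra element appears.
console.log(rangeValues(0, 0.9, 0.3)); // [ 0, 0.3, 0.6, 0.8999999999999999 ]
```

This is why a non-integer step can yield one element more (or fewer) than the arithmetic suggests.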
parallelize<T>(c: T[], numPartitions?: number): RDD<T>
Distribute a local JavaScript array to form an RDD.
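How an array might be distributed across partitions can be illustrated with a small standalone sketch. The even contiguous slicing below is an assumption (similar to Spark's behavior), not DCF's actual implementation:

```typescript
// Illustrative sketch (NOT DCF's actual implementation): split an array
// into numPartitions contiguous slices of near-equal size.
function slicePartitions<T>(c: T[], numPartitions: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < numPartitions; i++) {
    const start = Math.floor((i * c.length) / numPartitions);
    const end = Math.floor(((i + 1) * c.length) / numPartitions);
    out.push(c.slice(start, end));
  }
  return out;
}

console.log(slicePartitions([1, 2, 3, 4, 5, 6, 7], 3));
// [ [ 1, 2 ], [ 3, 4 ], [ 5, 6, 7 ] ]
```

Every element lands in exactly one partition, and partition sizes differ by at most one.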
union<T>(...rdds: RDD<T>[]): RDD<T>
Build the union of a list of RDDs.
binaryFiles(baseUrl: string, options?: { recursive?: boolean }): RDD<[string, Buffer]>
Read a directory of binary files from any file system (must be available on all nodes). Each file is read as a single record and returned as a key-value pair, where the key is the path of the file (relative to baseUrl) and the value is the content of the file.
Difference from Spark:
Each file becomes a single partition. If needed, you can call repartition() after the files are loaded. DCF supports recursive loading, which loads all files in the directory and its subdirectories.
wholeTextFiles(baseUrl: string, options?: Options): RDD<[string, string]>
Read a directory of text files from any file system (must be available on all nodes). Each file is read as a single record and returned as a key-value pair, where the key is the path of the file and the value is the content of the file.
Options
encoding?: string – encoding of the text file(s) (default: 'utf8')
recursive?: boolean – include files in subdirectories
decompressor?: (data: Buffer, filename: string) => Buffer – a custom decompressor function
functionEnv?: FunctionEnv – function serialization context (upvalues)
Difference from Spark:
Each file becomes a single partition. If needed, you can call repartition() after the files are loaded. DCF supports recursive loading, which loads all files in the directory and its subdirectories.
The decompressor is provided as a function rather than a configuration option.
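A decompressor that handles gzip-compressed files could look like the sketch below, built on Node's built-in zlib module. Dispatching on the filename extension is an illustrative choice, not something the API mandates:

```typescript
import * as zlib from "zlib";

// Illustrative decompressor: gunzip files ending in ".gz" and pass
// everything else through unchanged.
const gzipDecompressor = (data: Buffer, filename: string): Buffer =>
  filename.endsWith(".gz") ? zlib.gunzipSync(data) : data;

// It would be passed as the decompressor option, e.g.:
// context.wholeTextFiles(baseUrl, { decompressor: gzipDecompressor })
```

Because the decompressor receives both the raw bytes and the filename, a single function can serve a directory that mixes compressed and uncompressed files.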
textFile(baseUrl: string, options?: Options): RDD<string>
Read a single text file or a directory of text files from any file system (must be available on all nodes), and return an RDD containing each line of those files.
Options
encoding?: string – encoding of the text file(s) (default: 'utf8')
recursive?: boolean – include files in subdirectories
decompressor?: (data: Buffer, filename: string) => Buffer – a custom decompressor function
functionEnv?: FunctionEnv – function serialization context (upvalues)
__dangerousDontCopy?: boolean – do not copy each line; keep references into the original large string instead (faster, but the whole string stays in memory as long as any line is referenced)
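The trade-off behind this flag can be sketched in plain Node. In V8, a substring of a long string can be a "sliced" string that keeps the parent string alive; a common way to force a real copy is a Buffer round-trip. This is an illustrative sketch of the general technique, not DCF's internal code:

```typescript
// When a substring keeps a reference into its (possibly huge) parent string,
// the parent cannot be garbage-collected. A Buffer round-trip is one common
// way to force an independent copy in Node (illustrative, not the DCF API).
function forceCopy(line: string): string {
  return Buffer.from(line, "utf8").toString("utf8");
}

const huge = "line one\nline two\n"; // imagine a very large file here
const lines = huge.split("\n").map(forceCopy);
console.log(lines[0]); // line one
```

Skipping the copy (what __dangerousDontCopy enables) avoids this per-line cost, at the risk of retaining the whole file's text in memory.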