There is also a reader for text-delimited files that creates a bi-directional dictionary for rows and columns while creating the “unique ordinal IntWritable” keys Dmitriy mentions. The class is IndexedDataset. The Spark version of the companion object has a constructor that takes a PairRDD[(String, String)] of elements, and traits are provided that read text-delimited files. I'm thinking about one that takes JSON or DataFrames, but in any case these are easy to construct. An IndexedDataset just wraps the DRM used in the linear algebra.
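For illustration, here is a minimal sketch of building one from pairs. It assumes the IndexedDatasetSpark companion object's apply in Mahout 0.11+ (check the exact signature in your version); the IDs and pair values are made up:

import org.apache.spark.SparkContext
import org.apache.mahout.sparkbindings._
import org.apache.mahout.sparkbindings.indexeddataset.IndexedDatasetSpark

// A Mahout-capable Spark context; the underlying SparkContext is needed
// implicitly by IndexedDatasetSpark.apply (assumption -- check your version).
val mahoutCtx = mahoutSparkContext(masterUrl = "local", appName = "ids-example")
implicit val sc: SparkContext = mahoutCtx.sc

// (rowID, columnID) string pairs; each pair becomes a 1.0 cell in the matrix.
val pairs = sc.parallelize(Seq(
  ("u1", "ipad"), ("u1", "iphone"),
  ("u2", "nexus"), ("u2", "galaxy")))

// Builds the bi-directional dictionaries (String <-> Int ordinal) and wraps the DRM.
val indexed = IndexedDatasetSpark(pairs)
val drmA = indexed.matrix    // the DRM used in the linear algebra
val rowDict = indexed.rowIDs // bi-directional row dictionary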
On Oct 6, 2015, at 9:31 PM, Dmitriy Lyubimov <***@gmail.com> wrote:
The DRM format is compatible at the persistence level with Mahout's MapReduce algorithms.
It is a Hadoop sequence file. The key is unique and can be one of:
-- a unique ordinal IntWritable, treated as a row number (i.e. nrow = max(int key)), or
-- Text, LongWritable, BytesWritable, or... I forget what else. These technically do not have to be unique, but they usually are. The number of operations available to matrices with "unnumbered" rows is therefore somewhat reduced. For example, expressions that imply a transposition as the final result are not possible, because it is impossible to map non-int row keys to int ordinal column indices.
The value of the DRM sequence file is always Mahout's VectorWritable. It is
allowed to mix sparse and dense vector payloads.
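For the "how to create the DRM file" question below, here is a minimal sketch that writes such a sequence file directly with the Hadoop 2.x API; the path and the tiny matrix are made up for illustration:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, SequenceFile}
import org.apache.mahout.math.{DenseVector, VectorWritable}

// Write a tiny 2x3 matrix as a DRM-compatible sequence file:
// key = row ordinal (IntWritable), value = the row (VectorWritable).
val conf = new Configuration()
val writer = SequenceFile.createWriter(conf,
  SequenceFile.Writer.file(new Path("/tmp/drm-example")), // placeholder path
  SequenceFile.Writer.keyClass(classOf[IntWritable]),
  SequenceFile.Writer.valueClass(classOf[VectorWritable]))

val rows = Seq(
  new DenseVector(Array(1.0, 2.0, 3.0)),
  new DenseVector(Array(4.0, 5.0, 6.0)))

rows.zipWithIndex.foreach { case (v, i) =>
  writer.append(new IntWritable(i), new VectorWritable(v))
}
writer.close()

drmDfsRead("/tmp/drm-example") should then be able to load it, given an implicit distributed context in scope.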
Post by go canal
Thank you Pat. I was having that issue when I was trying to do something like that.
Just curious, how should I prepare the data so that it satisfies drmDfsRead(path)? What is the DRM format, and how do I create the DRM file?
Thanks, canal
On Wednesday, October 7, 2015 4:09 AM, Pat Ferrel wrote:
Linear algebra stuff is what Mahout Samsara is all about. For these docs, in-core means in-memory and out-of-core means distributed:
http://mahout.apache.org/users/environment/out-of-core-reference.html
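As a taste of the DSL, here is a minimal distributed multiplication sketch (a sketch, assuming a Mahout Spark context; see the out-of-core reference above for the full operator list):

import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.drm._
import org.apache.mahout.math.drm.RLikeDrmOps._
import org.apache.mahout.sparkbindings._

// A Mahout distributed context is required implicitly by drmParallelize.
implicit val ctx = mahoutSparkContext(masterUrl = "local[2]", appName = "mmul-sketch")

// Build a small in-core matrix with the dense() helper, then distribute it.
val a = dense((1, 2), (3, 4), (5, 6))
val drmA = drmParallelize(a, numPartitions = 2)

// Out-of-core (distributed) product A'A; evaluation is lazy until collect.
val drmAtA = drmA.t %*% drmA
val inCore = drmAtA.collect // pull the small result back in-core
println(inCore)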
Thank you very much for the help. I will try Spark 1.4.
I would like to try distributed matrix multiplication, but I'm not sure if there is sample code available. I am very new to this stack. Thanks, canal
Mahout 0.11.0 is built on Spark 1.4, so 1.5.1 is a bit of an unknown. I think the Mahout Shell does not run on 1.5.1.
That may not be the cause of the error below, which occurs when Mahout tries to create a set of jars to use in the Spark executors. The code runs `mahout -spark classpath` to get these, so something is missing in your env in Eclipse. Does `mahout -spark classpath` run in a shell? If so, check whether your env in Eclipse matches.
Also, what are you trying to do? I have some example Spark context creation code if you are using Mahout as a library; see the sketch below.
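Roughly like this (a sketch, assuming Mahout 0.11's sparkbindings; masterUrl and appName are placeholders):

import org.apache.mahout.math.drm.DistributedContext
import org.apache.mahout.sparkbindings._

// mahoutSparkContext builds a SparkContext with the Mahout jars shipped to
// the executors; finding those jars is what runs `mahout -spark classpath`,
// so SPARK_HOME and MAHOUT_HOME must be visible to the launching process.
implicit val ctx: DistributedContext =
  mahoutSparkContext(masterUrl = "local[2]", appName = "mahout-as-library")

With that implicit context in scope, drmDfsRead(path) and the DSL operations work as in the examples above.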
Exception in thread "main" java.lang.IllegalArgumentException: Unable to
read output from "mahout -spark classpath". Is SPARK_HOME defined?
I have SPARK_HOME defined in Eclipse as an environment variable with the value /usr/local/spark-1.5.1.
What else do I need to include/set?
Thanks, canal