Carlos Guerreiro
perceptive constructs
bespoke intelligent systems
Espoo, Finland
robust and scalable data pipelines
machine learning
mathematical optimization
ingest
pandas (Python) for exploratory data preparation, manipulation and analysis
node.js, Tornado and netty/akka for real-time data exchange with browsers (with socket.io) or mobile devices, and to consume or implement data APIs
lxml (Python) and pjscrape/PhantomJS (JavaScript) for scraping HTML
model
scikit-learn and MDP for general purpose machine learning in Python
statsmodels (in Python), or R for statistical modelling
pymc for Bayesian inference
linear and mixed integer programming with PuLP, GLPK and Gurobi, constraint programming with Gecode
build
number crunching
Eigen in C++
SciPy, NumPy and Theano in Python
Octave (Matlab)
breeze (ScalaNLP/Scalala) in Scala
interactive / real time data visualization
d3.js, NVD3.js and Cubism.js
raphael.js (for IE/VML support)
databases
PostgreSQL and MySQL for general use
Redis for high-velocity data in transit, smart caching, custom real-time analytics
HBase, Cassandra and Riak to scale out large write-heavy random access workloads
LevelDB as embedded key/value storage for custom service components
MongoDB for flexibility with semi-structured document-like data
scale
frameworks
Mahout and Pig on Hadoop for (mostly) batch distributed processing
Spark for iterative algorithms and interactive data mining on Apache Mesos clusters, possibly sharing data and resources with Hadoop
Storm for distributed real-time computation
Vowpal Wabbit for online machine learning of linear models, perhaps on a Hadoop cluster
Custom setups with node.js for I/O-bound processing and coordination
Custom setups with akka
clouds
Amazon Web Services
Cloud Foundry
Google App Engine
Linode
bits & pieces
irf is a C++ implementation (with node.js and Python bindings) of Incremental Random Forests (code, npm)
rawhash is an experimental binary-friendly alternative to using a hash as a key:value cache, for node.js (code, npm)
rdb-parser is a node.js async streaming parser for Redis RDB dumps, in 100% JavaScript (code, npm)
redis-sync is a node.js Redis replication slave toolkit (code, npm)
recurrent is a Redis-backed manager of recurrent jobs, for node.js (code, npm)