Hadoop is the de facto standard for Big Data, that is, large-scale data processing and storage. Yet Hadoop is a relatively young platform that still suffers from fundamental scalability issues: all file system metadata must be stored in the memory heap of a single node, the name node, and all scheduling decisions are taken by a single node, the scheduler. At KTH/SICS, we are developing Hadoop Open Platform-as-a-Service (Hops), a new distribution of Apache Hadoop with scalable, highly available, customizable metadata and scheduling.
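To put the metadata bottleneck in perspective: a commonly cited rule of thumb is that each file, directory, and block consumes roughly 150 bytes of name node heap, so a namespace of a few hundred million objects already demands tens of gigabytes of heap on that single machine.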
How can we share our R code with others? How should we test our R code?
And what is Reproducible Research? Finally, we solve a concrete task in R.
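As a taste of the testing topic, here is a minimal sketch of a unit test in R using the testthat package; the normalize() function is a hypothetical example for illustration, not part of the talk material.

library(testthat)

# Hypothetical helper: rescale a numeric vector to the [0, 1] interval.
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

test_that("normalize rescales values into [0, 1]", {
  result <- normalize(c(2, 4, 6))
  expect_equal(min(result), 0)
  expect_equal(max(result), 1)
  expect_equal(result, c(0, 0.5, 1))
})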