Benjamin Church Ball State University, Floyd Stoner Ball State University, Andy Vera Ball State University
Faculty Sponsor(s): Fred Kitchens Ball State University, Kyle Church Ball State University
One of the most valuable assets an organization can possess is information. While information comes in many forms, organizations must often create their own by processing large amounts of data. One way to achieve this is with a Hadoop Cluster. Hadoop is open source software designed for performing data analytics on very large amounts of data. A Hadoop Cluster is a group of computer servers running Hadoop software in order to process that data efficiently. Cluster computing of this kind connects many individual machines so that they function logically as one unit. Hadoop Clusters are easily scalable, and they provide speed and resiliency in data processing. These clusters are used for data processing by some of the biggest technology companies on the planet, including Facebook, Google, and IBM, yet they do not need state-of-the-art hardware to operate. Quite the contrary, Hadoop Clusters often run on commodity hardware using open source software and operating systems, making them inexpensive to build and maintain. The goal of this project is to produce a functioning Hadoop Cluster by connecting old workstation machines that would otherwise have served no purpose. The machines will be physically and logically connected, all required software and an Ubuntu operating system will be installed on every machine, and documentation will be created so that the cluster can be maintained for future use.
Mathematics & Computer Science
When & Where
Irwin Library 3rd Floor