
Data Mining - Weka

Comprehensive set of tools for machine learning and data mining to enhance your insights through predictive analytics.

Explore and understand your data

Mining your own data and turning what you know about your users, your clients, and your business into useful information is now an easy task. With Weka, an open source software package, you can discover patterns in large data sets and extract the information they contain. Because it is implemented entirely in the Java programming language, Weka is highly portable, and it supports several standard data mining tasks.


Frequently asked questions

Can I use Weka in commercial applications?

Since Weka is licensed under the GNU General Public License (GPL 2.0 for Weka 3.6 and GPL 3.0 for Weka versions later than 3.7.5), any derivative work must be licensed under the GPL as well.

How do I generate compatible training and test sets that get processed with a filter?

Running a filter twice, once with the training set as input and a second time with the test set, will almost certainly create two incompatible files. Why? Every time you run a filter, it is initialized based on its input data, and since the training and test sets differ, the two outputs are incompatible. You can avoid this by using batch filtering, which initializes the filter on the training set and then applies it unchanged to the test set. See the article on Batch filtering for more details.
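The idea behind batch filtering can be sketched in plain Java, without Weka on the classpath. In the sketch below, `MinMaxFilter` is a hypothetical stand-in for a Weka filter (simple min-max scaling of one numeric attribute): its parameters are computed once from the training data and then reused unchanged for the test data. In Weka itself, the equivalent is calling `setInputFormat` on the filter with the training data and then passing both sets through `Filter.useFilter`; the Batch filtering article remains the authoritative reference.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a Weka filter: min-max scaling of one numeric attribute. */
class MinMaxFilter {
    private double min;
    private double max;
    private boolean initialized = false;

    /** Initializes the filter from a data set, as a filter does with its first batch. */
    void fit(double[] values) {
        min = Arrays.stream(values).min().getAsDouble();
        max = Arrays.stream(values).max().getAsDouble();
        initialized = true;
    }

    /** Applies the stored parameters; the data passed in here does not change them. */
    double[] apply(double[] values) {
        if (!initialized) throw new IllegalStateException("call fit() first");
        double range = max - min;
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = range == 0 ? 0.0 : (values[i] - min) / range;
        }
        return out;
    }
}

class BatchFilteringDemo {
    public static void main(String[] args) {
        double[] train = {10, 20, 30};
        double[] test = {20, 40};

        // Batch filtering: initialize once on the training set, reuse it for the test set.
        MinMaxFilter batch = new MinMaxFilter();
        batch.fit(train);
        System.out.println(Arrays.toString(batch.apply(train))); // [0.0, 0.5, 1.0]
        System.out.println(Arrays.toString(batch.apply(test)));  // [0.5, 1.5]

        // Re-initializing on the test set maps the value 20 differently
        // than in the training output, so the two outputs are incompatible.
        MinMaxFilter separate = new MinMaxFilter();
        separate.fit(test);
        System.out.println(Arrays.toString(separate.apply(test))); // [0.0, 1.0]
    }
}
```

Note that the test value 40 maps outside the training range (1.5); that is expected, because the point of batch filtering is that the test data does not get to redefine the transformation.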

How do I perform attribute selection?

Weka offers different approaches for performing attribute selection: directly with the attribute selection classes, with a meta-classifier, or with a filter.
Check out the Performing attribute selection article for more details and examples.

How do I perform clustering?

Weka offers clustering capabilities not only as standalone schemes, but also as filters and classifiers. Check out the article about Using cluster algorithms for detailed information.

How do I perform text classification?

The article Text categorization with WEKA explains the basics of working with text documents, such as importing and preprocessing them.
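A common preprocessing step for text is turning each document into word counts (a "bag of words"); in Weka this role is usually played by the `StringToWordVector` filter. The plain-Java sketch below only illustrates the idea and is not that filter's implementation: the class `BagOfWords` and its simplistic tokenizer (lower-casing, splitting on anything that is not a letter or digit) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class BagOfWords {
    /** Lower-cases the text and splits it on runs of characters that are not letters or digits. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t); // a leading separator yields an empty token
        }
        return tokens;
    }

    /** Counts how often each token occurs; the TreeMap keeps the vocabulary sorted. */
    static Map<String, Integer> wordCounts(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String t : tokenize(text)) counts.merge(t, 1, Integer::sum);
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCounts("Weka mines data; Weka filters data."));
        // {data=2, filters=1, mines=1, weka=2}
    }
}
```

Each distinct word then becomes a numeric attribute, which is what lets ordinary classifiers work on text.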

How can I perform multi-instance learning in Weka?

The article Multi-instance classification explains which classifiers can perform multi-instance classification and which format the data must have for these multi-instance classifiers.
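In multi-instance learning, a label belongs to a bag of instances rather than to a single instance. A widely used "standard" multi-instance assumption is that a bag is positive if at least one of its instances is positive. The plain-Java sketch below illustrates only that assumption; the `Bag` type and the instance-level rule are hypothetical, and the exact data format Weka's multi-instance classifiers expect is described in the article above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class MultiInstanceSketch {
    /** Hypothetical bag: an identifier plus several instances (numeric feature vectors). */
    static final class Bag {
        final String id;
        final List<double[]> instances;

        Bag(String id, List<double[]> instances) {
            this.id = id;
            this.instances = instances;
        }
    }

    /** Standard multi-instance assumption: the bag is positive if any instance is positive. */
    static boolean isPositive(Bag bag, Predicate<double[]> instanceRule) {
        return bag.instances.stream().anyMatch(instanceRule);
    }

    public static void main(String[] args) {
        // Hypothetical instance-level rule: the first feature exceeds 0.5.
        Predicate<double[]> rule = x -> x[0] > 0.5;
        Bag a = new Bag("a", Arrays.asList(new double[]{0.1, 2.0}, new double[]{0.9, 1.0}));
        Bag b = new Bag("b", Arrays.asList(new double[]{0.2, 3.0}, new double[]{0.3, 0.0}));
        System.out.println(isPositive(a, rule)); // true
        System.out.println(isPositive(b, rule)); // false
    }
}
```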

Weka Explorer

Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a data set or called from your own Java code. Weka is also well suited for developing new machine learning schemes.

Weka's main user interface is the Explorer, featuring several panels which provide access to the main components of the workbench: the Preprocess Panel, the Classify Panel, the Associate Panel, the Cluster Panel, the Select Attributes Panel, and the Visualize Panel.

Preprocess Panel
The Preprocess Panel has facilities for importing data from a database, a CSV file, or other data file types, and for preprocessing this data using a so-called filtering algorithm. These filters can be used to transform the data (e.g., turning numeric attributes into discrete ones) and make it possible to delete instances and attributes according to specific criteria.
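As an example of turning numeric attributes into discrete ones, the sketch below performs equal-width binning, similar in spirit to Weka's unsupervised `Discretize` filter. It is a plain-Java illustration under simplifying assumptions (one attribute, no missing values, bins derived from the observed minimum and maximum), not Weka's implementation; `EqualWidthDiscretizer` is a hypothetical name.

```java
import java.util.Arrays;

class EqualWidthDiscretizer {
    /**
     * Maps each numeric value to a bin index in [0, bins - 1] using bins of equal width
     * between the observed minimum and maximum.
     */
    static int[] discretize(double[] values, int bins) {
        if (bins < 1) throw new IllegalArgumentException("bins must be >= 1");
        double min = Arrays.stream(values).min().getAsDouble();
        double max = Arrays.stream(values).max().getAsDouble();
        double width = (max - min) / bins;
        int[] out = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            int bin = width == 0 ? 0 : (int) ((values[i] - min) / width);
            // The maximum value would land in bin number "bins"; clamp it into the last bin.
            out[i] = Math.min(bin, bins - 1);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] ages = {18, 25, 33, 47, 60};
        System.out.println(Arrays.toString(discretize(ages, 3))); // [0, 0, 1, 2, 2]
    }
}
```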
Classify Panel
The Classify Panel enables the user to apply classification and regression algorithms (indiscriminately called classifiers in Weka) to the resulting data set, to estimate the accuracy of the resulting predictive model, and to visualize erroneous predictions, ROC curves, etc., or the model itself (if the model is amenable to visualization, e.g., a decision tree).
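One standard way to estimate a model's accuracy is k-fold cross-validation: train on k - 1 folds, test on the held-out fold, and repeat for every fold. The plain-Java sketch below applies it to a majority-class baseline, similar in spirit to Weka's ZeroR classifier. It is a simplified illustration: instance i goes to fold i mod k, whereas Weka's own cross-validation randomizes (and, for nominal classes, stratifies) the folds. `CrossValidationSketch` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

class CrossValidationSketch {
    /** Majority-class baseline: returns the most frequent label outside the held-out fold. */
    static String trainMajority(String[] labels, int testFold, int folds) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        for (int i = 0; i < labels.length; i++) {
            if (i % folds == testFold) continue; // skip the held-out fold
            int c = counts.merge(labels[i], 1, Integer::sum);
            if (best == null || c > counts.get(best)) best = labels[i];
        }
        return best;
    }

    /** k-fold cross-validated accuracy; instance i belongs to fold i % folds (no shuffling). */
    static double crossValidatedAccuracy(String[] labels, int folds) {
        int correct = 0;
        for (int f = 0; f < folds; f++) {
            String prediction = trainMajority(labels, f, folds);
            for (int i = f; i < labels.length; i += folds) {
                if (labels[i].equals(prediction)) correct++;
            }
        }
        return (double) correct / labels.length;
    }

    public static void main(String[] args) {
        String[] labels = {"yes", "yes", "no", "yes", "no", "yes"};
        System.out.println(crossValidatedAccuracy(labels, 3));
    }
}
```

On this toy data the baseline scores only 2 out of 6, even though "yes" is the majority overall, which is exactly the kind of pessimistic but honest estimate that held-out testing is meant to give.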
Associate Panel
The Associate Panel provides access to association rule learners that attempt to identify all important interrelationships between attributes in the data.
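Association rule learners such as Weka's Apriori rank rules by measures like support (how often the items occur together) and confidence (how often the right-hand side holds when the left-hand side does). The plain-Java sketch below computes both measures for a single rule over a hypothetical set of shopping baskets; it illustrates the measures only, not a rule-mining algorithm, and `RuleMetrics` is a hypothetical name.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RuleMetrics {
    /** Fraction of transactions that contain every item in the item set. */
    static double support(List<Set<String>> transactions, Set<String> items) {
        long hits = transactions.stream().filter(t -> t.containsAll(items)).count();
        return (double) hits / transactions.size();
    }

    /** confidence(A => B) = support(A together with B) / support(A). */
    static double confidence(List<Set<String>> transactions, Set<String> lhs, Set<String> rhs) {
        Set<String> both = new HashSet<>(lhs);
        both.addAll(rhs);
        return support(transactions, both) / support(transactions, lhs);
    }

    public static void main(String[] args) {
        List<Set<String>> baskets = List.of(
            Set.of("bread", "milk"),
            Set.of("bread", "butter"),
            Set.of("bread", "milk", "butter"),
            Set.of("milk"));
        System.out.println(support(baskets, Set.of("bread", "milk"))); // 0.5
        // bread => milk holds in 2 of the 3 baskets that contain bread.
        System.out.println(confidence(baskets, Set.of("bread"), Set.of("milk")));
    }
}
```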
Cluster Panel
The Cluster Panel gives access to the clustering techniques in Weka, e.g., the simple k-means algorithm. There is also an implementation of the expectation maximization algorithm for learning a mixture of normal distributions.
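The k-means algorithm alternates two steps: assign every point to its nearest centroid, then move each centroid to the mean of its assigned points, until the assignments stop changing. The plain-Java sketch below shows those two steps; it is not Weka's `SimpleKMeans`, which typically chooses its initial centroids randomly (controlled by a seed). For a reproducible illustration, this sketch simply uses the first k points as initial centroids, and `KMeansSketch` is a hypothetical name.

```java
import java.util.Arrays;

class KMeansSketch {
    static double squaredDistance(double[] a, double[] b) {
        double s = 0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            s += diff * diff;
        }
        return s;
    }

    /** Returns the cluster index of each point; the first k points seed the centroids. */
    static int[] cluster(double[][] points, int k, int maxIterations) {
        if (k < 1 || k > points.length) throw new IllegalArgumentException("need 1 <= k <= #points");
        int dims = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = points[c].clone();
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: each point joins its nearest centroid.
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (squaredDistance(points[i], centroids[c]) < squaredDistance(points[i], centroids[best])) {
                        best = c;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) break; // converged
            // Update step: move each centroid to the mean of its points.
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (int i = 0; i < points.length; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dims; d++) sums[assignment[i]][d] += points[i][d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // keep an empty cluster's old centroid
                for (int d = 0; d < dims; d++) centroids[c][d] = sums[c][d] / counts[c];
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] pts = {{1, 1}, {9, 9}, {1.5, 2}, {8, 8.5}, {1, 0.5}, {9.5, 9}};
        System.out.println(Arrays.toString(cluster(pts, 2, 100))); // [0, 1, 0, 1, 0, 1]
    }
}
```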
Select Attributes Panel
The Select Attributes Panel provides algorithms for identifying the most predictive attributes in a data set.
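One common way to score how predictive a nominal attribute is: information gain, the reduction in class entropy once the attribute's value is known. Weka offers this as `InfoGainAttributeEval`, typically combined with the `Ranker` search method. The plain-Java sketch below computes the measure for a single attribute as an illustration only; the toy "play"/"windy"/"outlook" columns and the class name are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class InfoGainSketch {
    /** Shannon entropy, in bits, of a label distribution with the given total. */
    static double entropy(Map<String, Integer> counts, int total) {
        double h = 0;
        for (int c : counts.values()) {
            if (c == 0) continue;
            double p = (double) c / total;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    static Map<String, Integer> countLabels(String[] labels) {
        Map<String, Integer> counts = new HashMap<>();
        for (String l : labels) counts.merge(l, 1, Integer::sum);
        return counts;
    }

    /** InfoGain = H(class) minus the sum over values v of P(attr = v) * H(class | attr = v). */
    static double infoGain(String[] attribute, String[] labels) {
        double gain = entropy(countLabels(labels), labels.length);
        Map<String, Map<String, Integer>> byValue = new HashMap<>();
        for (int i = 0; i < labels.length; i++) {
            byValue.computeIfAbsent(attribute[i], v -> new HashMap<>()).merge(labels[i], 1, Integer::sum);
        }
        for (Map<String, Integer> sub : byValue.values()) {
            int n = sub.values().stream().mapToInt(Integer::intValue).sum();
            gain -= (double) n / labels.length * entropy(sub, n);
        }
        return gain;
    }

    public static void main(String[] args) {
        String[] play = {"yes", "yes", "no", "no"};
        String[] windy = {"false", "false", "true", "true"};     // separates the classes perfectly
        String[] outlook = {"sunny", "rainy", "sunny", "rainy"}; // tells us nothing about the class
        System.out.println("windy:   " + infoGain(windy, play));
        System.out.println("outlook: " + infoGain(outlook, play));
    }
}
```

Ranking all attributes by this score and keeping the top few is the simplest form of attribute selection.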
Visualize Panel
The Visualize Panel shows a scatter plot matrix, where individual scatter plots can be selected, enlarged and analyzed using various selection operators.
Data Integration Plugins
Weka Scoring Plugin
The Weka Scoring Plugin is a tool that allows classification and clustering models created with Weka to be used to "score" new data as part of a Kettle transform. "Scoring" simply means attaching a prediction to an incoming row of data. The Weka scoring plugin can handle all types of classifiers and clusterers that can be constructed in Weka.
Documentation on this plugin can be found here.
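To make "attaching a prediction to an incoming row" concrete, the plain-Java sketch below models rows as field-name-to-value maps and a model as any function from a row to a prediction. It is purely illustrative and does not reflect the plugin's actual interfaces: the `ScoringSketch` class, the `prediction` field name, and the threshold "model" are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class ScoringSketch {
    /** Returns a copy of each row with an extra "prediction" field (hypothetical field name). */
    static List<Map<String, Object>> score(List<Map<String, Object>> rows,
                                           Function<Map<String, Object>, Object> model) {
        return rows.stream().map(row -> {
            Map<String, Object> out = new LinkedHashMap<>(row); // keep the incoming fields and their order
            out.put("prediction", model.apply(row));
            return out;
        }).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical "model": flags customers who spent more than 100.
        Function<Map<String, Object>, Object> model =
            row -> ((Number) row.get("spend")).doubleValue() > 100 ? "high" : "low";
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("customer", "c1");
        row.put("spend", 250);
        System.out.println(score(List.of(row), model)); // [{customer=c1, spend=250, prediction=high}]
    }
}
```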
ARFF Output Plugin
The ARFF Output Plugin is a tool that allows you to output data from Kettle to a file in Weka's Attribute-Relation File Format (ARFF). ARFF is essentially the same as the comma-separated values (CSV) format, with the addition of a header containing metadata about the attributes (fields).
Documentation on this plugin can be found here.
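The relationship between ARFF and CSV can be seen in a small writer: an `@relation` line, one `@attribute` line per field (for example `numeric`, or a set of nominal values in braces), then `@data` followed by the rows as comma-separated values. The plain-Java sketch below is an illustration, not the plugin's code; `ArffWriterSketch` is a hypothetical name, and it assumes names and values contain no spaces, commas, or quotes (real ARFF files need quoting in those cases).

```java
import java.util.List;

class ArffWriterSketch {
    /** Builds ARFF text: a header declaring each attribute, then the rows as CSV. */
    static String toArff(String relation, List<String> names, List<String> types,
                         List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (int i = 0; i < names.size(); i++) {
            sb.append("@attribute ").append(names.get(i)).append(' ').append(types.get(i)).append('\n');
        }
        sb.append("\n@data\n");
        for (List<String> row : rows) sb.append(String.join(",", row)).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        String arff = toArff("weather",
            List.of("temperature", "play"),
            List.of("numeric", "{yes,no}"),
            List.of(List.of("85", "no"), List.of("72", "yes")));
        System.out.print(arff);
    }
}
```

Running it prints a header (`@relation weather`, `@attribute temperature numeric`, `@attribute play {yes,no}`), then `@data` and the two rows `85,no` and `72,yes`; everything after `@data` is plain CSV.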
Package Manager

Weka packages are bundles of additional functionality, separate from the capabilities supplied in the core system. A package consists of some jar files, documentation, metadata, and possibly source code.

This allows users to select and install only what they need or are interested in, and also provides a simple mechanism for people to use when contributing to Weka. Some of the existing packages are provided by the Weka team, while others come from third parties.

Weka includes a facility for managing packages and a mechanism for loading them dynamically at runtime; both a command-line and a GUI package manager are available. More information on how to use the Weka Package Manager is provided here, as well as a list of Weka packages here.

How to contribute

Open source delivers better, faster, and more reliable products, powered by an active and broad community. Developers, testers, writers, implementers, and most of all users can make valuable contributions.

Here’s your guide to submitting just about any contribution to the Pentaho project. If you don’t find the answers you need here, please post your question to our Forums.

Places to contribute
There are two primary ways to make sure that your contributions are recognized and reviewed in a timely fashion: through our Discussion Forums and through our issue and bug tracking system, JIRA.
All bug reports are recorded and tracked through our JIRA issue and bug tracking system.
We rely on time and code contributions from our community (and we'll never turn down money or beer) to keep our commitment to delivering a quality open source Business Intelligence platform.
Bug fixes, new features, and improvements are all issue types in our JIRA system, so you can choose the type appropriate to your case.
Solutions should be submitted as a New Feature case in our JIRA system. Complete the case as described for other code contributions, then attach your solution as an additional file to the case.
Whether you're a developer who's implementing the platform or a business analyst who needs to solve a particular problem while using it, your suggestions are valuable! We encourage all community members to share their needs with us, as long as they're related to Business Intelligence software and the problems BI addresses.
If you'd like to contribute documentation improvements, or submit a technical article, you can do so in the Pentaho Documentation Wiki.
The place to start with a language specific contribution is to look under our International Forums for the language you're interested in. Internationalization efforts are coordinated within these forums among the community members that are most experienced with each language.
The Pentaho team is building an automated platform test suite and submission protocol because, well, frankly, it's a damn good idea. Until that suite is completed, if you're willing to contribute time and resources to quality assurance testing, please contact us. We will send you a matrix spreadsheet in which you identify the environment and the test variables in play, and then submit your results back to us.
If you want to get involved, have the time or resources to commit, and are not sure where to dig in, send us an email with a description of the resources you have and the commitment you can make, and we'll be happy to hook you up with a Pentaho team member to coordinate the best use of your time.
Any good community member knows a decent meal (or even a handful of corn chips) goes a long way toward increasing developer productivity. Well, at least it improves their outlook.

If you would like to submit a contribution of beer, snacks, soda, beer, chocolate, pizza or pretty much anything else edible (like beer), send your package to:

Pentaho Corporation
5950 Hazeltine National Drive, Suite #340
Orlando, FL 32822
While we do not condone frivolous gifts or taking bribes, we are not above such gestures either. If there is a certain body of work that, when completed, makes your life easier, and you have a small yacht surprisingly available for a weekend, we can definitely talk. Contact us with your offer, and we can surely find SOMEONE, a Pentaho team member or a community member, who can help you out!

Downloads

Weka 3.7.10 Stable

See Release Notes for more info

Windows x64 (with JRE)
Windows x64
Windows (with JRE)
Windows
Mac OS X

If you're looking for a different version, you can find it here.