Comprehensive set of tools for machine learning and data mining to enhance your insights through predictive analytics.
Mining your own data and turning what you know about your users, your clients, and your business into useful information is now an easy task. With Weka, an open source software package, you can discover patterns in large data sets and extract useful information from them. It is also highly portable, since it is fully implemented in the Java programming language, and it supports several standard data mining tasks.
Can I use Weka in commercial applications?
How do I generate compatible training and test sets that get processed with a filter?
Running a filter twice (once with the training set as input and then a second time with the test set) will almost certainly create two incompatible files. Why is that? Every time you run a filter, it is initialized based on its input data. Because the training and test sets differ, the two runs are initialized differently and produce incompatible output. You can avoid this by using batch filtering, which initializes the filter once on the training set and then applies it unchanged to the test set. See the article on Batch filtering for more details.
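The idea behind batch filtering can be illustrated with a minimal, self-contained Java sketch. It does not use Weka's classes: StandardizeSketch, fit, and apply are hypothetical names chosen for this illustration, standing in for any filter whose behavior depends on the data it was initialized with. The sketch initializes one filter on the training data and reuses it for the test data, then contrasts that with a second filter initialized on the test data alone.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a data-dependent filter; NOT part of Weka's API. */
final class StandardizeSketch {
    private double mean;
    private double stdDev;
    private boolean initialized;

    /** Initializes the filter: learns mean and standard deviation from the given data. */
    public void fit(double[] data) {
        if (data.length == 0) {
            throw new IllegalArgumentException("data must not be empty");
        }
        mean = Arrays.stream(data).average().orElse(0.0);
        double sumSquares = 0.0;
        for (double v : data) {
            sumSquares += (v - mean) * (v - mean);
        }
        double variance = sumSquares / data.length; // population variance
        stdDev = variance > 0 ? Math.sqrt(variance) : 1.0; // avoid division by zero
        initialized = true;
    }

    /** Applies the statistics learned in fit() without re-initializing. */
    public double[] apply(double[] data) {
        if (!initialized) {
            throw new IllegalStateException("call fit() before apply()");
        }
        double[] out = new double[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (data[i] - mean) / stdDev;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] train = {1.0, 2.0, 3.0};
        double[] test = {2.0, 4.0};

        // Batch style: initialize once on the training set, reuse for the test set.
        StandardizeSketch batch = new StandardizeSketch();
        batch.fit(train);
        double[] trainOut = batch.apply(train);
        double[] testBatch = batch.apply(test);

        // Running the filter a second time: it is re-initialized on the test set.
        StandardizeSketch separate = new StandardizeSketch();
        separate.fit(test);
        double[] testSeparate = separate.apply(test);

        // The raw value 2.0 maps to 0.0 in the batch run but to -1.0 in the
        // separate run, so the two outputs are not on the same scale.
        System.out.println("train (batch):   " + Arrays.toString(trainOut));
        System.out.println("test (batch):    " + Arrays.toString(testBatch));
        System.out.println("test (separate): " + Arrays.toString(testSeparate));
    }
}
```

In Weka's own Java API the analogous pattern is, as far as we can tell, to call setInputFormat on the filter with the training set once and then pass that same filter object to Filter.useFilter for both the training and the test set. The Batch filtering article remains the authoritative reference.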
How do I perform attribute selection?
Weka offers different approaches for performing attribute selection: directly with the attribute selection classes, with a meta-classifier, or with a filter.
Check out the Performing attribute selection article for more details and examples.
How do I perform clustering?
Weka offers clustering capabilities not only as standalone schemes, but also as filters and classifiers. Check out the article about Using cluster algorithms for detailed information.
How do I perform text classification?
The article Text categorization with WEKA explains a few basics on how to deal with text documents, like importing and pre-processing.
How can I perform multi-instance learning in Weka?
The article Multi-instance classification explains which classifiers can perform multi-instance classification and which format the data must have for these multi-instance classifiers.
Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a data set or called from your own Java code. It is also well suited for developing new machine learning schemes.
Weka's main user interface is the Explorer, featuring several panels which provide access to the main components of the workbench: the Preprocess Panel, the Classify Panel, the Associate Panel, the Cluster Panel, the Select Attributes Panel, and the Visualize Panel.
Weka packages are bundles of additional functionality, separate from the capabilities supplied in the core system. A package consists of some jar files, documentation, metadata, and possibly source code.
This allows users to select and install only what they need or are interested in, and also provides a simple mechanism for people to use when contributing to Weka. Some of the existing packages are provided by the Weka team, while others come from third parties.
Weka includes a facility for the management of packages and a mechanism to load them dynamically at runtime – there are both a command-line and a GUI package manager. More information on how to use the Weka Package Manager is provided here, as well as a list of WEKA Packages here.
Open source delivers better, faster, and more reliable products, empowered by an active and wide community. Developers, testers, writers, implementers, and most of all users can make valuable contributions.
Here’s your guide to submitting just about any contribution to the Pentaho project. If you don’t find the answers you need here, please post your question to our Forums.