Project details for Pattern

Pattern 2.4

by tomdesmedt - August 31, 2012, 02:26:01 CET


"Pattern" bundles a diverse range of Python functionality for working with text. It contains tools for web data mining, such as a uniform API for web services (Google, Yahoo, Bing, Twitter, Wikipedia, Flickr, Facebook, RSS), an HTML DOM parser, a web crawler and a PDF parser. It has wrappers for SQLite and MySQL databases, and a Datasheet class for working with CSV files. It has a transformation-based tagger/chunker for English and Dutch, sentiment lexicons, a WordNet interface, and an n-gram search algorithm. It also has algorithms for tf-idf, cosine similarity, LSA, k-means and hierarchical clustering, and Naive Bayes, KNN and SVM classifiers. It has a helper module for writing HTML canvas graphics in the web browser (no plugins needed), and tools for directed graphs, graph centrality, graph partitioning and spring-based graph visualization.
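
To give a rough idea of what the tf-idf and cosine similarity algorithms mentioned above compute, here is a minimal sketch in plain Python. It deliberately does not use Pattern's own API; the function names `tf_idf` and `cosine_similarity` and the toy documents are assumptions made for illustration only, and the weighting shown (relative term frequency times the log of inverse document frequency) is one common variant, not necessarily the one Pattern implements.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: weight} dict per document (each a list of words).

    Illustrative only: tf = relative frequency in the document,
    idf = log(number of documents / documents containing the word).
    """
    n = len(documents)
    # Document frequency: in how many documents does each word occur?
    df = Counter(word for doc in documents for word in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            word: (count / len(doc)) * math.log(n / df[word])
            for word, count in counts.items()
        })
    return vectors


def cosine_similarity(v1, v2):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v2.get(word, 0.0) for word, weight in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # an empty or all-zero vector is similar to nothing
    return dot / (norm1 * norm2)


# Hypothetical toy corpus: two documents about cats, one about finance.
docs = [
    "the cat sat on the mat".split(),
    "the cat chased the mouse".split(),
    "stock markets fell sharply".split(),
]
vectors = tf_idf(docs)
```

With these toy documents, the first two vectors share the words "the" and "cat" and so have a positive similarity, while the first and third share no words and have a similarity of zero.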

The package is well-documented at:

It comes bundled with 30+ example scripts and 350+ unit tests.

Please let us know if you find any bugs!

Changes to previous version:
  • Small overall bug fixes and performance improvements.
  • Module pattern.web: updated to the new Bing API (the Bing API is a paid service now).
  • Module pattern.en: now includes Norvig's spell checking algorithm.
  • New German tagger/chunker, courtesy of Schneider & Volk (1998), who kindly agreed to release their work in Pattern under the BSD license.
  • The search syntax now includes { } to define match groups.
  • Module pattern.vector: fast implementation of information gain for feature selection.
  • Module pattern.graph: now includes a toy semantic network of commonsense (see examples).
  • Module canvas.js: image pixel effects; the editor now supports live editing.
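
The changelog above mentions information gain for feature selection. As a hedged illustration of the idea (not Pattern's implementation, and not optimized for speed), the sketch below scores a binary feature by how much knowing its presence or absence reduces the entropy of the class labels. The names `entropy` and `information_gain`, and the tiny spam/ham data set, are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(examples, feature):
    """Entropy reduction from splitting on presence/absence of a feature.

    examples: list of (set_of_features, label) pairs (assumed format).
    """
    labels = [label for _, label in examples]
    present = [label for features, label in examples if feature in features]
    absent = [label for features, label in examples if feature not in features]
    # Weighted entropy that remains after the split.
    remainder = 0.0
    for subset in (present, absent):
        if subset:
            remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical training data: bag-of-words features with a class label.
examples = [
    ({"free", "win"}, "spam"),
    ({"free", "offer"}, "spam"),
    ({"meeting", "offer"}, "ham"),
    ({"meeting", "notes"}, "ham"),
]
gain_free = information_gain(examples, "free")
gain_offer = information_gain(examples, "offer")
```

Here "free" separates the two classes perfectly and gets the maximum gain of one bit, while "offer" occurs equally often in both classes and gets no gain, so a feature selector would keep "free" and drop "offer".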
Supported Operating Systems: Linux, Windows, Mac OS X
Data Formats: Txt, Csv
Tags: Graph, Svm, Latent Semantic Analysis, Natural Language Processing, Information Extraction, Data Visualization, Tfidf, Csv, K Nearest Neighbor, Html, Dutch, English, German

