Wordify makes it easy to identify words that discriminate categories in textual data.

How it works

Simply upload an Excel file, and let us do the rest.

Let's explain Wordify with an example. Imagine you are thinking about having a glass of wine with your friends; you know you like bold, woody wine, but are unsure which one to choose. You wonder whether there are some words that describe each type of wine; since you are a researcher, you decide to approach the problem scientifically.

Gather textual data

You go to your favorite platform and, for each type of wine (label), you download some reviews (texts).

Wordify the data

You use our platform to wordify your data.

Receive your results

You receive the results of the analysis by email. They tell you the most indicative words, both negative and positive, for each type of wine.

Step 1. Prepare your data

Create an Excel file with two columns. Name your columns "text" and "label". Copy each of your texts into its own row in the first column, and add the respective label in the second column.

For reliable results, we suggest providing at least 2000 labelled texts. If you provide fewer, we will still wordify your file, but the results should be taken with a grain of salt.

We currently do not support multi-language texts, so all your texts should be in one language. Check out the FAQ to see which languages are supported.
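Before building the Excel file, it can help to check your rows against the guidance above. The following is a minimal sketch, not part of Wordify itself: the function name `check_rows` and the list-of-dictionaries layout are hypothetical, and the check for at least two distinct labels is an assumption based on the goal of discriminating between categories.

```python
# Hypothetical pre-upload check; not part of the Wordify service.
REQUIRED_COLUMNS = ("text", "label")
RECOMMENDED_MIN_ROWS = 2000  # suggested minimum from the guidance above


def check_rows(rows):
    """Return a list of problems found in `rows` (a list of dicts)."""
    problems = []
    labels = set()
    for number, row in enumerate(rows, start=1):
        missing = [name for name in REQUIRED_COLUMNS if name not in row]
        if missing:
            problems.append(f"row {number}: missing column(s) {missing}")
            continue
        if not str(row["text"]).strip():
            problems.append(f"row {number}: empty text")
        if not str(row["label"]).strip():
            problems.append(f"row {number}: empty label")
        else:
            labels.add(row["label"])
    # Assumption: at least two categories are needed to compare them.
    if len(labels) < 2:
        problems.append("fewer than two distinct labels")
    if len(rows) < RECOMMENDED_MIN_ROWS:
        problems.append(
            f"only {len(rows)} rows; at least {RECOMMENDED_MIN_ROWS} are recommended"
        )
    return problems
```

If the list comes back empty, the rows can be written to an .xlsx file, for example with pandas via `pandas.DataFrame(rows).to_excel("reviews.xlsx", index=False)`, which requires the openpyxl package. The file name here is only an example.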

Step 2. Upload your file

Once you have prepared your Excel file, click the "Choose File" button. Browse for your file.

Choose the language of your texts. Check out the FAQ to see which languages are supported. Provide your email address. We will process your data and send you the wordified file by email. Depending on the number of requests, this can take up to 30 minutes (but usually 3-4 minutes are enough). No data is stored on our server.

Wordify your data!


Should you have questions other than those answered below, please do not hesitate to contact us.

What is Wordify?

A way to find out which terms are most indicative of each value of your dependent variable.

What happens to my data?

Nothing. We never store the data you upload on disk: it is only kept in memory for the duration of the modeling, and then deleted. We do not retain any copies or traces of your data.

What input formats do you support?

The file you upload should be an .xlsx file with two columns. The first should be labeled 'text' and contain all your documents (e.g., tweets, reviews, patents, etc.), one per row. The second should be labeled 'label' and contain the dependent variable label associated with each text (e.g., rating, author gender, company, etc.).

How does it work?

It uses a variant of Stability Selection (Meinshausen and Bühlmann, 2010) to fit hundreds of logistic regression models on random subsets of the data, using different L1 penalties to drive as many of the term coefficients as possible to 0. Any term that receives a non-zero coefficient in at least 30% of all model runs can be seen as a stable indicator.

How much data do I need?

We recommend at least 2000 instances; the more, the better. With fewer instances, the results are less replicable and reliable.

Is there a paper I can cite?

Yes please! Reference coming soon...

What languages are supported?

Currently we support: English, German, Dutch, Spanish, French, Portuguese, Italian, Greek.

Contact Us

Via Röntgen n. 1, Milan 20136 (ITALY)


+39 02 5836 2604