How to format your data for sentiment analysis on Metaranx

Properly formatting your data is essential to achieving your artificial intelligence goals on Metaranx. Unfortunately, formatting your data is not necessarily intuitive. The models you fine-tune on Metaranx have strict requirements and quirks in how the data must be laid out in order to train properly. To train a model on sentiment analysis, follow the exact steps below. If you upload data that does not follow this format, you should receive an error message.

For more detailed step-by-step instructions, read our help doc on formatting your data for sentiment analysis.

How to format your data for any sentiment analysis task on Metaranx

  1. Open your preferred spreadsheet application in preparation for entering your data. You can use Excel, Google Sheets, or anything else that can export to a .CSV file.
  2. Column 1 (or "A") needs to exist but remain blank.
  3. Column 2 (or "B") needs to hold all your text-based data. The header must be called "text". Avoid commas in your text, because a comma will force a separation in the data.
  4. Column 3 (or "C") needs to hold your labels. The header must be called "labels" and the labels need to be numbers from 0 to 10.
  5. Aim to add at least 1,000 examples of each label. The more data you have and the more variety exists in your data, the better your AI will perform.
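
If you would rather build the file with a script than a spreadsheet, the layout in the steps above can be sketched in Python. This is a minimal sketch, not an official Metaranx tool: the function name `write_sentiment_csv` and the sample sentences are made up for illustration, and it assumes that the header cell of column A is also left blank. Because commas in the text force a separation in the data, the sketch replaces them with spaces; that is a cautious choice, not a documented requirement.

```python
import csv
import io


def write_sentiment_csv(rows, out):
    """Write (text, label) pairs in the three-column layout described above.

    `rows` is an iterable of (text, label) pairs. `out` is any writable
    text file object. This helper is hypothetical, not part of Metaranx.
    """
    writer = csv.writer(out, lineterminator="\n")
    # Column A: blank (assumed to include the header cell),
    # column B: "text", column C: "labels".
    writer.writerow(["", "text", "labels"])
    for text, label in rows:
        # Assumption: replace commas with spaces and collapse the extra
        # whitespace, since commas in the text split the data.
        clean_text = " ".join(text.replace(",", " ").split())
        writer.writerow(["", clean_text, int(label)])


# Hypothetical sample data: 0 = positive, 1 = negative.
examples = [
    ("I love this product, it works great", 0),
    ("This was a waste of money", 1),
]

buffer = io.StringIO()
write_sentiment_csv(examples, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file instead of an in-memory buffer, pass a file opened with `open("dataset.csv", "w", newline="", encoding="utf-8")`, where the file name is again only an example.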

Explanation: Column 1 (or "A") needs to exist but remain blank:

This is a quirk of the sentiment analysis model you're using to build your AI application. The first column must be blank all the way down. We automatically split the data into "training" (90%) and "testing" (10%) sets. We use the 10% to test the model and let you know how well it performed. The blank column essentially holds labels for "training" and "testing." From this, you get a precision score that tells you how accurate your AI is. You will find the precision score of your AI in the dashboard.
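
Metaranx performs this split for you, so nothing here needs to run before you upload. Purely as an illustration of what a 90% / 10% split means, here is a minimal sketch. The tutorial does not say how the platform selects the testing rows, so the random shuffle, the fixed seed, and the function name `split_rows` are all assumptions.

```python
import random


def split_rows(rows, test_fraction=0.1, seed=0):
    """Shuffle rows and split them into (training, testing) lists.

    Illustrative only: this is not Metaranx's actual implementation.
    """
    shuffled = list(rows)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    test_size = round(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]


# Hypothetical dataset of 100 (text, label) rows.
rows = [(f"example {i}", i % 2) for i in range(100)]
training, testing = split_rows(rows)
print(len(training), len(testing))
```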

Explanation: Column 2 (or "B") needs to hold all your text-based data:

Enter your text-based data into rows, divided by the type of data you have. For example, if you are dividing your text into "positive statements" and "negative statements", then you need to divide the text into rows based on whether each one is a positive statement or a negative statement.

Explanation: Column 3 (or "C") needs to hold your labels:

Your labels are important for telling the sentiment analysis AI what each of your text data rows is about. How does the AI know your statements are positive versus negative? By how you label them! You can have up to ten labels. In the positive versus negative statements example, you would have two labels. Positive could be labeled as "0" and negative could be labeled as "1" - you must label your data with numbers.
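
If your raw data uses words such as "positive" and "negative", you will need to convert them to numbers before uploading. The following is a minimal sketch of that conversion. The mapping `LABEL_IDS` mirrors the example above (positive is 0, negative is 1), while the function name `encode_label` and the sample rows are hypothetical.

```python
# Mapping from label names to the numeric labels used in column C.
# The 0/1 assignment follows the example in the text above.
LABEL_IDS = {"positive": 0, "negative": 1}


def encode_label(name):
    """Return the numeric label for a label name, ignoring case and spaces."""
    key = name.strip().lower()
    if key not in LABEL_IDS:
        raise ValueError(f"Unknown label: {name!r}")
    return LABEL_IDS[key]


# Hypothetical rows that still use words as labels.
named_rows = [("Great service", "positive"), ("Never again", "Negative")]
encoded_rows = [(text, encode_label(label)) for text, label in named_rows]
print(encoded_rows)
```

Raising an error on an unknown name, rather than silently assigning a number, makes misspelled labels easy to spot before training.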

Explanation: How much data you should have:

In the example of positive and negative statements, you want to provide the AI with as much data to train on as possible. If you are making this dataset yourself, aim to find at least 1,000 examples of each label - so 1,000 positive statements plus 1,000 negative statements for a total of 2,000. We recommend having at least 10,000 rows to have a more accurate artificial intelligence application.
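
A quick way to check the 1,000-examples-per-label target is to count how often each label appears. This is a minimal sketch: the function name `underrepresented_labels` and the sample counts are invented for illustration, and the threshold simply restates the guideline above.

```python
from collections import Counter

# Guideline from the text above: at least 1,000 examples of each label.
MIN_PER_LABEL = 1000


def underrepresented_labels(labels, minimum=MIN_PER_LABEL):
    """Return {label: count} for every label with fewer than `minimum` rows."""
    counts = Counter(labels)
    return {label: count for label, count in counts.items() if count < minimum}


# Hypothetical dataset: 1,200 positive rows (0) and 400 negative rows (1).
labels = [0] * 1200 + [1] * 400
shortfalls = underrepresented_labels(labels)
print(shortfalls)  # {1: 400}
```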

Export your dataset for the sentiment analysis AI

Now you have a dataset ready to train your AI - the exciting part begins! Export your dataset as a .CSV. Upload the dataset to your Canvas under "Build an AI" -> language -> sentiment analysis. Now you can train your AI on this dataset to teach it what positive and negative statements are. After training, it should be able to tell you whether new content you present to it is positive or negative. Congratulations, you built a sentiment analysis AI application!
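
Before uploading, you may want to check the exported file against the rules in this tutorial. The sketch below is one way to do that under stated assumptions: it is not the validation Metaranx itself runs, the function name `find_format_problems` is hypothetical, and it assumes the header cell of column A is blank like the rest of the column. The sample data deliberately contains an unquoted comma and a non-numeric label.

```python
import csv
import io


def find_format_problems(csv_file):
    """Return a list of human-readable problems found in an exported CSV."""
    problems = []
    reader = csv.reader(csv_file)
    header = next(reader, None)
    if header is None:
        return ["file is empty"]
    # Expect a blank column A, then "text", then "labels".
    if [cell.strip() for cell in header[:3]] != ["", "text", "labels"]:
        problems.append('header must be blank, "text", "labels"')
    for line_number, row in enumerate(reader, start=2):
        if not row:
            continue  # skip completely empty lines
        if len(row) != 3:
            # An unquoted comma inside the text shows up as an extra column.
            problems.append(
                f"row {line_number}: expected 3 columns, found {len(row)}"
            )
            continue
        blank, text, label = row
        if blank.strip():
            problems.append(f"row {line_number}: column A is not blank")
        if not text.strip():
            problems.append(f"row {line_number}: text is empty")
        try:
            value = int(label)
        except ValueError:
            value = None
        if value is None or not 0 <= value <= 10:
            problems.append(
                f"row {line_number}: label {label!r} is not a whole number from 0 to 10"
            )
    return problems


# Hypothetical export with two mistakes: a stray comma and a bad label.
sample = ",text,labels\n,Great service,0\n,Bad, really bad,1\n,Fine,x\n"
problems = find_format_problems(io.StringIO(sample))
for problem in problems:
    print(problem)
```

To check a real export, pass a file opened with `open("dataset.csv", newline="", encoding="utf-8")` in place of the in-memory sample; the file name is only an example.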

Samantha Lloyd

Samantha is the co-founder and CEO of Metaranx.