Interpreting our AI detection scores


Winston AI is the world’s most powerful AI-generated content detector. Our software is trained on huge amounts of data generated by the most widely used AI text generation tools, including GPT-4, GPT-3, ChatGPT, Jasper, Copy AI, Open Assistant and many more. In parallel, we’ve trained our software on human-written content.

On top of this, Winston uses several pattern recognition algorithms to detect AI generated content. 

Based on these elements and our tests, our tool predicted with 99.98% accuracy whether a given text was generated by an AI writing tool such as ChatGPT, GPT-4, Bard, Bing Chat, Jasper, Claude 2 and many more.

Once our tool processes content, it will return a probability (0-100%) that the text was generated by artificial intelligence. 

It is important to note that this is a probabilistic approach: the score expresses how likely it is that a text was AI-generated, not a certainty.

Our team is always keeping up with the new language generation models, so you can be assured our tool is and will remain up to date with the latest developments and innovations.

Methods Used by Winston

AI detection tools use various methods to identify whether a text was created by artificial intelligence (AI) or not.

The two primary categories of methods are linguistic analysis and comparison with previously known AI-generated texts.

  1. Linguistic analysis: Involves analyzing the text’s characteristics, such as semantic meaning and repetitiveness, which can indicate whether it was generated by AI.
  2. Data Training: Involves comparing the text with known AI-generated texts. A close resemblance can also suggest that the text is AI-generated.

Within linguistic analysis, two elements are especially revealing of whether a given piece of content is AI-generated or human-written: Perplexity and Burstiness.

Perplexity: Perplexity is a metric that measures how well a probability distribution or a language model predicts a given sample. In the realm of detecting AI-generated content, perplexity can serve as a tool for assessing the proficiency of an AI language model and for determining whether a text is machine-generated or human-written.

If the text is AI-generated, the perplexity value will be lower as the model would have already encountered similar patterns in the data used for its training. Conversely, if the text is more intricate, it’s more probable that it was written by a human. Put simply, the lower the perplexity score, the higher the likelihood that the text was generated by AI.
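As a rough illustration of the metric itself, perplexity can be computed as the exponential of the average negative log-probability a language model assigns to each token. The sketch below assumes those per-token probabilities are already available; the function name and the sample values are hypothetical and invented for illustration, and this is not Winston's implementation.

```python
import math

def perplexity(token_probs):
    """Perplexity of a sequence, given the probability a model assigned to each token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-likelihood per token, then exponentiate.
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# Hypothetical per-token probabilities, made up for this example.
predictable_text = [0.9, 0.8, 0.85, 0.9]   # the model saw each token coming
surprising_text = [0.2, 0.05, 0.3, 0.1]    # the model was often surprised

print(perplexity(predictable_text))
print(perplexity(surprising_text))
```

The first sequence yields the lower perplexity, which matches the rule of thumb above: text a model finds easy to predict scores low.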

Burstiness: AI-generated text has a distinctive style that differs from that of humans. Since these models are trained on a set of data, they tend to employ specific words and phrases more frequently than humans would. This pattern can be used to identify if a text was created by an AI.

When a text features a cluster of words and phrases that are repeated within a short span of text, there’s a good chance that it was generated by an AI. For instance, AI-generated text may exhibit a lack of variation or an overuse of particular terms. This can be attributed to the model’s tendency to repeat the most commonly used words and phrases from its training data.

Therefore, by analyzing the text’s language style, it’s possible to discern if it’s AI-generated or written by a human.
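One simple way to approximate the repetition signal described above is to measure how often a word repeats one that appeared shortly before it. The sketch below is only an illustration under stated assumptions: the function name, the 20-word window and the 4-letter minimum word length are all hypothetical choices, not Winston's actual method.

```python
import re

def local_repetition_rate(text, window=20, min_len=4):
    """Fraction of words that repeat a word seen within the previous `window` words.

    Words shorter than `min_len` letters are skipped so that common
    function words such as "the" or "and" do not dominate the result.
    """
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if len(w) >= min_len]
    if not words:
        return 0.0
    repeats = 0
    for i, word in enumerate(words):
        start = max(0, i - window)
        if word in words[start:i]:
            repeats += 1
    return repeats / len(words)

varied = "Autumn storms battered the harbour while gulls wheeled above rusting trawlers."
repetitive = "Great content needs great ideas, and great ideas make content great."

print(local_repetition_rate(varied))
print(local_repetition_rate(repetitive))
```

A higher rate means more clustering of the same terms. On its own this is a weak signal, which is why it would be combined with other measures.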

In conclusion, although our assessment of the content isn’t enforceable, Winston remains the leading solution for detecting AI content because it is continually trained on output from new LLMs. We recommend that writers, students and content creators continue to improve their “burstiness”, originality and perplexity when writing.

Human score

The Human Score is a metric used by Winston AI to estimate the likelihood that a given piece of content was generated by an AI tool versus being written by a human.

It’s important to note that a score of 80% human and 20% AI doesn’t mean that only 20% of the content was generated by AI; rather, it means that Winston has an 80% confidence level that the content was created by a human.

Our algorithm takes into account all the aforementioned elements: Data Training on all generative AI tools, Linguistic Analysis, Perplexity and Burstiness to process your text and return a highly accurate predictive analysis.

AI prediction map

Winston helps detect computer-generated text by analyzing its predictability and highlighting words based on how likely they are to appear. Sentences under 60 characters are ignored. We predict word rankings and color code them from most to least predictable. It’s important to note that our software highlights text that an AI text generation tool would likely have written if prompted.

This prediction map works independently from our Human Score to provide additional information and assessments on your text. The prediction map may highlight human-written text. This doesn’t necessarily mean the text is AI-generated, but rather that it is written the way a text generation tool would have written it. The prediction map should be used in combination with our Human Score to get additional clarity on an assessment.
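To make the two mechanical steps of the prediction map more concrete (skipping short sentences, then bucketing words by predicted rank), here is a minimal sketch. Only the 60-character cutoff comes from the description above. The function names, the rank thresholds and the bucket labels are hypothetical placeholders, and the ranks themselves would have to come from a language model that is not shown here.

```python
import re

MIN_SENTENCE_CHARS = 60  # from the description above: shorter sentences are ignored

def eligible_sentences(text):
    """Split text into sentences and keep those long enough to be analyzed."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if len(s) >= MIN_SENTENCE_CHARS]

def bucket_for_rank(rank):
    """Map a word's predicted rank (1 = most predictable) to a highlight bucket.

    The thresholds below are hypothetical and chosen only for illustration.
    """
    if rank < 1:
        raise ValueError("rank starts at 1")
    if rank <= 10:
        return "most predictable"
    if rank <= 100:
        return "predictable"
    if rank <= 1000:
        return "less predictable"
    return "least predictable"

sample = ("Too short. This second sentence is comfortably longer than the "
          "sixty character minimum, so it is kept.")
print(eligible_sentences(sample))
print(bucket_for_rank(3), "|", bucket_for_rank(2500))
```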

Readability Score

The Flesch-Kincaid Readability Score is actually made up of two tests: the Flesch Reading Ease and the Flesch-Kincaid Grade Level. Both tests work together to evaluate the readability of a text, but they do so in slightly different ways.

1. Flesch Reading Ease: This test calculates a score based on the average sentence length and the average number of syllables per word in a text. The result is a number usually ranging from 0 to 100, with higher scores indicating that the text is easier to read. For example, a score of 90-100 suggests that the content is easily understood by an 11-year-old, while a score of 0-30 means it’s best suited for college graduates.

2. Flesch-Kincaid Grade Level: This test also takes into account the average sentence length and syllable count, but it translates the result into a U.S. school grade level. For instance, a score of 8.0 means that the text is suitable for someone in the 8th grade.
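Both tests can be sketched in a few lines using the standard published formulas: Reading Ease = 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word), and Grade Level = 0.39 × (words per sentence) + 11.8 × (syllables per word) - 15.59. The syllable counter below is a rough vowel-group heuristic assumed for this example, and the function names are hypothetical; real tools may count syllables differently, so their scores can vary slightly.

```python
import re

def count_syllables(word):
    """Rough syllable estimate: count vowel groups, dropping a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def readability(text):
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    reading_ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade_level = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return reading_ease, grade_level

ease, grade = readability("The cat sat on the mat.")
print(round(ease, 1), round(grade, 1))
```

Very simple text can score above 100 on Reading Ease or below zero on Grade Level, which is why the 0 to 100 range is described as usual rather than guaranteed.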

So, why is the Flesch-Kincaid Readability Score important? Knowing how accessible your writing is can help you tailor your content to your target audience. For example, if you’re writing a blog post for a general audience, you’ll want a higher Reading Ease score and a lower Grade Level score to ensure that it’s easy to understand. On the other hand, if you’re writing a technical paper for experts in your field, a lower Reading Ease score and a higher Grade Level score might be more appropriate.

In short, the Flesch-Kincaid Readability Score is a valuable tool for writers to gauge the accessibility of their content. By keeping readability in mind, you can create engaging and effective writing that reaches a wider audience and communicates your ideas more clearly. Happy writing!

Does Grammarly trigger AI detectors?

Grammarly’s legacy product functions as an advanced editing tool. It refines texts through grammatical and stylistic corrections without generating new content, so it typically does not trigger AI content detectors like Winston AI. However, GrammarlyGO is an AI writing tool that responds to user prompts to create original compositions. This generative capability aligns closely with the markers identified by AI detection algorithms, so GrammarlyGO’s output may be categorized as AI-generated. For more information, we highly recommend reading our article on this matter: Do AI Detectors Detect Grammarly?