About Accuracy & Methodology

How the engine works

The check runs in two phases. First, retrieval: we sample distinctive 8- to 12-word sequences from every region of your text - preferring rare words, which retrieve far better than common sentence openers - and search for each one on the live web as an exact quoted phrase. That produces a set of candidate pages.
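A minimal Python sketch of the sampling step, for illustration only. The common-word list, the window size, the number of regions and the function names are all assumptions, not the engine's real parameters; a production system would score rarity against a full word-frequency table. Sending the queries to a search engine is left out.

```python
import re

# Hypothetical stand-in for a real word-frequency list: very common English
# words that make weak search anchors.
COMMON_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "is", "are",
    "was", "were", "it", "this", "that", "for", "with", "as", "by", "be",
}

def tokenize(text: str) -> list[str]:
    """Split text into word tokens."""
    return re.findall(r"[\w']+", text)

def rarity(window: list[str]) -> int:
    """Count the words in a window that are not on the common-word list."""
    return sum(1 for w in window if w.lower() not in COMMON_WORDS)

def sample_queries(text: str, regions: int = 4, size: int = 10) -> list[str]:
    """Return up to `regions` exact-phrase queries: from each equal slice of
    the text, the `size`-word window containing the most uncommon words."""
    tokens = tokenize(text)
    if len(tokens) < size:
        return []
    starts = range(len(tokens) - size + 1)
    queries = []
    for r in range(regions):
        chunk = starts[r * len(starts) // regions:(r + 1) * len(starts) // regions]
        if not chunk:
            continue
        # max() keeps the earliest window when several tie on rarity.
        best = max(chunk, key=lambda i: rarity(tokens[i:i + size]))
        query = '"' + " ".join(tokens[best:best + size]) + '"'
        if query not in queries:  # repeated passages would yield duplicates
            queries.append(query)
    return queries
```

Wrapping each window in double quotes is what turns it into an exact-phrase search on most web engines.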

Second, verification: we download each candidate page and align every sentence of your text against the page's actual content, locally. Word-for-word overlap is detected with rolling word-window comparison; close paraphrase with token-overlap and edit-similarity thresholds. A search engine ranking a page is never treated as evidence by itself - only verified text comparison counts, and each match carries a confidence figure in the report.
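The verification step can be sketched the same way. This is a simplified stand-in, not the engine's code: a plain set of 8-word tuples plays the role of the rolling word-window comparison, Jaccard overlap of word sets stands in for token overlap, and the ratio from Python's `difflib.SequenceMatcher` stands in for edit similarity. `WINDOW`, `MIN_OVERLAP`, `MIN_SIMILARITY` and the confidence figure (the lower of the two near-match scores) are hypothetical choices for the example.

```python
import re
from difflib import SequenceMatcher
from typing import Optional, Tuple

# Hypothetical parameters: the real thresholds are not published here.
WINDOW = 8             # words per window for exact-overlap detection
MIN_OVERLAP = 0.6      # Jaccard overlap of word sets needed for a near match
MIN_SIMILARITY = 0.7   # SequenceMatcher ratio over word lists for a near match

def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())

def word_windows(tokens: list[str], n: int = WINDOW) -> set:
    """Every n-word window, sliding forward one word at a time."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def page_sentences(page_text: str) -> list[str]:
    """Naive split on . ! ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", page_text.strip()) if s]

def match_sentence(sentence: str, page_text: str) -> Tuple[Optional[str], float]:
    """Return ("exact", 1.0), ("near", confidence) or (None, best_score)."""
    words = tokenize(sentence)
    # Exact: any shared 8-word window between the sentence and the page text.
    if word_windows(words) & word_windows(tokenize(page_text)):
        return "exact", 1.0
    best, near = 0.0, []
    for candidate in page_sentences(page_text):
        cand = tokenize(candidate)
        union = set(words) | set(cand)
        if not union:
            continue
        overlap = len(set(words) & set(cand)) / len(union)
        similarity = SequenceMatcher(None, words, cand).ratio()
        score = min(overlap, similarity)  # a near match must pass both tests
        best = max(best, score)
        if overlap >= MIN_OVERLAP and similarity >= MIN_SIMILARITY:
            near.append(score)
    if near:
        return "near", round(max(near), 2)
    return None, round(best, 2)
```

Note that the search engine plays no part here: a page is only reported if its downloaded text passes one of these comparisons.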

Before any of that, your text is normalized: Unicode lookalike characters (a standard trick for fooling checkers) are collapsed back to their plain equivalents, so swapped Cyrillic letters do not hide matches.
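Normalization of this kind can be approximated with the standard library. In this sketch, NFKC folds compatibility characters such as fullwidth letters and ligatures, and a tiny hand-written table maps a few Cyrillic and Greek lookalikes to Latin letters. The table is a hypothetical subset; Unicode publishes the full confusables data alongside UTS #39. NFKC alone leaves Cyrillic letters untouched, which is why the second step is needed.

```python
import unicodedata

# Hypothetical, deliberately tiny lookalike table (not the engine's real one).
LOOKALIKES = str.maketrans({
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u0410": "A",  # CYRILLIC CAPITAL LETTER A
    "\u0415": "E",  # CYRILLIC CAPITAL LETTER IE
    "\u041e": "O",  # CYRILLIC CAPITAL LETTER O
    "\u0421": "C",  # CYRILLIC CAPITAL LETTER ES
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
})

def normalize(text: str) -> str:
    """Fold compatibility forms with NFKC, then map cross-script lookalikes."""
    return unicodedata.normalize("NFKC", text).translate(LOOKALIKES)
```

For example, `normalize("\u0441\u043e\u0440y")`, which renders as "copy" but uses three Cyrillic letters, returns the plain Latin string `"copy"`.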

When we will miss matches (false negatives)

Paywalled and subscription sources: academic journals, news archives behind logins.
Private databases, including academic submission archives such as Turnitin's. No public web tool can search them; any free checker implying otherwise is lying to you.
Offline sources: printed books and papers that were never put on the web.
Very fresh content: pages published minutes or hours ago that search engines have not indexed yet.
Heavy paraphrase: rewriting that changes most words falls below the near-match threshold. Detecting ideas (rather than wording) is beyond any text matcher.

When we will flag innocent text (false positives)

Correctly quoted material - a quote is supposed to match its source. Check the citation, not the highlight.
Common phrases and boilerplate: stock expressions, legal formulae, methodological descriptions repeated across a whole field.
Bibliographies and reference lists, which naturally match the works they cite.
Coincidental wording in short, factual sentences.

This is why the report shows the matched source text beside yours, sentence by sentence, with exact and near matches marked separately: the tool finds overlap; a person judges what it means.

What the score means

The matched percentage is the share of your words that sit in sentences aligned to a retrieved source. The gauge bands (0-5% "looks original", 6-20% "some matches", 21%+ "significant matching") are review guidance calibrated for typical prose, not accusation thresholds. A 4% report can still contain one fully copied paragraph worth fixing; a 25% report of properly quoted material can be perfectly honest work.
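The score and bands reduce to a few lines. In this sketch the percentage is matched-sentence words divided by total words; counting words by whitespace, rounding to a whole percent (so the 0-5, 6-20 and 21+ bands leave no gaps) and the function names are assumptions for illustration.

```python
def word_count(sentence: str) -> int:
    """Count words by splitting on whitespace (a simplifying assumption)."""
    return len(sentence.split())

def matched_percentage(sentences: list[str], matched: list[bool]) -> int:
    """Words in matched sentences divided by total words, as a whole percent."""
    total = sum(word_count(s) for s in sentences)
    if total == 0:
        return 0
    hit = sum(word_count(s) for s, m in zip(sentences, matched) if m)
    return round(100 * hit / total)

def band(percent: int) -> str:
    """Map a whole-number percentage to its review-guidance band."""
    if percent <= 5:
        return "looks original"
    if percent <= 20:
        return "some matches - review citations"
    return "significant matching"
```

With these definitions, one fully copied 25-word sentence in a 625-word essay scores 4% and lands in the "looks original" band, which is why the bands are guidance rather than verdicts.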

We also tell you how many sources were actually examined for your check - the real number, printed on the report, because coverage claims you cannot verify are marketing, not accuracy.

Results are indicative, not conclusive. We compare your text against publicly accessible web pages at the moment you run the check. We cannot detect matches in sources that are offline, paywalled, unindexed, or held in private databases (including academic submission archives). Common phrases, correctly quoted material, and coincidental wording can appear as matches. Use this report as a guide for review and citation - not as standalone proof that text was or was not plagiarized.

Frequently asked questions

How does the checking engine actually work?

Two phases. First we sample distinctive word sequences from your text and search them on the live web as exact phrases. Then we download every candidate page and align every sentence of your text against the page's real content locally - exact matches by rolling word-window comparison, near matches by token overlap and edit similarity. Search results alone are never trusted as matches; only verified text comparison counts.

When will this checker miss plagiarism (false negatives)?

When the source is not on the public web at check time: paywalled journals, offline books, private submission archives, pages published minutes ago that search engines have not indexed, or content behind logins. Heavily paraphrased text can also fall below the near-match threshold. No web-based checker escapes these limits; we would rather tell you than let a 0% mislead you.

When will it flag text that is not plagiarism (false positives)?

Correctly quoted passages, common stock phrases, technical boilerplate, legal or religious formulae, and bibliographies all legitimately match their sources. The report separates exact from near matches and shows the source text beside yours so a human can make the call - the score is an instrument reading, not a judgment.

What does the matched percentage mean, precisely?

It is the share of your words that sit in sentences we matched to a retrieved source: matched-sentence words divided by total words. The gauge bands are 0-5% "looks original", 6-20% "some matches - review citations", 21%+ "significant matching". The bands are review guidance, not accusation thresholds.

Which languages does the checker support?

The interface runs in 100+ languages, and the engine itself is multilingual: sentence segmentation handles Latin, CJK and Arabic punctuation, and retrieval uses engines with strong non-English coverage. Match quality is best in languages with a large public web footprint; for very small languages, coverage is honestly thinner.
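Segmentation across scripts can be sketched with a single regular expression. This is a simplified assumption, not the engine's segmenter: Latin . ! ?, the Arabic question mark ؟ and the Arabic full stop ۔ (used in Urdu) end a sentence only when followed by whitespace, so decimals like 3.14 survive, while the CJK full-width marks 。！？ need no following space. Abbreviations such as "e.g." would still be split incorrectly.

```python
import re

# Hypothetical boundary rules for illustration only.
_BOUNDARY = re.compile(
    r"(?<=[.!?\u061f\u06d4])\s+"      # Latin . ! ?, Arabic question mark, Arabic full stop
    r"|(?<=[\u3002\uff01\uff1f])\s*"  # CJK full stop and full-width ! ?
)

def split_sentences(text: str) -> list[str]:
    """Split text into sentences, dropping empty pieces and outer spaces."""
    return [s.strip() for s in _BOUNDARY.split(text) if s.strip()]
```

For example, `split_sentences("你好。世界！")` returns `["你好。", "世界！"]`, and a Latin sentence containing "3.14" stays in one piece.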

