Overall Quality MOS Scale
The Overall Quality MOS estimated by NISQA-v2 (Non-Intrusive Speech Quality Assessment, version 2) predicts how listeners would rate the overall quality of a speech signal that has passed through a communication channel or processing pipeline. It uses the 1–5 Absolute Category Rating (ACR) scale of ITU-T P.800, the scale used in classical telephony and VoIP quality evaluations.
| Score | Descriptor |
|---|---|
| 5 – Excellent | Imperceptible degradation; speech is perfectly clear and natural. |
| 4 – Good | Slight, non-annoying impairments; easily understandable. |
| 3 – Fair | Noticeable but acceptable distortions; intelligibility preserved. |
| 2 – Poor | Annoying distortions; intelligibility partially reduced. |
| 1 – Bad | Very annoying distortion; speech hardly intelligible. |
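Because a model prediction is a continuous value while the table lists five discrete categories, it can help to map a score to its nearest category label. The following is a minimal sketch of one way to do that; the helper name `acr_label` and the round-half-up rule are assumptions made for illustration and are not part of NISQA or ITU-T P.800.

```python
import math

# Category labels from the ACR table above.
ACR_LABELS = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}


def acr_label(mos: float) -> str:
    """Map a continuous MOS value to the nearest ACR category label.

    Hypothetical helper: rounds half up, so 4.5 maps to "Excellent".
    """
    if not 1.0 <= mos <= 5.0:
        raise ValueError("MOS must lie on the 1-5 scale")
    category = math.floor(mos + 0.5)  # round half up to the nearest integer
    return ACR_LABELS[category]


print(acr_label(4.2))  # Good
print(acr_label(2.6))  # Fair
```

Nearest-category rounding is only a reading aid; the continuous score carries more information than the label.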
Practical Range and Expected Scores
Although the scale nominally runs from 1 (“Bad”) to 5 (“Excellent”), real subjective tests and model predictions rarely use the full range:
- Clean human speech recordings under ideal conditions typically yield ≈ 4.3 – 4.6 MOS.
- High-quality telephony or codec pipelines (e.g., AMR-WB, Opus wideband) usually score ≈ 3.8 – 4.4.
- Moderately degraded or noisy speech lies around 2.5 – 3.5.
- Conditions with heavy distortion, packet loss, or very low bitrates often fall below 2.0.
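The typical ranges listed above overlap in places and leave gaps in others, so they are better read as reference bands than as strict thresholds. The sketch below encodes them as data and reports every band that contains a given score; the band names, the `TYPICAL_BANDS` table, and the `matching_bands` helper are hypothetical and simply restate the figures from the list.

```python
from typing import List

# (name, low, high) bands restating the typical ranges in the list above.
# Bounds are treated as inclusive for simplicity; "below 2.0" is encoded
# as 1.0-2.0. These are illustrative reference bands, not strict thresholds.
TYPICAL_BANDS = [
    ("clean speech, ideal conditions", 4.3, 4.6),
    ("high-quality telephony/codec", 3.8, 4.4),
    ("moderately degraded or noisy", 2.5, 3.5),
    ("heavily distorted / packet loss / low bitrate", 1.0, 2.0),
]


def matching_bands(mos: float) -> List[str]:
    """Return the names of all typical bands whose range contains `mos`."""
    if not 1.0 <= mos <= 5.0:
        raise ValueError("MOS must lie on the 1-5 scale")
    return [name for name, low, high in TYPICAL_BANDS if low <= mos <= high]


print(matching_bands(4.35))  # falls in both the clean-speech and codec bands
print(matching_bands(3.6))   # falls in the gap between bands: []
```

Returning a list rather than a single label keeps the overlap between clean speech and high-quality codecs visible instead of hiding it behind an arbitrary cut-off.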
The NISQA-v2 model was trained on large crowdsourced English speech-quality datasets covering these conditions. Consequently, its predicted “Overall Quality MOS” values follow the same empirical distribution: even pristine signals seldom exceed 4.6–4.7, while most practical system outputs cluster between 3.0 and 4.2.
Why the Effective Ceiling Is Below 5.0
This saturation below 5.0 arises primarily from human rating behavior rather than from any model limitation. Listeners in ACR tests display a well-known central-tendency bias: a reluctance to choose the extreme categories of a Likert-type scale. As a result, ratings concentrate in the mid-to-upper range, and even “perfect” reference material receives an average below the nominal maximum. NISQA-v2, trained to reproduce human judgments, therefore inherits this calibration.
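A small worked example makes the ceiling effect concrete. A MOS is the arithmetic mean of the individual listeners' ratings, so it reaches 5.0 only if every listener picks the top category. The sketch below uses a made-up panel of 20 listeners; the ratings are illustrative assumptions, not data from any real test, and the helper name `mean_opinion_score` is hypothetical.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Arithmetic mean of integer ACR ratings, each in the range 1-5."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ACR ratings must be integers from 1 to 5")
    return mean(ratings)


# Hypothetical panel rating a pristine reference clip:
# 12 listeners choose 5 ("Excellent"), 8 hold back and choose 4 ("Good").
panel = [5] * 12 + [4] * 8
print(mean_opinion_score(panel))  # (12*5 + 8*4) / 20 = 4.6
```

Even with a clear majority choosing “Excellent”, the average lands at 4.6, which is in line with the ceiling described above.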
Summary
The Overall Quality MOS predicted by NISQA-v2 quantifies the subjective transmission quality of speech on the ITU-T P.800 1–5 scale. In practice, because listeners tend to avoid the extreme categories and because of the statistical properties of real datasets, the usable range is approximately 1.0–4.7. Scores near 4.5 correspond to clean, transparent audio; scores of 3–4 indicate acceptable communication quality; and values below 3 reflect perceptibly degraded or unpleasant speech.
