# audio

* [Speech recognition](#speech-recognition)
  * [ML - OpenAI](#ml---openai)
  * [Azure](#azure)
* [Text To Speech](#text-to-speech)
  * [Microsoft](#microsoft)
  * [Bing](#bing)
  * [Ali](#ali)
  * [Google](#google)
  * [AWS](#aws)
  * [IBM](#ibm)
  * [Xunfei](#xunfei)
  * [Tencent](#tencent)
  * [Baidu](#baidu)
  * [Offline](#offline)
* [Software](#software)
* [Music](#music)
  * [Apps](#apps)
  * [Basic](#basic)

## Speech recognition

<https://github.com/Uberi/speech_recognition>

### ML - OpenAI

<https://github.com/openai/whisper>

### Azure

<https://azure.microsoft.com/en-us/pricing/details/cognitive-services/speech-services/>

```
Free per month: 5 audio hours / 0.5 million characters / 10,000 transactions
```

## Text To Speech

My rating: Microsoft Neural > Google > Microsoft Zira Desktop ~ AWS Justin >> IBM/Baidu >> espeak

### Microsoft

Demo: <https://speech.microsoft.com/customvoice>\
Download: <https://greasyfork.org/en/scripts/444347-azure-speech-download>

```
git clone https://github.com/Azure-Samples/cognitive-services-speech-sdk.git
# then open quickstart/python/text-to-speech/quickstart.ipynb
```

<https://azure.microsoft.com/en-us/pricing/details/cognitive-services/speech-services/>

```
STT - 5 audio hours free per month
TTS Standard - 5M characters free per month
TTS Neural   - 0.5M characters free per month
```

Portal: <https://portal.azure.com/#blade/HubsExtension/BrowseResource/resourceType/Microsoft.CognitiveServices%2Faccounts>

SDK/REST support: <https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/overview>

<https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-text-to-speech>
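A minimal sketch of what a REST text-to-speech call might look like, using only the standard library. It builds the request without sending it. The endpoint pattern, header names, output format string, and the voice `en-US-JennyNeural` are assumptions recalled from the REST docs linked above, so verify them there; `build_tts_request` is a hypothetical helper name.

```python
import urllib.request
from xml.sax.saxutils import escape


def build_tts_request(text, region, key, voice="en-US-JennyNeural"):
    """Build (but do not send) a text-to-speech REST request.

    Endpoint, headers, and voice name are assumptions; check the REST docs.
    """
    # SSML body; escape() protects characters such as & and < in the text.
    ssml = (
        "<speak version='1.0' xml:lang='en-US'>"
        f"<voice xml:lang='en-US' name='{voice}'>{escape(text)}</voice>"
        "</speak>"
    )
    return urllib.request.Request(
        url=f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1",
        data=ssml.encode("utf-8"),
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/ssml+xml",
            "X-Microsoft-OutputFormat": "audio-16khz-128kbitrate-mono-mp3",
            "User-Agent": "tts-notes-example",
        },
    )


if __name__ == "__main__":
    req = build_tts_request("Hello & welcome", "westus", "YOUR_KEY")
    print(req.full_url)
    print(req.data.decode("utf-8"))
    # To synthesize for real, urllib.request.urlopen(req).read() should
    # return the audio bytes (needs a valid key and region).
```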

Python SDK: <https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/quickstart/python/text-to-speech/quickstart.ipynb>\
Console: <https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/python/console>\
Flask app: <https://github.com/MicrosoftTranslator/Text-Translation-API-V3-Flask-App-Tutorial>

### Bing

<https://azure.microsoft.com/en-us/services/cognitive-services/speech/>\
<https://azure.microsoft.com/en-us/pricing/details/cognitive-services/speech-api/>

| TIER                    | FEATURES                         | UNIT         | PRICE                             |
| ----------------------- | -------------------------------- | ------------ | --------------------------------- |
| Bing Speech API (free)  |                                  | Transactions | 5,000 transactions free per month |
| Bing Speech-to-Text API | Utterances up to 15 seconds long | Transactions | $4 per 1,000 transactions         |
| Bing Text-to-Speech API |                                  | Transactions | $4 per 1,000 transactions         |

<https://github.com/westparkcom/Python-Bing-TTS>

### Ali

Demo: <https://ai.aliyun.com/nls/tts>\
Pricing (not free): <https://help.aliyun.com/document_detail/84881.html#h1-u8BA1u8D39u65B9u5F0Fu548Cu62A5u4EF73>

### Google

<https://play.google.com/store/apps/details?id=com.google.android.tts>

<https://github.com/pndurette/gTTS>

```
# CLI
gtts-cli -f hello.txt -l 'cs' -o hello.mp3

# Python API
from gtts import gTTS
tts = gTTS(text='Hello', lang='en', slow=True)
tts.save("hello.mp3")
```

### AWS

<https://console.aws.amazon.com/polly>\
<https://aws.amazon.com/polly/pricing/>

Free: 5 million characters per month for the first 12 months\
Paid: $4.00 per 1 million characters
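A rough cost estimate from the two figures above, as a sketch. It assumes the free allowance is subtracted from the month's usage before billing and that the rate is linear; `polly_monthly_cost` is a hypothetical helper, not part of any AWS SDK, and the pricing page remains the authority.

```python
FREE_CHARS_PER_MONTH = 5_000_000  # free tier noted above (first 12 months)
USD_PER_MILLION_CHARS = 4.00      # paid rate noted above


def polly_monthly_cost(chars, in_free_tier=True):
    """Estimate one month's cost in USD for `chars` synthesized characters."""
    # Assumption: the free allowance is deducted before billing.
    free = FREE_CHARS_PER_MONTH if in_free_tier else 0
    billable = max(0, chars - free)
    return billable / 1_000_000 * USD_PER_MILLION_CHARS


if __name__ == "__main__":
    print(polly_monthly_cost(7_500_000))                      # 2.5M billable
    print(polly_monthly_cost(1_000_000, in_free_tier=False))  # 1M billable
```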

### IBM

<https://text-to-speech-demo.ng.bluemix.net>\
<https://console.bluemix.net/catalog/services/text-to-speech>

Free: 10,000 characters per month; deleted after 30 days of inactivity.\
Paid: $0.02 USD per thousand characters

<https://github.com/watson-developer-cloud/python-sdk>

### Xunfei

<http://www.xfyun.cn/services/offline_tts> (99+ RMB/month)

### Tencent

<https://cloud.tencent.com/product/aai> (Price unknown)

### Baidu

<https://developer.baidu.com/vcast>

<http://yuyin.baidu.com/#try>\
<http://yuyin.baidu.com/docs/tts/136>

### Offline

<https://en.wikipedia.org/wiki/Comparison_of_speech_synthesizers>

<https://pyttsx3.readthedocs.io/en/latest/> (file saving: `engine.save_to_file(text, filename)` followed by `engine.runAndWait()`)

```
SAPI5 on Windows XP, Vista, 8, 8.1, and 10
NSSpeechSynthesizer on Mac OS X 10.5 (Leopard) and 10.6 (Snow Leopard)
espeak on Ubuntu Desktop Edition 8.10 (Intrepid), 9.04 (Jaunty), and 9.10 (Karmic)
```

<http://espeak.sourceforge.net/commands.html>

```
espeak "unnatural man voice"  -w out.wav  # apt install -y libespeak1
setup_espeak.exe #  SAPI5 version
```

Swift: <https://youtu.be/bl6hEBXuv5Y?t=1057>

## Software

<https://en.wikipedia.org/wiki/Comparison_of_digital_audio_editors>\
<https://en.wikipedia.org/wiki/Comparison_of_free_software_for_audio>

Audacity: <https://github.com/audacity/audacity/releases>

Input/Output equalizer: <https://sourceforge.net/projects/equalizerapo/>

AI Real-Time Voice Cloning: <https://github.com/CorentinJ/Real-Time-Voice-Cloning>

## Music

### Apps

* Coding: <https://sonic-pi.net/#examples>
* Singing Voice Conversion: <https://github.com/svc-develop-team/so-vits-svc>

### Basic

```
♭ flat:  lower in pitch by one semitone (half step)
♯ sharp: higher in pitch by one semitone (half step)
```
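The semitone shifts above can be sketched as a small note-name parser. This assumes scientific pitch notation with the common convention that C4 maps to MIDI note 60; `note_to_midi` and the two lookup tables are hypothetical names for illustration.

```python
import re

# Semitone offsets of the natural notes within one octave, starting at C.
NATURAL_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
# A sharp raises the pitch by one semitone, a flat lowers it by one.
ACCIDENTAL_SHIFT = {"": 0, "#": 1, "♯": 1, "b": -1, "♭": -1}


def note_to_midi(name):
    """Convert a note name such as 'C#4' or 'B♭3' to a MIDI note number."""
    match = re.fullmatch(r"([A-G])([#♯b♭]?)(-?\d+)", name)
    if match is None:
        raise ValueError(f"unrecognized note name: {name!r}")
    letter, accidental, octave = match.groups()
    # Assumption: C4 is MIDI 60, so octave n starts at 12 * (n + 1).
    return 12 * (int(octave) + 1) + NATURAL_OFFSETS[letter] + ACCIDENTAL_SHIFT[accidental]


if __name__ == "__main__":
    for note in ("C4", "C#4", "D♭4", "B♭3", "A4"):
        print(note, note_to_midi(note))
```

C♯4 and D♭4 land on the same number, which is the enharmonic equivalence that equal temperament relies on.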

Equal temperament / 十二平均律: <https://baike.baidu.com/item/%E5%8D%81%E4%BA%8C%E5%B9%B3%E5%9D%87%E5%BE%8B/592297>\
![](https://bkimg.cdn.bcebos.com/pic/0b46f21fbe096b6300e5207b09338744ebf8ac19)
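In twelve-tone equal temperament each semitone multiplies the frequency by 2^(1/12), so twelve semitones double it. A minimal sketch, assuming the usual A4 = 440 Hz reference (other tunings exist); `equal_tempered_frequency` is a hypothetical helper name.

```python
def equal_tempered_frequency(semitones_from_a4, a4_hz=440.0):
    """Frequency in Hz of the pitch `semitones_from_a4` semitones away from A4."""
    # Each semitone is a factor of 2 ** (1 / 12); negative values go down.
    return a4_hz * 2 ** (semitones_from_a4 / 12)


if __name__ == "__main__":
    print(equal_tempered_frequency(0))             # A4, the reference pitch
    print(equal_tempered_frequency(12))            # A5, one octave up
    print(round(equal_tempered_frequency(-9), 2))  # C4 (middle C)
```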

