Ferro's Gitbook

audio


Last updated 1 year ago


Speech recognition

https://github.com/Uberi/speech_recognition

ML - OpenAI

https://github.com/openai/whisper

Azure

https://azure.microsoft.com/en-us/pricing/details/cognitive-services/speech-services/

Free per month: 5 audio hours / 0.5 million characters / 10,000 transactions

Text To Speech

My rating: Microsoft Neural > Google > Microsoft Zira Desktop ~ AWS Justin >> IBM/Baidu >> espeak

Microsoft

Demo: https://speech.microsoft.com/customvoice
Download: https://greasyfork.org/en/scripts/444347-azure-speech-download

git clone https://github.com/Azure-Samples/cognitive-services-speech-sdk.git
=> quickstart/python/text-to-speech/quickstart.ipynb

https://azure.microsoft.com/en-us/pricing/details/cognitive-services/speech-services/

STT - 5 audio hours free per month
TTS Standard - 5M characters free per month
TTS Neural   - 0.5M characters free per month
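
The free allowances above can be checked against a month's usage with a few lines of Python. This is only a sketch: the meter names (`stt_audio_hours` and so on) are made up for illustration and are not Azure identifiers, and the limits are simply the figures noted above, so check them against the pricing page before relying on them.

```python
# Free monthly allowances copied from the notes above (meter names are hypothetical).
FREE_TIER = {
    "stt_audio_hours": 5,
    "tts_standard_chars": 5_000_000,
    "tts_neural_chars": 500_000,
}

def over_free_tier(usage):
    """Return {meter: amount above the free allowance} for meters that exceed it."""
    return {
        meter: used - FREE_TIER[meter]
        for meter, used in usage.items()
        if used > FREE_TIER[meter]
    }

# 3 STT hours fit in the free tier; 650k neural characters exceed it by 150k.
print(over_free_tier({"stt_audio_hours": 3, "tts_neural_chars": 650_000}))
# → {'tts_neural_chars': 150000}
```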

Portal: https://portal.azure.com/#blade/HubsExtension/BrowseResource/resourceType/Microsoft.CognitiveServices%2Faccounts

SDK/REST support: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/overview

https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-text-to-speech
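
A rough sketch of what a REST text-to-speech request looks like, built with the standard library only and not sent. The region (`westus`), voice name (`en-US-JennyNeural`), output format and the `YOUR_KEY` placeholder are assumptions chosen for illustration; confirm the endpoint and header names against the REST docs linked above.

```python
import urllib.request
from xml.sax.saxutils import escape

def build_tts_request(text, key, region="westus", voice="en-US-JennyNeural"):
    """Build (but do not send) a REST text-to-speech request with an SSML body."""
    # The text is XML-escaped so characters like & or < do not break the SSML.
    ssml = (
        "<speak version='1.0' xml:lang='en-US'>"
        f"<voice name='{voice}'>{escape(text)}</voice>"
        "</speak>"
    )
    return urllib.request.Request(
        url=f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1",
        data=ssml.encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/ssml+xml",
            "X-Microsoft-OutputFormat": "riff-24khz-16bit-mono-pcm",
        },
        method="POST",
    )

req = build_tts_request("Tom & Jerry", key="YOUR_KEY")
print(req.full_url)
print(req.data.decode("utf-8"))
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen` and writing the response bytes to a `.wav` file; that step needs a valid key and is left out here.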

Python SDK: https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/quickstart/python/text-to-speech/quickstart.ipynb
Console: https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/python/console
Flask app: https://github.com/MicrosoftTranslator/Text-Translation-API-V3-Flask-App-Tutorial

Bing

https://azure.microsoft.com/en-us/services/cognitive-services/speech/
https://azure.microsoft.com/en-us/pricing/details/cognitive-services/speech-api/

| Tier | Features | Unit | Price |
|---|---|---|---|
| Bing Speech API - free | | Transactions | 5,000 transactions free per month |
| Bing Speech-to-Text API | Utterances up to 15 seconds long | Transactions | $4 per 1,000 transactions |
| Bing Text-to-Speech API | | Transactions | $4 per 1,000 transactions |

https://github.com/westparkcom/Python-Bing-TTS

Ali

Demo: https://ai.aliyun.com/nls/tts
Pricing (not free): https://help.aliyun.com/document_detail/84881.html#h1-u8BA1u8D39u65B9u5F0Fu548Cu62A5u4EF73

Google

https://play.google.com/store/apps/details?id=com.google.android.tts

https://github.com/pndurette/gTTS

gtts-cli -f hello.txt -l 'cs' -o hello.mp3

from gtts import gTTS

# slow=True makes gTTS read the text more slowly
tts = gTTS(text='Hello', lang='en', slow=True)
tts.save("hello.mp3")  # write the synthesized speech to an MP3 file

AWS

https://console.aws.amazon.com/polly
https://aws.amazon.com/polly/pricing/

Free: 5 million characters per month for the first 12 months; after that, $4.00 per 1 million characters.
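
For a quick monthly estimate from these numbers, something like the following works. It assumes the free allowance is subtracted first and the remainder is billed at the flat per-million rate; the function name and the `in_first_12_months` flag are made up for this sketch, and the pricing page remains the authority.

```python
FREE_CHARS_PER_MONTH = 5_000_000   # free allowance noted above (first 12 months)
PRICE_PER_MILLION_USD = 4.00       # rate noted above

def polly_monthly_cost(chars, in_first_12_months=True):
    """Estimate the monthly cost in USD for `chars` synthesized characters."""
    free = FREE_CHARS_PER_MONTH if in_first_12_months else 0
    billable = max(0, chars - free)
    return billable / 1_000_000 * PRICE_PER_MILLION_USD

print(polly_monthly_cost(7_500_000))         # 2.5M billable characters → 10.0
print(polly_monthly_cost(7_500_000, False))  # 7.5M billable characters → 30.0
```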

IBM

https://text-to-speech-demo.ng.bluemix.net
https://console.bluemix.net/catalog/services/text-to-speech

Free: 10,000 characters per month; the free instance is deleted after 30 days of inactivity.
Paid: $0.02 USD per 1,000 characters.

https://github.com/watson-developer-cloud/python-sdk

Xunfei

http://www.xfyun.cn/services/offline_tts (99+ RMB/month)

Tencent

https://cloud.tencent.com/product/aai (Price unknown)

Baidu

https://developer.baidu.com/vcast

http://yuyin.baidu.com/#try
http://yuyin.baidu.com/docs/tts/136

Offline

https://en.wikipedia.org/wiki/Comparison_of_speech_synthesizers

https://pyttsx3.readthedocs.io/en/latest/ (file saving: recent releases provide engine.save_to_file(text, filename), followed by engine.runAndWait())

SAPI5 on Windows XP, Vista, 8, 8.1, and 10
NSSpeechSynthesizer on Mac OS X 10.5 (Leopard) and 10.6 (Snow Leopard)
espeak on Ubuntu Desktop Edition 8.10 (Intrepid), 9.04 (Jaunty), and 9.10 (Karmic)

http://espeak.sourceforge.net/commands.html

espeak "unnatural man voice"  -w out.wav  # apt install -y libespeak1
setup_espeak.exe #  SAPI5 version
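
The same espeak call can be driven from Python. A minimal sketch, assuming `espeak` is on PATH; the helper names are made up here, and only the `-w` flag from the command above is used.

```python
import shutil
import subprocess

def espeak_command(text, wav_path):
    """Build the argument list for: espeak "<text>" -w <wav_path>."""
    return ["espeak", text, "-w", wav_path]

def speak_to_wav(text, wav_path):
    """Run espeak and write the speech to wav_path; fails if espeak is missing."""
    if shutil.which("espeak") is None:
        raise FileNotFoundError("espeak is not on PATH")
    subprocess.run(espeak_command(text, wav_path), check=True)

print(espeak_command("unnatural man voice", "out.wav"))
# → ['espeak', 'unnatural man voice', '-w', 'out.wav']
```

Passing the arguments as a list avoids shell quoting problems when the text contains quotes or spaces.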

Swift: https://youtu.be/bl6hEBXuv5Y?t=1057

Software

https://en.wikipedia.org/wiki/Comparison_of_digital_audio_editors
https://en.wikipedia.org/wiki/Comparison_of_free_software_for_audio

Audacity: https://github.com/audacity/audacity/releases

Input/Output equalizer: https://sourceforge.net/projects/equalizerapo/

AI Real-Time Voice Cloning: https://github.com/CorentinJ/Real-Time-Voice-Cloning

Music

Apps

  • Coding: https://sonic-pi.net/#examples

  • Singing Voice Conversion: https://github.com/svc-develop-team/so-vits-svc

Basic

♭ flat:  lower in pitch by one semitone (half step)
♯ sharp: higher in pitch by one semitone (half step)

Twelve-tone equal temperament (十二平均律): https://baike.baidu.com/item/%E5%8D%81%E4%BA%8C%E5%B9%B3%E5%9D%87%E5%BE%8B/592297
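
In twelve-tone equal temperament, each semitone multiplies the frequency by 2^(1/12), so twelve semitones double it (one octave). A small sketch of the flat/sharp shifts above, assuming the common reference pitch A4 = 440 Hz, which is not stated in these notes:

```python
A4_HZ = 440.0  # assumed reference pitch (concert A)

def shift(freq_hz, semitones):
    """Shift a frequency by a number of semitones (negative = lower)."""
    return freq_hz * 2 ** (semitones / 12)

print(round(shift(A4_HZ, +1), 2))   # A♯4 (sharp): one semitone higher
print(round(shift(A4_HZ, -1), 2))   # A♭4 (flat): one semitone lower
print(shift(A4_HZ, 12))             # A5: one octave higher
```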
