---
created:
tags:
  - "#daily-notes"
  - "#conda"
  - "#tts"
author:
  - Shen Wei
---

## Summary:

Today's main goal was to install F5-TTS, a text-to-speech tool that runs locally.

https://github.com/SWivid/F5-TTS

As far as I know, this tool was developed by several students from Jiaotong University. I tried to install it, and a few technical points are worth noting.

The first is the installation of [Conda](https://www.anaconda.com/). Conda is a package and environment manager that lets you create independent environments. Anaconda's documentation describes it as follows: "Whether you want to build data science/machine learning models, deploy your work to production, or securely manage a team of engineers, Anaconda provides the tools necessary to succeed. This documentation is designed to aid in building your understanding of Anaconda software and assist with any operations you may need to perform to manage your organization’s users and resources."

The conda installation doc is here: https://www.anaconda.com/docs/getting-started/miniconda/install#windows-installation

I used the following command to download the Miniconda installer for Windows:

```
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe --output .\Downloads\Miniconda3-latest-Windows-x86_64.exe
```

After that, I followed the steps below to install F5-TTS.

## Installation

### Create a separate environment if needed

```shell
# Create a python 3.10 conda env (you could also use virtualenv)
conda create -n f5-tts python=3.10
conda activate f5-tts
```

### Install PyTorch with matched device

NVIDIA GPU

```shell
# Install pytorch with your CUDA version, e.g.
pip install torch==2.4.0+cu124 torchaudio==2.4.0+cu124 --extra-index-url https://download.pytorch.org/whl/cu124
```

AMD GPU

```shell
# Install pytorch with your ROCm version (Linux only), e.g.
pip install torch==2.5.1+rocm6.2 torchaudio==2.5.1+rocm6.2 --extra-index-url https://download.pytorch.org/whl/rocm6.2
```

Intel GPU

```shell
# Install pytorch with your XPU version, e.g.
# Intel® Deep Learning Essentials or Intel® oneAPI Base Toolkit must be installed
pip install torch torchaudio --index-url https://download.pytorch.org/whl/test/xpu

# Intel GPU support is also available through IPEX (Intel® Extension for PyTorch)
# IPEX does not require the Intel® Deep Learning Essentials or Intel® oneAPI Base Toolkit
# See: https://pytorch-extension.intel.com/installation?request=platform
```

Apple Silicon

```shell
# Install the stable pytorch, e.g.
pip install torch torchaudio
```

### Then choose one of the options below:

### 1. As a pip package (if just for inference)

```shell
pip install f5-tts
```

### 2. Local editable (if also do training, finetuning)

```shell
git clone https://github.com/SWivid/F5-TTS.git
cd F5-TTS
# git submodule update --init --recursive  # (optional, if need bigvgan)
pip install -e .
```

It ran.

One problem I encountered during this process was that **ffmpeg** could not be found. The error message was:

```shell
ffmpeg was not found but is required to load audio files from filename
```

I later found some information on the Internet and solved the problem. The fix is to download the FFmpeg package and add its folder to the computer's PATH environment variable.

- Download a Windows build of ffmpeg (gyan.dev is one of the build sites linked from ffmpeg.org): https://www.gyan.dev/ffmpeg/builds/
- Extract all files and move the 3 exe files to the c:\ffmpeg folder
  ![Image](http://zipline.ishenwei.online/u/qfxdBV.png)
- Add this path to the system environment variables
  ![Image](http://zipline.ishenwei.online/u/1GVc5t.png)

### Launch Web UI - Gradio App

Currently supported features:

- Basic TTS with Chunk Inference
- Multi-Style / Multi-Speaker Generation
- Voice Chat powered by Qwen2.5-3B-Instruct
- [Custom inference with more language support](https://github.com/SWivid/F5-TTS/blob/main/src/f5_tts/infer/SHARED.md)

```shell
# Launch a Gradio app (web interface)
f5-tts_infer-gradio

# Specify the port/host
f5-tts_infer-gradio --port 7860 --host 0.0.0.0

# Launch a share link
f5-tts_infer-gradio --share
```

Open http://127.0.0.1:7860/ in a browser to use the Gradio web UI.

I tried generating a voice. You first provide a reference voice; the tool then generates speech in that voice from the text you enter. I tried it and the result was very good.

![Image](http://zipline.ishenwei.online/u/GD3HYa.png)

One caveat: because I haven't set up GPU acceleration yet, the whole generation runs on the CPU. As a result, CPU usage is very high during generation and the process is relatively slow. I haven't had time to try the GPU yet. Maybe I will try it tomorrow.
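
Before the GPU attempt, a small check like the one below might help confirm two things from today: that ffmpeg is visible on PATH, and whether PyTorch can see a CUDA device. This is only a minimal sketch and not part of F5-TTS. The function names `find_ffmpeg` and `describe_torch_device` are my own placeholders, and the GPU part assumes an NVIDIA/CUDA setup.

```python
import shutil


def find_ffmpeg():
    """Return the full path of the ffmpeg executable found on PATH, or None."""
    return shutil.which("ffmpeg")


def describe_torch_device():
    """Describe the device PyTorch would use, without assuming torch is installed."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"
    if torch.cuda.is_available():
        # Index 0 is the first visible CUDA device.
        return "cuda: " + torch.cuda.get_device_name(0)
    return "cpu only (torch " + torch.__version__ + ")"


if __name__ == "__main__":
    print("ffmpeg:", find_ffmpeg() or "not found on PATH")
    print("device:", describe_torch_device())
```

It should be run inside the activated `f5-tts` environment, for example saved as `check_env.py` (a hypothetical filename) and started with `python check_env.py`. If the device line reports CPU only, the installed torch build is probably a CPU build, which would match the slow generation I saw.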