{"id":58,"date":"2025-08-28T06:06:13","date_gmt":"2025-08-28T06:06:13","guid":{"rendered":"https:\/\/www.hifitoolkit.com\/tech-news\/?p=58"},"modified":"2025-08-28T06:07:52","modified_gmt":"2025-08-28T06:07:52","slug":"microsoft-releases-vibevoice","status":"publish","type":"post","link":"https:\/\/www.hifitoolkit.com\/tech-news\/microsoft-releases-vibevoice\/","title":{"rendered":"Microsoft Releases VibeVoice-1.5B: A Frontier Open-Source TTS Built for Long, Multi-Speaker Conversation"},"content":{"rendered":"\n<p>Microsoft has released <strong>VibeVoice-1.5B<\/strong>, a text-to-speech (TTS) model designed to generate long-form, expressive, <em>multi-speaker<\/em> conversations\u2014think podcasts, panel shows, or narrative audio with several characters. Unlike typical TTS systems tuned for short, single-sentence utterances, VibeVoice aims squarely at <em>dialogue<\/em> and <em>duration<\/em>: it can synthesize up to <strong>~90 minutes<\/strong> of speech in a single generation with <strong>up to four distinct speakers<\/strong> and keep the tone, pacing, and turn-taking coherent over time.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">TL;DR \u2014 Why VibeVoice-1.5B is a big deal<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Long context, long audio:<\/strong> Trained with a context of up to <strong>65,536 tokens<\/strong> (64K), enabling continuous speech up to ~90 minutes.<\/li>\n\n\n\n<li><strong>Multi-speaker out of the box:<\/strong> Supports <strong>four<\/strong> speakers with natural turn-taking\u2014unusual for open TTS today.<\/li>\n\n\n\n<li><strong>Next-token diffusion + LLM:<\/strong> Marries a lightweight diffusion head to a <strong>Qwen2.5-1.5B<\/strong> LLM backbone for semantic planning and acoustic detail, producing more expressive delivery.<\/li>\n\n\n\n<li><strong>Efficient continuous speech tokenizers:<\/strong> Novel acoustic &amp; semantic tokenizers run at <strong>7.5 Hz<\/strong>, preserving fidelity while making very long sequences tractable. 
The technical report claims <strong>~80\u00d7<\/strong> compression vs. Encodec-style approaches.<\/li>\n\n\n\n<li><strong>Open license + safety guardrails:<\/strong> Released under <strong>MIT<\/strong>, with <strong>audible disclaimers<\/strong> and <strong>imperceptible watermarks<\/strong> in outputs to help mitigate misuse.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">What exactly is VibeVoice?<\/h2>\n\n\n\n<p>VibeVoice is a framework and family of models for conversational TTS. The <strong>1.5B<\/strong> in the name refers to the LLM component (Qwen2.5-1.5B). The overall stack (LLM + tokenizers + diffusion head) is reported around <strong>~2.7B parameters<\/strong> on the Hugging Face card. Microsoft also provides a <strong>7B preview<\/strong> optimized for stability and quality (shorter max length), and a <strong>streaming<\/strong> variant is &#8220;on the way.&#8221;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">How it works (at a high level)<\/h2>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Continuous speech tokenization (7.5 Hz).<\/strong><br>VibeVoice introduces two tokenizers:\n<ul class=\"wp-block-list\">\n<li><strong>Acoustic tokenizer<\/strong> (\u03c3-VAE variant) that compresses waveforms ~3200\u00d7 from 24kHz input.<\/li>\n\n\n\n<li><strong>Semantic tokenizer<\/strong> (ASR-proxy trained) that captures higher-level content\/prosody.<br>This dual-view tokenization keeps rich detail while making long contexts computationally feasible.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>LLM for dialogue &amp; semantics.<\/strong><br>A <strong>Qwen2.5-1.5B<\/strong> backbone plans content, timing, and speaker turns over very long contexts.<\/li>\n\n\n\n<li><strong>Next-token diffusion for acoustics.<\/strong><br>A small <strong>diffusion head<\/strong> (4 layers, ~123M params) predicts the acoustic VAE features step-by-step, guided by the LLM\u2019s hidden states via <strong>classifier-free guidance<\/strong> and <strong>DPM-Solver<\/strong> at 
inference. This \u201cnext-token diffusion\u201d unifies continuous generation with autoregressive pacing.<\/li>\n\n\n\n<li><strong>Curriculum on length.<\/strong><br>Training ramps from 4k\u219216k\u219232k\u219264k tokens, which helps stabilize long-context learning\u2014crucial for 45\u201390 minutes of coherent audio.<\/li>\n<\/ol>\n\n\n\n<p>For a formal treatment (and comparisons to Encodec and other baselines), see the <strong>VibeVoice technical report (Aug 26, 2025)<\/strong>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Capabilities<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Long-form generation:<\/strong> Up to <strong>~90 minutes<\/strong> per run on the 1.5B model; the 7B preview targets ~45 minutes with improved stability\/quality.<\/li>\n\n\n\n<li><strong>Multi-speaker dialogue:<\/strong> Natively supports <strong>4 speakers<\/strong>, preserving consistency in timbre and pacing across turns. <a href=\"https:\/\/microsoft.github.io\/VibeVoice\" target=\"_blank\" rel=\"noreferrer noopener\">Microsoft GitHub<\/a><\/li>\n\n\n\n<li><strong>Expressiveness:<\/strong> Handles subtle emotion shifts and \u201cconversational vibe.\u201d Demos showcase context-aware emphasis, and even <strong>spontaneous singing<\/strong> as an emergent ability.<\/li>\n\n\n\n<li><strong>Cross-lingual hints:<\/strong> Trained primarily on <strong>English and Chinese<\/strong>. It can display cross-lingual synthesis, though Chinese stability is acknowledged as weaker vs. English. 
<a href=\"https:\/\/github.com\/microsoft\/VibeVoice\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub<\/a><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Known limitations (read before deploying)<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Language scope:<\/strong> Officially <strong>English + Chinese<\/strong>; other languages are unsupported and may sound garbled.<\/li>\n\n\n\n<li><strong>Spontaneous background sounds:<\/strong> The team notes <strong>occasional, spontaneous BGM\/sound effects<\/strong>\u2014not controllable and more stable on the 7B variant. This is treated partly as a \u201cfun\u201d emergent behavior, not a guaranteed feature.<\/li>\n\n\n\n<li><strong>No overlapping speech modeling:<\/strong> It doesn\u2019t explicitly generate overlapping talk segments.<\/li>\n\n\n\n<li><strong>Research-grade release:<\/strong> Microsoft advises <strong>against production<\/strong> use without more testing; treat as R&amp;D.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Responsible release &amp; licensing<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>License:<\/strong> <strong>MIT<\/strong> (open, permissive).<\/li>\n\n\n\n<li><strong>Built-in guardrails:<\/strong>\n<ul class=\"wp-block-list\">\n<li><strong>Audible disclaimer<\/strong> inserted in every audio segment (e.g., \u201cThis segment was generated by AI\u201d).<\/li>\n\n\n\n<li><strong>Imperceptible watermark<\/strong> for provenance checks.<\/li>\n\n\n\n<li>Strict guidance against impersonation and deceptive use.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Quickstart: Run VibeVoice-1.5B locally<\/h2>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Prereqs:<\/strong> Recent NVIDIA GPU drivers; a PyTorch NGC container is recommended. 
Flash-Attention may improve speed\/memory.<\/p>\n<\/blockquote>\n\n\n\n<p><strong>1) Launch an NVIDIA PyTorch container (example):<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo docker run --privileged --net=host --ipc=host --ulimit memlock=-1:-1 --ulimit stack=-1:-1 --gpus all -it nvcr.io\/nvidia\/pytorch:24.07-py3\n<\/code><\/pre>\n\n\n\n<p><strong>2) Install VibeVoice:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>git clone https:\/\/github.com\/microsoft\/VibeVoice.git\ncd VibeVoice\npip install -e .\n<\/code><\/pre>\n\n\n\n<p><strong>3) (Optional) Install Flash-Attention if your image lacks it:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>pip install flash-attn --no-build-isolation\n<\/code><\/pre>\n\n\n\n<p><strong>4) Try the Gradio demo (1.5B model):<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>apt update &amp;&amp; apt install -y ffmpeg\npython demo\/gradio_demo.py --model_path microsoft\/VibeVoice-1.5B --share\n<\/code><\/pre>\n\n\n\n<p><strong>5) File-driven inference with named speakers:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>python demo\/inference_from_file.py \\\n  --model_path microsoft\/VibeVoice-1.5B \\\n  --txt_path demo\/text_examples\/2p_music.txt \\\n  --speaker_names Alice Frank\n<\/code><\/pre>\n\n\n\n<p>(For higher stability\u2014especially Chinese\u2014try the <strong>7B preview<\/strong>.)<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Practical use cases<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>AI podcast prototyping:<\/strong> Generate host+guest dialogues with scene changes, ad reads, and long-form story arcs.<\/li>\n\n\n\n<li><strong>Audiobook character voices:<\/strong> Multi-character narration with consistent timbres across chapters.<\/li>\n\n\n\n<li><strong>Conversational agents &amp; IVRs:<\/strong> Script multi-party role-plays (support lines, sales scenarios) for training and QA.<\/li>\n\n\n\n<li><strong>Education &amp; language labs:<\/strong> Create 
lengthy, contextual dialogues for listening comprehension, especially in English.<\/li>\n<\/ul>\n\n\n\n<p><em>(Note: Respect licensing and content policy; avoid impersonation and misleading use.)<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Tips for best results<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Plan the script like a screenplay.<\/strong> Add explicit speaker tags and stage directions (e.g., \u201c[pause]\u201d, \u201c[softly]\u201d), which LLM-driven TTS often uses as cues. (General practice; see demos for style.)<\/li>\n\n\n\n<li><strong>Keep punctuation simple.<\/strong> The team advises <strong>English-style punctuation<\/strong> even for Chinese text to avoid oddities.<\/li>\n\n\n\n<li><strong>Choose clean speaker prompts.<\/strong> Voices with background music in the prompt are more likely to trigger spontaneous BGM\u2014pick clean ones if you want speech only.<\/li>\n\n\n\n<li><strong>Use the 7B preview for stability<\/strong> if you can afford the compute; the 1.5B is great for experimentation and length, but 7B tends to be steadier.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">How does it compare to other open TTS systems?<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>vs. Coqui XTTS \/ Piper \/ Bark:<\/strong> These are strong single-voice TTS systems but are not primarily designed for <strong>long, multi-speaker conversational<\/strong> coherence over tens of minutes. VibeVoice\u2019s 64K context and dedicated multi-speaker handling target that niche directly. (Inference based on VibeVoice docs and public positioning.)<\/li>\n\n\n\n<li><strong>vs. research models such as CosyVoice 2 (Alibaba):<\/strong> Some recent systems demo long-form or expressive abilities, but Microsoft\u2019s release stands out for the <strong>combination<\/strong> of <strong>open weights + multi-speaker + long duration<\/strong> under a permissive license. 
Check VibeVoice report\/project page for detailed methodology.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Benchmarks &amp; quality signals<\/h2>\n\n\n\n<p>The project page and paper highlight preference tests (MOS-style) and qualitative demos spanning <strong>spontaneous emotion<\/strong>, <strong>singing<\/strong>, <strong>cross-lingual snippets<\/strong>, and <strong>long four-speaker conversations<\/strong>. While numbers vary by setup, the qualitative takeaway is that VibeVoice narrows the \u201crobotic\u201d gap in extended dialogue. Review the official demos and technical report for specifics.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Risks and responsible use<\/h2>\n\n\n\n<p>High-fidelity synthetic speech can be misused for <strong>impersonation, fraud, or disinformation<\/strong>. Microsoft explicitly forbids such use, embeds disclaimers\/watermarks, and recommends <strong>R&amp;D-only<\/strong> deployment for now. If you publish generated content, disclose AI usage and avoid deceptive contexts.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>FAQ<\/strong><\/h3>\n\n\n\n<div class=\"schema-faq wp-block-yoast-faq-block\"><div class=\"schema-faq-section\" id=\"faq-question-1756360513645\"><strong class=\"schema-faq-question\">Is VibeVoice-1.5B really \u201c1.5B\u201d parameters?<\/strong> <p class=\"schema-faq-answer\">The <strong>LLM component<\/strong> is ~1.5B (Qwen2.5-1.5B). The <strong>overall system<\/strong> shown on the HF card reports <strong>~2.7B<\/strong> when you account for tokenizers\/diffusion head.<\/p> <\/div> <div class=\"schema-faq-section\" id=\"faq-question-1756360526353\"><strong class=\"schema-faq-question\">Can it handle background music?<\/strong> <p class=\"schema-faq-answer\">Not as a controllable feature; <strong>some generations may include spontaneous BGM\/sounds<\/strong>. 
Treat it as unpredictable and use clean prompts if you want speech-only.<\/p> <\/div> <div class=\"schema-faq-section\" id=\"faq-question-1756360543276\"><strong class=\"schema-faq-question\">What\u2019s the license?<\/strong> <p class=\"schema-faq-answer\"><strong>MIT<\/strong>. Still, respect usage restrictions and local laws, and don\u2019t impersonate real people.<\/p> <\/div> <div class=\"schema-faq-section\" id=\"faq-question-1756360644986\"><strong class=\"schema-faq-question\">Where do I start?<\/strong> <p class=\"schema-faq-answer\">Check the <strong>Hugging Face model card<\/strong>, <strong>project page<\/strong>, and <strong>GitHub repo<\/strong>. There are also community Spaces and demos you can try instantly. <a href=\"https:\/\/huggingface.co\/microsoft\/VibeVoice-1.5B\" target=\"_blank\" rel=\"noreferrer noopener\">Hugging Face<\/a>, <a href=\"https:\/\/microsoft.github.io\/VibeVoice\" target=\"_blank\" rel=\"noreferrer noopener\">Microsoft GitHub<\/a>, <a href=\"https:\/\/github.com\/microsoft\/VibeVoice\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub<\/a><\/p> <\/div> <\/div>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Final thoughts<\/strong><\/h4>\n\n\n\n<p>VibeVoice-1.5B isn\u2019t just \u201canother TTS checkpoint.\u201d It\u2019s a <strong>framework for long, multi-speaker conversation<\/strong> that blends LLM planning with diffusion acoustics and highly efficient tokenization. 
If you\u2019re exploring <strong>podcast automation, multi-character storytelling, or synthetic panel discussions<\/strong>, this is one of the most compelling open baselines to experiment with in 2025\u2014open weights, permissive license, and a research team leaning into responsible release.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>References &amp; official resources<\/strong><\/h4>\n\n\n\n<p><strong>Hugging Face model card (license, training details, limits):<\/strong> VibeVoice-1.5B. <a href=\"https:\/\/huggingface.co\/microsoft\/VibeVoice-1.5B\" target=\"_blank\" rel=\"noreferrer noopener\">Hugging Face<\/a><\/p>\n\n\n\n<p><strong>Project page with demos &amp; feature highlights:<\/strong> Microsoft VibeVoice. <a href=\"https:\/\/microsoft.github.io\/VibeVoice\" target=\"_blank\" rel=\"noreferrer noopener\">Microsoft GitHub<\/a><\/p>\n\n\n\n<p><strong>GitHub repo (setup, commands, FAQs):<\/strong> microsoft\/VibeVoice. <a href=\"https:\/\/github.com\/microsoft\/VibeVoice\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub<\/a><\/p>\n\n\n\n<p><strong>Technical report (Aug 26, 2025):<\/strong> <em>VibeVoice Technical Report<\/em> (tokenizers, next-token diffusion, 64K context). 
<a href=\"https:\/\/arxiv.org\/abs\/2508.19205?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noreferrer noopener\">arXiv<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Microsoft Releases VibeVoice-1.5B, a text-to-speech (TTS) model designed to generate long-form, expressive, multi-speaker conversations\u2014think podcasts, panel shows, or narrative audio<a class=\"read-more ml-1 main-read-more\" href=\"https:\/\/www.hifitoolkit.com\/tech-news\/microsoft-releases-vibevoice\/\">Read More<\/a><\/p>\n","protected":false},"author":1,"featured_media":59,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[44],"tags":[46,45,47],"class_list":["post-58","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-microsoft","tag-microsoft","tag-microsoft-releases-vibevoice-1-5b","tag-vibevoice"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Microsoft Releases VibeVoice-1.5B: Open-Source Conversational TTS - HiFi Toolkit<\/title>\n<meta name=\"description\" content=\"Microsoft Releases VibeVoice-1.5B, an open-source text-to-speech model designed for long-form, multi-speaker conversations. 
With up to 90 minutes of continuous audio, four distinct voices, and next-token diffusion, VibeVoice sets a new benchmark for AI-generated podcasts, audiobooks, and dialogue.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.hifitoolkit.com\/tech-news\/microsoft-releases-vibevoice\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Microsoft Releases VibeVoice-1.5B: Open-Source Conversational TTS - HiFi Toolkit\" \/>\n<meta property=\"og:description\" content=\"Microsoft Releases VibeVoice-1.5B, an open-source text-to-speech model designed for long-form, multi-speaker conversations. With up to 90 minutes of continuous audio, four distinct voices, and next-token diffusion, VibeVoice sets a new benchmark for AI-generated podcasts, audiobooks, and dialogue.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.hifitoolkit.com\/tech-news\/microsoft-releases-vibevoice\/\" \/>\n<meta property=\"og:site_name\" content=\"HiFi Toolkit\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/hifitoolkit\" \/>\n<meta property=\"article:published_time\" content=\"2025-08-28T06:06:13+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-08-28T06:07:52+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.hifitoolkit.com\/tech-news\/wp-content\/uploads\/2025\/08\/ChatGPT-Image-Aug-28-2025-11_30_40-AM_11zon.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1536\" \/>\n\t<meta property=\"og:image:height\" content=\"1024\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Pradeep Kumar\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" 
content=\"Pradeep Kumar\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/microsoft-releases-vibevoice\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/microsoft-releases-vibevoice\\\/\"},\"author\":{\"name\":\"Pradeep Kumar\",\"@id\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/#\\\/schema\\\/person\\\/efe865292c1ec682af776b63498dc77c\"},\"headline\":\"Microsoft Releases VibeVoice-1.5B: A Frontier Open-Source TTS Built for Long, Multi-Speaker Conversation\",\"datePublished\":\"2025-08-28T06:06:13+00:00\",\"dateModified\":\"2025-08-28T06:07:52+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/microsoft-releases-vibevoice\\\/\"},\"wordCount\":1244,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/microsoft-releases-vibevoice\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/wp-content\\\/uploads\\\/2025\\\/08\\\/ChatGPT-Image-Aug-28-2025-11_30_40-AM_11zon.png\",\"keywords\":[\"Microsoft\",\"Microsoft Releases VibeVoice-1.5B\",\"VibeVoice\"],\"articleSection\":[\"Microsoft\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/microsoft-releases-vibevoice\\\/#respond\"]}]},{\"@type\":[\"WebPage\",\"FAQPage\"],\"@id\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/microsoft-releases-vibevoice\\\/\",\"url\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/microsoft-releases-vibevoice\\\/\",\"name\":\"Microsoft Releases 
VibeVoice-1.5B: Open-Source Conversational TTS - HiFi Toolkit\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/microsoft-releases-vibevoice\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/microsoft-releases-vibevoice\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/wp-content\\\/uploads\\\/2025\\\/08\\\/ChatGPT-Image-Aug-28-2025-11_30_40-AM_11zon.png\",\"datePublished\":\"2025-08-28T06:06:13+00:00\",\"dateModified\":\"2025-08-28T06:07:52+00:00\",\"description\":\"Microsoft Releases VibeVoice-1.5B, an open-source text-to-speech model designed for long-form, multi-speaker conversations. With up to 90 minutes of continuous audio, four distinct voices, and next-token diffusion, VibeVoice sets a new benchmark for AI-generated podcasts, audiobooks, and dialogue.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/microsoft-releases-vibevoice\\\/#breadcrumb\"},\"mainEntity\":[{\"@id\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/microsoft-releases-vibevoice\\\/#faq-question-1756360513645\"},{\"@id\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/microsoft-releases-vibevoice\\\/#faq-question-1756360526353\"},{\"@id\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/microsoft-releases-vibevoice\\\/#faq-question-1756360543276\"},{\"@id\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/microsoft-releases-vibevoice\\\/#faq-question-1756360644986\"}],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/microsoft-releases-vibevoice\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/microsoft-releases-vibevoice\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-ne
ws\\\/wp-content\\\/uploads\\\/2025\\\/08\\\/ChatGPT-Image-Aug-28-2025-11_30_40-AM_11zon.png\",\"contentUrl\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/wp-content\\\/uploads\\\/2025\\\/08\\\/ChatGPT-Image-Aug-28-2025-11_30_40-AM_11zon.png\",\"width\":1536,\"height\":1024},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/microsoft-releases-vibevoice\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Microsoft Releases VibeVoice-1.5B: A Frontier Open-Source TTS Built for Long, Multi-Speaker Conversation\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/#website\",\"url\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/\",\"name\":\"HiFi Toolkit\",\"description\":\"Free Online Tools &amp; Converters for Developers, Designers &amp; Productivity\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/#organization\",\"name\":\"HiFi 
Toolkit\",\"url\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/wp-content\\\/uploads\\\/2025\\\/08\\\/cropped-higilogo.png\",\"contentUrl\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/wp-content\\\/uploads\\\/2025\\\/08\\\/cropped-higilogo.png\",\"width\":865,\"height\":230,\"caption\":\"HiFi Toolkit\"},\"image\":{\"@id\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/hifitoolkit\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/#\\\/schema\\\/person\\\/efe865292c1ec682af776b63498dc77c\",\"name\":\"Pradeep Kumar\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/56f307c4c166ea13e81e3fa35c21fccdc554249f4e3fd31b6d47dfc755670dcc?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/56f307c4c166ea13e81e3fa35c21fccdc554249f4e3fd31b6d47dfc755670dcc?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/56f307c4c166ea13e81e3fa35c21fccdc554249f4e3fd31b6d47dfc755670dcc?s=96&d=mm&r=g\",\"caption\":\"Pradeep Kumar\"},\"sameAs\":[\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\"],\"url\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/author\\\/admin\\\/\"},{\"@type\":\"Question\",\"@id\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/microsoft-releases-vibevoice\\\/#faq-question-1756360513645\",\"position\":1,\"url\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/microsoft-releases-vibevoice\\\/#faq-question-1756360513645\",\"name\":\"Is VibeVoice-1.5B really \u201c1.5B\u201d parameters?\",\"answerCount\":1,\"acceptedAnswer\":{\"@type\":\"Answer\",\"text\":\"The <strong>LLM component<\\\/strong> 
is ~1.5B (Qwen2.5-1.5B). The <strong>overall system<\\\/strong> shown on the HF card reports <strong>~2.7B<\\\/strong> when you account for tokenizers\\\/diffusion head.\",\"inLanguage\":\"en-US\"},\"inLanguage\":\"en-US\"},{\"@type\":\"Question\",\"@id\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/microsoft-releases-vibevoice\\\/#faq-question-1756360526353\",\"position\":2,\"url\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/microsoft-releases-vibevoice\\\/#faq-question-1756360526353\",\"name\":\"Can it handle background music?\",\"answerCount\":1,\"acceptedAnswer\":{\"@type\":\"Answer\",\"text\":\"Not as a controllable feature; <strong>some generations may include spontaneous BGM\\\/sounds<\\\/strong>. Treat it as unpredictable and use clean prompts if you want speech-only.\",\"inLanguage\":\"en-US\"},\"inLanguage\":\"en-US\"},{\"@type\":\"Question\",\"@id\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/microsoft-releases-vibevoice\\\/#faq-question-1756360543276\",\"position\":3,\"url\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/microsoft-releases-vibevoice\\\/#faq-question-1756360543276\",\"name\":\"What\u2019s the license?\",\"answerCount\":1,\"acceptedAnswer\":{\"@type\":\"Answer\",\"text\":\"<strong>MIT<\\\/strong>. Still, respect usage restrictions and local laws, and don\u2019t impersonate real people.\",\"inLanguage\":\"en-US\"},\"inLanguage\":\"en-US\"},{\"@type\":\"Question\",\"@id\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/microsoft-releases-vibevoice\\\/#faq-question-1756360644986\",\"position\":4,\"url\":\"https:\\\/\\\/www.hifitoolkit.com\\\/tech-news\\\/microsoft-releases-vibevoice\\\/#faq-question-1756360644986\",\"name\":\"Where do I start?\",\"answerCount\":1,\"acceptedAnswer\":{\"@type\":\"Answer\",\"text\":\"Check the <strong>Hugging Face model card<\\\/strong>, <strong>project page<\\\/strong>, and <strong>GitHub repo<\\\/strong>. There are also community Spaces and demos you can try instantly. 
<a href=\\\"https:\\\/\\\/huggingface.co\\\/microsoft\\\/VibeVoice-1.5B\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">Hugging Face<\\\/a><a href=\\\"https:\\\/\\\/microsoft.github.io\\\/VibeVoice\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">  Microsoft GitHub<\\\/a><a href=\\\"https:\\\/\\\/github.com\\\/microsoft\\\/VibeVoice\\\" target=\\\"_blank\\\" rel=\\\"noreferrer noopener\\\">    GitHub<\\\/a>\",\"inLanguage\":\"en-US\"},\"inLanguage\":\"en-US\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Microsoft Releases VibeVoice-1.5B: Open-Source Conversational TTS - HiFi Toolkit","description":"Microsoft Releases VibeVoice-1.5B, an open-source text-to-speech model designed for long-form, multi-speaker conversations. With up to 90 minutes of continuous audio, four distinct voices, and next-token diffusion, VibeVoice sets a new benchmark for AI-generated podcasts, audiobooks, and dialogue.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.hifitoolkit.com\/tech-news\/microsoft-releases-vibevoice\/","og_locale":"en_US","og_type":"article","og_title":"Microsoft Releases VibeVoice-1.5B: Open-Source Conversational TTS - HiFi Toolkit","og_description":"Microsoft Releases VibeVoice-1.5B, an open-source text-to-speech model designed for long-form, multi-speaker conversations. 
With up to 90 minutes of continuous audio, four distinct voices, and next-token diffusion, VibeVoice sets a new benchmark for AI-generated podcasts, audiobooks, and dialogue.","og_url":"https:\/\/www.hifitoolkit.com\/tech-news\/microsoft-releases-vibevoice\/","og_site_name":"HiFi Toolkit","article_publisher":"https:\/\/www.facebook.com\/hifitoolkit","article_published_time":"2025-08-28T06:06:13+00:00","article_modified_time":"2025-08-28T06:07:52+00:00","og_image":[{"width":1536,"height":1024,"url":"https:\/\/www.hifitoolkit.com\/tech-news\/wp-content\/uploads\/2025\/08\/ChatGPT-Image-Aug-28-2025-11_30_40-AM_11zon.png","type":"image\/png"}],"author":"Pradeep Kumar","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Pradeep Kumar","Est. reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.hifitoolkit.com\/tech-news\/microsoft-releases-vibevoice\/#article","isPartOf":{"@id":"https:\/\/www.hifitoolkit.com\/tech-news\/microsoft-releases-vibevoice\/"},"author":{"name":"Pradeep Kumar","@id":"https:\/\/www.hifitoolkit.com\/tech-news\/#\/schema\/person\/efe865292c1ec682af776b63498dc77c"},"headline":"Microsoft Releases VibeVoice-1.5B: A Frontier Open-Source TTS Built for Long, Multi-Speaker Conversation","datePublished":"2025-08-28T06:06:13+00:00","dateModified":"2025-08-28T06:07:52+00:00","mainEntityOfPage":{"@id":"https:\/\/www.hifitoolkit.com\/tech-news\/microsoft-releases-vibevoice\/"},"wordCount":1244,"commentCount":0,"publisher":{"@id":"https:\/\/www.hifitoolkit.com\/tech-news\/#organization"},"image":{"@id":"https:\/\/www.hifitoolkit.com\/tech-news\/microsoft-releases-vibevoice\/#primaryimage"},"thumbnailUrl":"https:\/\/www.hifitoolkit.com\/tech-news\/wp-content\/uploads\/2025\/08\/ChatGPT-Image-Aug-28-2025-11_30_40-AM_11zon.png","keywords":["Microsoft","Microsoft Releases 
VibeVoice-1.5B","VibeVoice"],"articleSection":["Microsoft"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.hifitoolkit.com\/tech-news\/microsoft-releases-vibevoice\/#respond"]}]},{"@type":["WebPage","FAQPage"],"@id":"https:\/\/www.hifitoolkit.com\/tech-news\/microsoft-releases-vibevoice\/","url":"https:\/\/www.hifitoolkit.com\/tech-news\/microsoft-releases-vibevoice\/","name":"Microsoft Releases VibeVoice-1.5B: Open-Source Conversational TTS - HiFi Toolkit","isPartOf":{"@id":"https:\/\/www.hifitoolkit.com\/tech-news\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.hifitoolkit.com\/tech-news\/microsoft-releases-vibevoice\/#primaryimage"},"image":{"@id":"https:\/\/www.hifitoolkit.com\/tech-news\/microsoft-releases-vibevoice\/#primaryimage"},"thumbnailUrl":"https:\/\/www.hifitoolkit.com\/tech-news\/wp-content\/uploads\/2025\/08\/ChatGPT-Image-Aug-28-2025-11_30_40-AM_11zon.png","datePublished":"2025-08-28T06:06:13+00:00","dateModified":"2025-08-28T06:07:52+00:00","description":"Microsoft Releases VibeVoice-1.5B, an open-source text-to-speech model designed for long-form, multi-speaker conversations. 
With up to 90 minutes of continuous audio, four distinct voices, and next-token diffusion, VibeVoice sets a new benchmark for AI-generated podcasts, audiobooks, and dialogue.","breadcrumb":{"@id":"https:\/\/www.hifitoolkit.com\/tech-news\/microsoft-releases-vibevoice\/#breadcrumb"},"mainEntity":[{"@id":"https:\/\/www.hifitoolkit.com\/tech-news\/microsoft-releases-vibevoice\/#faq-question-1756360513645"},{"@id":"https:\/\/www.hifitoolkit.com\/tech-news\/microsoft-releases-vibevoice\/#faq-question-1756360526353"},{"@id":"https:\/\/www.hifitoolkit.com\/tech-news\/microsoft-releases-vibevoice\/#faq-question-1756360543276"},{"@id":"https:\/\/www.hifitoolkit.com\/tech-news\/microsoft-releases-vibevoice\/#faq-question-1756360644986"}],"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.hifitoolkit.com\/tech-news\/microsoft-releases-vibevoice\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.hifitoolkit.com\/tech-news\/microsoft-releases-vibevoice\/#primaryimage","url":"https:\/\/www.hifitoolkit.com\/tech-news\/wp-content\/uploads\/2025\/08\/ChatGPT-Image-Aug-28-2025-11_30_40-AM_11zon.png","contentUrl":"https:\/\/www.hifitoolkit.com\/tech-news\/wp-content\/uploads\/2025\/08\/ChatGPT-Image-Aug-28-2025-11_30_40-AM_11zon.png","width":1536,"height":1024},{"@type":"BreadcrumbList","@id":"https:\/\/www.hifitoolkit.com\/tech-news\/microsoft-releases-vibevoice\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.hifitoolkit.com\/tech-news\/"},{"@type":"ListItem","position":2,"name":"Microsoft Releases VibeVoice-1.5B: A Frontier Open-Source TTS Built for Long, Multi-Speaker Conversation"}]},{"@type":"WebSite","@id":"https:\/\/www.hifitoolkit.com\/tech-news\/#website","url":"https:\/\/www.hifitoolkit.com\/tech-news\/","name":"HiFi Toolkit","description":"Free Online Tools &amp; Converters for Developers, Designers &amp; 
Productivity","publisher":{"@id":"https:\/\/www.hifitoolkit.com\/tech-news\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.hifitoolkit.com\/tech-news\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.hifitoolkit.com\/tech-news\/#organization","name":"HiFi Toolkit","url":"https:\/\/www.hifitoolkit.com\/tech-news\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.hifitoolkit.com\/tech-news\/#\/schema\/logo\/image\/","url":"https:\/\/www.hifitoolkit.com\/tech-news\/wp-content\/uploads\/2025\/08\/cropped-higilogo.png","contentUrl":"https:\/\/www.hifitoolkit.com\/tech-news\/wp-content\/uploads\/2025\/08\/cropped-higilogo.png","width":865,"height":230,"caption":"HiFi Toolkit"},"image":{"@id":"https:\/\/www.hifitoolkit.com\/tech-news\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/hifitoolkit"]},{"@type":"Person","@id":"https:\/\/www.hifitoolkit.com\/tech-news\/#\/schema\/person\/efe865292c1ec682af776b63498dc77c","name":"Pradeep Kumar","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/56f307c4c166ea13e81e3fa35c21fccdc554249f4e3fd31b6d47dfc755670dcc?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/56f307c4c166ea13e81e3fa35c21fccdc554249f4e3fd31b6d47dfc755670dcc?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/56f307c4c166ea13e81e3fa35c21fccdc554249f4e3fd31b6d47dfc755670dcc?s=96&d=mm&r=g","caption":"Pradeep 
Kumar"},"sameAs":["https:\/\/www.hifitoolkit.com\/tech-news"],"url":"https:\/\/www.hifitoolkit.com\/tech-news\/author\/admin\/"},{"@type":"Question","@id":"https:\/\/www.hifitoolkit.com\/tech-news\/microsoft-releases-vibevoice\/#faq-question-1756360513645","position":1,"url":"https:\/\/www.hifitoolkit.com\/tech-news\/microsoft-releases-vibevoice\/#faq-question-1756360513645","name":"Is VibeVoice-1.5B really \u201c1.5B\u201d parameters?","answerCount":1,"acceptedAnswer":{"@type":"Answer","text":"The <strong>LLM component<\/strong> is ~1.5B (Qwen2.5-1.5B). The <strong>overall system<\/strong> shown on the HF card reports <strong>~2.7B<\/strong> when you account for tokenizers\/diffusion head.","inLanguage":"en-US"},"inLanguage":"en-US"},{"@type":"Question","@id":"https:\/\/www.hifitoolkit.com\/tech-news\/microsoft-releases-vibevoice\/#faq-question-1756360526353","position":2,"url":"https:\/\/www.hifitoolkit.com\/tech-news\/microsoft-releases-vibevoice\/#faq-question-1756360526353","name":"Can it handle background music?","answerCount":1,"acceptedAnswer":{"@type":"Answer","text":"Not as a controllable feature; <strong>some generations may include spontaneous BGM\/sounds<\/strong>. Treat it as unpredictable and use clean prompts if you want speech-only.","inLanguage":"en-US"},"inLanguage":"en-US"},{"@type":"Question","@id":"https:\/\/www.hifitoolkit.com\/tech-news\/microsoft-releases-vibevoice\/#faq-question-1756360543276","position":3,"url":"https:\/\/www.hifitoolkit.com\/tech-news\/microsoft-releases-vibevoice\/#faq-question-1756360543276","name":"What\u2019s the license?","answerCount":1,"acceptedAnswer":{"@type":"Answer","text":"<strong>MIT<\/strong>. 
Still, respect usage restrictions and local laws, and don\u2019t impersonate real people.","inLanguage":"en-US"},"inLanguage":"en-US"},{"@type":"Question","@id":"https:\/\/www.hifitoolkit.com\/tech-news\/microsoft-releases-vibevoice\/#faq-question-1756360644986","position":4,"url":"https:\/\/www.hifitoolkit.com\/tech-news\/microsoft-releases-vibevoice\/#faq-question-1756360644986","name":"Where do I start?","answerCount":1,"acceptedAnswer":{"@type":"Answer","text":"Check the <strong>Hugging Face model card<\/strong>, <strong>project page<\/strong>, and <strong>GitHub repo<\/strong>. There are also community Spaces and demos you can try instantly. <a href=\"https:\/\/huggingface.co\/microsoft\/VibeVoice-1.5B\" target=\"_blank\" rel=\"noreferrer noopener\">Hugging Face<\/a>, <a href=\"https:\/\/microsoft.github.io\/VibeVoice\" target=\"_blank\" rel=\"noreferrer noopener\">project page<\/a>, <a href=\"https:\/\/github.com\/microsoft\/VibeVoice\" target=\"_blank\" rel=\"noreferrer noopener\">    
GitHub<\/a>","inLanguage":"en-US"},"inLanguage":"en-US"}]}},"_links":{"self":[{"href":"https:\/\/www.hifitoolkit.com\/tech-news\/wp-json\/wp\/v2\/posts\/58","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.hifitoolkit.com\/tech-news\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.hifitoolkit.com\/tech-news\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.hifitoolkit.com\/tech-news\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.hifitoolkit.com\/tech-news\/wp-json\/wp\/v2\/comments?post=58"}],"version-history":[{"count":1,"href":"https:\/\/www.hifitoolkit.com\/tech-news\/wp-json\/wp\/v2\/posts\/58\/revisions"}],"predecessor-version":[{"id":60,"href":"https:\/\/www.hifitoolkit.com\/tech-news\/wp-json\/wp\/v2\/posts\/58\/revisions\/60"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.hifitoolkit.com\/tech-news\/wp-json\/wp\/v2\/media\/59"}],"wp:attachment":[{"href":"https:\/\/www.hifitoolkit.com\/tech-news\/wp-json\/wp\/v2\/media?parent=58"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.hifitoolkit.com\/tech-news\/wp-json\/wp\/v2\/categories?post=58"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.hifitoolkit.com\/tech-news\/wp-json\/wp\/v2\/tags?post=58"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}