October update

Maybe
Oct 25, 2019

Hi backers,

We delayed our monthly update by one week (Oct 1st-8th was a holiday week in China, which left us less time) so that we could share more substantial progress with you.

We’re sorry, but we won’t be able to start shipping this month. We’re having issues with the quality of Lily’s voice AI that would result in an unacceptable user experience. We can’t ship a product that doesn’t work well and that you would end up using only as a Bluetooth speaker (even though it has good sound).

Here are the problems we’re having and what we’re doing to fix them.

Problem 1: Lily’s Voice Synthesis

Having a good Chinese voice to teach Chinese is essential, but our synthesized voice, developed and trained in-house, is not good enough yet. The video below shows the current state of Lily’s voice quality:

Lily’s Voice Synthesis quality

You can see that we’ve made great progress and that we already have a clean Chinese voice. However, it’s not yet the voice of our voice actress.

We’re also having trouble modulating the speed of Lily’s voice to make her speak more slowly when she repeats something or has you practice your pronunciation (see our pronunciation demo later in this update). You can see in the video below how strange Lily’s voice becomes when she speaks more slowly:

Lily’s Voice Synthesis speed modulation
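
For the technically curious, here’s a minimal sketch of the naive approach: slowing down an already-synthesized waveform with an off-the-shelf time-stretch. This is a generic illustration using the open-source librosa library, not our actual synthesis pipeline, and the file name is a placeholder:

```python
# Generic illustration, not our synthesis pipeline: slow down a synthesized
# waveform after the fact with a phase-vocoder time-stretch.
# "lily_line.wav" is a placeholder file name.
import librosa
import soundfile as sf

y, sr = librosa.load("lily_line.wav", sr=None)      # synthesized audio, original sample rate
y_slow = librosa.effects.time_stretch(y, rate=0.7)  # rate < 1.0 makes the speech slower
sf.write("lily_line_slow.wav", y_slow, sr)
```

If the speed is adjusted by stretching the waveform after synthesis like this, the audio gets smeared, which is one reason slowed-down speech can sound so strange.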

Lastly, we’re having difficulty making Lily speak with the same timbre when she uses English and Chinese in the same sentence. Here’s a video that will make the problem immediately clear:

Lily’s Voice Synthesis multilingual timbre

Having a synthetic voice that switches naturally from one language to another while keeping the same timbre is one of the hardest voice AI tasks in the industry. A very talented Voice Synthesis engineer on our team, who comes from one of the top two Voice Synthesis labs in China (Tsinghua University), is working on it.

Let’s get to the solutions now. The first step to improving Lily’s voice is to improve and expand the voice data we’re using to train our AI models. Here’s a short video about our Voice Synthesis recording sessions, where you’ll be able to hear Lily’s final voice for the first time:

Lily’s Voice Synthesis recording

We announced a few months ago that we had found the voice actress who would be Lily’s voice, and we’ve been working with her ever since. But collecting voice data takes much, much longer than anyone would expect, and you can find out why in this video:

Lily’s Voice Synthesis annotation

Aside from our efforts on the data side, we’re also constantly improving our Voice Synthesis algorithms. We’re confident we’ll be able to solve most of the problems you saw in the videos above within 2 months.

Problem 2: Lily’s Speech Recognition

Speech Recognition is the technology that turns your voice, as captured by the microphones, into text. Lily needs good Speech Recognition to understand what you say; otherwise she won’t be able to teach you anything.
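
To make the idea concrete, here’s a tiny, generic speech-to-text example using the open-source SpeechRecognition package and a cloud recognizer. It is not Lily’s in-house engine, and the file name is a placeholder:

```python
# Generic example of speech-to-text (not Lily's in-house engine): turn a short
# Mandarin recording into text with the open-source SpeechRecognition package.
# "ni_hao.wav" is a placeholder file name.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("ni_hao.wav") as source:
    audio = recognizer.record(source)  # read the whole recording into memory

text = recognizer.recognize_google(audio, language="zh-CN")  # Mandarin (Simplified)
print(text)  # e.g. "你好" if the recording says "ni hao"
```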

We have a first version of Lily’s Speech Recognition, trained on a few thousand hours of voice data. Its accuracy is good for native Chinese speakers but drops drastically for non-native speakers, as you can see in the picture below:

Lily’s Speech Recognition results

On the same sentence, our Speech Recognition worked perfectly for the native Chinese speaker but wrongly recognized 3 characters for the non-native speaker (circled in red).
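
To give you a sense of how such errors are measured, the standard metric is the character error rate (CER): the edit distance between the recognized text and the reference transcript, divided by the transcript length. Here is a small sketch with made-up sentences (not the ones in the picture):

```python
# Character error rate (CER): edit distance between the recognized text and
# the reference transcript, divided by the reference length.
def edit_distance(a: str, b: str) -> int:
    # Classic dynamic-programming Levenshtein distance over characters.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    return edit_distance(reference, hypothesis) / len(reference)

# Made-up 10-character sentence with 3 wrongly recognized characters -> CER = 0.30
print(cer("我想学中文因为很有趣", "我想学中问因为恨有去"))
```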

The most effective way to improve accuracy for non-native Chinese speakers is to get more voice data from them. Alexis has been busy collecting data from foreign students enrolled in Chinese universities. The process is slow because we need to record hundreds of people, and it’s expensive because we pay them. It quickly becomes an operational nightmare: we don’t record people remotely through a phone app; instead, we do most recordings on-site, in real-usage conditions, because that data is better for Speech Recognition.

To streamline this process, we’ve developed a new speaker dedicated to voice collection. It looks very different from Lily, as you can see in this video:

Lily’s brother, which does the hard work of voice collection

This speaker is a stripped-down version of Lily: there is no loudspeaker inside, no fabric, and it’s much simpler to assemble. This lets us scale up our voice collection operations and record data faster and more cheaply.

We’ve also developed a data annotation platform to quickly annotate all the data we are collecting so that it can be used to train our Speech Recognition AI models. Here is a screenshot:

Internal Speech Recognition data annotation platform
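
For context, annotated Speech Recognition training data usually boils down to pairing each recording with its verified transcript. Here’s a generic illustration of what such a manifest can look like; this is not our internal format, and the file names and speaker labels are made up:

```python
# Generic illustration of annotated ASR training data (not our internal format):
# one JSON line per recording, pairing the audio file with its verified transcript.
import json

annotations = [
    {"audio": "rec_0001.wav", "speaker": "non_native_014", "text": "我想学中文"},
    {"audio": "rec_0002.wav", "speaker": "native_003", "text": "今天天气很好"},
]

with open("manifest.jsonl", "w", encoding="utf-8") as f:
    for item in annotations:
        f.write(json.dumps(item, ensure_ascii=False) + "\n")
```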

Thanks to our voice collection speaker and our data annotation platform, we can now collect voice data approximately 4 times faster at roughly a third of the cost, and drive up the accuracy of our Speech Recognition.

New shipping timeline

Due to those problems, here is our new shipping timeline:

New shipping timeline

Again, we’re sorry for this delay. Our target is to start shipping before Chinese New Year. Lily’s hardware is ready, and we’ve already started sourcing components for this first batch.

We will now share other progress we made during the past month.

Live pronunciation practice demo

We’ve been steadily implementing Lily’s product features. Here is one feature we’ve never talked much about: how Lily can help you practice your pronunciation. Watch the video below.

Lily’s live pronunciation practice demo

This video is not a montage; it was recorded in real time in our office with 3 phones from 3 different angles.

The demo you see is aimed at early beginners who are just getting started with Chinese, which is why we mix English and Chinese and Lily goes very slowly. Pronunciation practice for more advanced learners will be different, and Lily will speak only Chinese. Lily’s voice in the video is not final (see the earlier section about our Voice Synthesis difficulties), but you can see that Lily goes further than any other language learning tool in terms of interactivity.

Lily’s companion app: the Search

You’ll be able to search for words, translations, vocabulary, conversations, grammar exercises, and more, in Pinyin and English, through Lily’s companion app. Here are some screenshots of a user searching for the keyword “jiana”:

Searching “jiana” with Lily’s companion app
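
To give a rough idea of how Pinyin search can work under the hood, here’s a generic sketch using the open-source pypinyin package. This is not the app’s actual backend, and the tiny dictionary is a made-up placeholder:

```python
# Generic sketch of Pinyin-based search (not the app's actual backend):
# convert each dictionary entry to toneless pinyin and match the query against it.
from pypinyin import lazy_pinyin

entries = {"加拿大": "Canada", "你好": "hello", "学习": "to study"}  # placeholder data

def search(query: str) -> list:
    query = query.replace(" ", "").lower()
    results = []
    for word, meaning in entries.items():
        pinyin = "".join(lazy_pinyin(word))  # e.g. 加拿大 -> "jianada" (no tones)
        if query in pinyin:
            results.append((word, pinyin, meaning))
    return results

print(search("jiana"))  # [('加拿大', 'jianada', 'Canada')]
```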

Better microphones for Lily

We’ve had to change our microphone supplier because of a signal-to-noise ratio (SNR) problem; we used to source from one of Xiaomi’s suppliers. We’re now sourcing microphones from Knowles, which makes the best microphones in the industry (it’s what Amazon uses, for example). We’re using the digital MEMS microphones you see in the description below, taken from the datasheet:

Lily’s ears: Knowles digital MEMS microphones

Basically, these are the best microphones we could find. Their signal-to-noise ratio is higher than our previous microphones’, which improves our Speech Recognition for distant usage at 2 meters and beyond.
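
For reference, SNR compares the power of the speech signal with the power of the microphone’s self-noise, expressed in decibels. A quick sketch of the math, with made-up numbers:

```python
# Signal-to-noise ratio in decibels: 10 * log10(signal power / noise power).
# The two arrays below are made-up stand-ins for a speech recording and a
# silence recording from the same microphone.
import numpy as np

rng = np.random.default_rng(0)
speech = rng.normal(0.0, 0.1, 16000)   # 1 s of "speech" at 16 kHz (placeholder)
noise = rng.normal(0.0, 0.001, 16000)  # 1 s of microphone self-noise (placeholder)

snr_db = 10 * np.log10(np.mean(speech**2) / np.mean(noise**2))
print(f"SNR ≈ {snr_db:.1f} dB")  # higher SNR means the speech stands out more from the noise
```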

Fabric color development and sourcing

We originally developed the product in white and red; the other colors were just prototypes or 3D renderings. We’re now tuning the final shade of every color with our fabric factory. Here’s a video of our results:

Lily’s fabric color development and sourcing with the fabric factory

Lots of iterations are needed to get the right tone for each color. The Pantone numbers on paper never come out of the factory the way the industrial designer expects. For example, look at the white speaker in the video: it’s almost light grey because the factory’s white fabric wasn’t white enough in the early iterations. Now, after 3 iterations, we have a white fabric that is whiter than the HomePod’s. The colors are beautiful.

We’re now 7 months behind our original March schedule. I can understand your frustration, and it’s not easy to announce a delay for the third time to people who have been so good to us. We’re very sorry for that. Sometimes we don’t measure up to what we’d like to be: we’re just a small startup, fewer than 20 people, average age under 30, with dark rings under our eyes.

There’s no shipping guarantee on Indiegogo, but the further we advance in this campaign and the more we master our tech, the more confident we are that we can deliver this product to you. What we can guarantee is to be as transparent as possible in our updates so that you can keep track of our progress.

You’ve already supported us for 10 months, and despite everything, you’re still supporting us today. That amazes us every day and gives meaning to our work.

Thanks.

Jie and the Maybe team
