Show HN: Personalized Duolingo (kind of) for vocabulary building

70 comments

·January 20, 2025

Hi! Wanted to share the project I really wanted to have. TLDR; this app lets you create your own list of words and you get a Duolingo-like experience (kind of, still needs a lot of features) practicing those words in their context.

My English is not the best but not the worst either. But I realized I can't improve it past a certain level! I believe that in order to truly learn a language, you need to be exposed to that language often. Vocabulary is the key factor here if you really want to improve in any language.

My experience is that when I read a book to improve my English vocabulary, I run into unfamiliar words so often that my reading gets interrupted. I go look for the meaning, come back, put it in context, re-read it, etc. It didn't work for me. So I tried audiobooks: I listen to the book and read along, and whenever I encounter a word I don't know, I write it down. I end up with about 50 words in 2-3 pages, and I ask ChatGPT to give me their meanings. I read them, then pick up the book and read it myself. That helps for sure, but after a while I still lose those words because I never encounter them again. Well then, in order to not forget those words, I need some kind of exercise, right? A flashcard app maybe? Well, I still need to go out there, ask ChatGPT to create questions, put them in a flashcard app, etc. It's still time-consuming and this is supposed to be fun!

I need to be exposed to English in my daily life. I just need to save the words somewhere and, whenever I want, be able to practice them in a fun way, in Duolingo style maybe? Then I wondered: would it be better to store words in their own context? I mean, say I read Harry Potter and have a list of words I encountered in it, or I watch Breaking Bad and have a list of words I encountered watching it. I believe seeing those words together and practicing them together makes it easier to remember them.

But I shouldn't be the one adding the meaning of the word and generating the exercises, right? It should all be automated. The exercise part will be handled by an LLM for sure, but for the meaning of the word, could I fetch it from a dictionary? I really don't like dictionary definitions, though, and one word can have multiple meanings depending on its context. So I need to use an LLM for this task too and get the word's meaning in its own context.

You create a list for your context, you add words, meanings get added automatically, and you see the word added in a different color (coloring is also a method used to remember words). It all takes seconds. And whenever you want to practice these lists, you can use learn mode to learn and quiz mode to test your knowledge. So I basically built this app (thanks to Claude 3.5 Sonnet). I want it to be like Duolingo; of course I still have a long way to go, but I wanted to share it in hopes of getting contributors.

You can read more in the repository. I would love to get your thoughts on this.

cat_multiverse

Hey there, quick suggestion as a PhD Linguistics candidate and avid language learner!

The best way I've found to identify vocabulary most important to my life is through journaling in the language I'm trying to learn. Describing exactly what I did that day, my thoughts, etc, as best I can.

I had thought of doing the journal entries digitally and gathering dictionary headwords from such journal entries, whether they're written in my mother tongue (English) or not, and use the built dictionary lists to drill vocab.

Traditionally you'd use a lemmatizer with a morphosyntactic tagger for the language to identify the dictionary words, but AI is serviceable these days to easily identify dictionary words from long-form text in many languages, though honestly would be surprised if AI outperforms the traditional methods already.

Good luck and have fun :)
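
To make the headword idea concrete, here is a minimal sketch in Python. It is not a real lemmatizer or morphosyntactic tagger: the `IRREGULAR` table, the `SUFFIX_RULES` list, and both function names are made up for illustration, and the rules only cover a sliver of English. It just shows the shape of the pipeline: tokenize a journal entry, reduce each token to a dictionary form, and count the results.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny tables. A real system would use a proper
# lemmatizer and part-of-speech tagger for the target language.
IRREGULAR = {"went": "go", "was": "be", "were": "be", "children": "child"}
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")]


def headword(token: str) -> str:
    """Reduce one token to a rough dictionary form."""
    word = token.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix, replacement in SUFFIX_RULES:
        # Keep at least three letters of stem so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + replacement
    return word


def journal_headwords(entry: str) -> Counter:
    """Count the headwords that appear in a journal entry."""
    tokens = re.findall(r"[A-Za-z']+", entry)
    return Counter(headword(token) for token in tokens)
```

Naive suffix stripping misfires on plenty of words (for example, it turns "this" into "thi"), which is exactly why a real lemmatizer or an LLM pass is the better choice in practice.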

learning-tr

Thoughts on FSI methodology? That's what I used for mine (my app).

cat_multiverse

Honestly had never even heard of it! But adult language acquisition isn't really a domain of study I've ever been interested in. I can only speak to what I have found most helpful in my own adult language acquisition journeys. The journaling method was taught to me by a polyglot friend of mine and it sort of solved the "what actually is my everyday vocabulary anyway" side of language learning for me.

learning-tr

tl;dr "The Foreign Service Institute (FSI) is the primary training institution to prepare American diplomats to advance U.S. foreign affairs interests, teaching, among other things, the languages of the countries where Foreign Service Officers will serve. "

Apologies, I should have linked beforehand.

tkgally

This looks really good. I wish I had had something like this many years ago when I was studying languages.

Somebody has already suggested adding spaced repetition and audio, which I agree with completely.

One more suggestion: In addition to having the LLM give you the meaning and example for the context in which you originally saw the word, also ask it to provide the word’s other main meanings and examples of it being used in those senses. You might encounter a word first in a slang or technical sense; while it’s useful to learn that meaning, it’s also important to learn other, more common meanings.

Below are some examples of words you might encounter first in technical contexts but would also be worth knowing in their more general meanings. (Examples suggested and defined by ChatGPT o1.)

canonical

Religious/General: Relating to a canon (e.g., church law) or a recognized body of works.

Math/Computing: Conforming to a standard or simplest form (e.g., “canonical form” of an equation).

resolution

General: A firm decision or determination (often heard in “New Year’s resolution”).

Tech/Imaging: The detail an image holds, typically measured in pixels, dots per inch (DPI), etc.

protocol

Diplomatic/General: The official procedure or set of rules governing state or ceremonial events.

Computing: A set of conventions and rules for transmitting data between electronic devices.

flux

General: Continuous movement or change, often implying instability.

Physics/Engineering: The amount of some quantity (e.g., heat, magnetism) passing through a given area over time.
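
The suggestion above could be wired into the prompt itself. The following is only a sketch: the function name, the `max_other_senses` parameter, and the JSON keys (`context_sense`, `other_senses`, `domain`) are assumptions for illustration, not anything taken from the app's repository.

```python
import json


def build_sense_prompt(word: str, sentence: str, max_other_senses: int = 3) -> str:
    """Build a prompt asking for the in-context sense plus other common senses."""
    # The JSON keys below are hypothetical; use whatever shape your app expects.
    schema = {
        "context_sense": {"meaning": "...", "example": "..."},
        "other_senses": [{"domain": "...", "meaning": "...", "example": "..."}],
    }
    return (
        f'The learner met the word "{word}" in this sentence:\n'
        f'"{sentence}"\n\n'
        "1. Explain what the word means in this sentence and give one example.\n"
        f"2. List up to {max_other_senses} other common meanings, each with an example.\n\n"
        "Answer only with JSON shaped like this:\n"
        f"{json.dumps(schema, indent=2)}"
    )
```

Calling `build_sense_prompt("canonical", "Convert the URL to its canonical form.")` gives a prompt that asks for the computing sense first and the more general senses second.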

arbayi

Thank you! I think that would be a good feature. I use the app, and it works fine when I add words as I encounter them, but when I later revisit the list, I sometimes wish I could see the sentences where the word was originally used to better understand it. Based on your suggestion, having a 'show examples' button/link below the meaning would be a cool feature to add.

5eyyah

`show examples` is a good idea

PeterSmit

I’ve had this same idea, and it doesn’t work. Or at least: it works quite well, but the problem is that you get hallucinations. And it can be incredibly discouraging to find out the flashcards you’ve been cramming are completely wrong.

3D30497420

I've had this same problem using ChatGPT and German. Even for basic German, hallucinations can be unexpected and problematic. (I don't recall the model, but it was a recent one.)

In one instance, I was having it correct akkusativ/dativ/nominativ sentences and it would say the sentence is in one case when I knew it was in another case. I'd ask ChatGPT if it was sure, and then it would change its answer. If pressed further, it would again change its answer.

I was originally quite excited about using an LLM for my language practice, but now I'm pretty cautious with it.

It is also why I'm very skeptical of AI-based language learning apps, especially if the creator is not a native speaker.

arbayi

Would agentic workflows come in handy in these cases? I mean having a controller agent that runs after the sentence is created and can search the web, or has access to a database or personal notes, to make sure everything is correct.

tkgally

What models have you been using for that? While I haven’t tried automating the production of vocabulary lists through an API, within the last few weeks I have had the chat versions of ChatGPT 4o, Claude Sonnet 3.5, and one of the latest Gemini models produce annotated vocabulary lists based on literary texts in English, Russian, and Latin. I didn’t spot any hallucinations.

I was asking only for the meanings of the words and phrases, though. I didn’t ask for things like pronunciations, grammatical categories, etc. In the past, when I’ve tried to get that kind of granular information from LLMs, there were indeed errors, presumably because of tokenization issues.

A few days ago, I ran some similar tests with Japanese, asking for readings of kanji and jukugo in an extended text. All of the models I had tried before for such tasks had screwed up. This time, however, ChatGPT o1 scored 100%. It also was able to analyze sentence grammar accurately, unlike the other models I tried. I was impressed.

At current API prices, though, o1 might be a bit too expensive for such a task.

arbayi

I wonder if there are any benchmarks specifically designed to evaluate LLMs' performance in language learning tasks

learning-tr

I had this problem initially but found that if you use these then hallucinations mostly go away.

1. Role based "agents" with a router and logs (for auditing reasoning and decision making).

2. Cross validation and redundancy with the translation "agent" using a 2nd language (that is not English) that you are also native in to check if the translation carries the same "meaning" (sentiment) and cultural significance (Turkish is especially rich in symbolism and cultural memes).

YMMV: I am a car salesman irl and have no formal training.
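
One possible reading of point 2, sketched in Python. Everything here is an assumption: the `translate` and `same_meaning` callables are placeholders for whatever "agents" you use (an LLM call, a human check), and `cross_validate` is a hypothetical name. The idea is to translate the source text into the target language, then bring both the source and the candidate into a second (pivot) language and compare them there.

```python
from typing import Callable

# (text, source_language, target_language) -> translated text
Translator = Callable[[str, str, str], str]
# (text_a, text_b) -> True when both carry the same meaning
MeaningCheck = Callable[[str, str], bool]


def cross_validate(
    text: str,
    source: str,
    target: str,
    pivot: str,
    translate: Translator,
    same_meaning: MeaningCheck,
) -> tuple[str, bool]:
    """Translate `text` and check the result through a second language."""
    candidate = translate(text, source, target)
    # Bring both the candidate and the original into the pivot language.
    pivot_from_candidate = translate(candidate, target, pivot)
    pivot_from_source = translate(text, source, pivot)
    # A mismatch suggests the candidate drifted in meaning and needs review.
    return candidate, same_meaning(pivot_from_candidate, pivot_from_source)
```

A `False` result does not prove the translation is wrong; it only flags it for a closer look, which is where the logs from point 1 would help.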

sebnun

Great work, I had a similar need, and built a similar app (using podcasts) [1]

I originally planned to add some kind of SRS to it, but I found that I learned much better just reading things in context instead of explicitly using SRS to memorize them. Steve Kaufmann (creator of LingQ) explains this better here [2]

[1] https://www.langturbo.com

[2] https://www.youtube.com/watch?v=t26IPxExmzs

arbayi

Thank you so much both for your comment and for sharing your app! (There are definitely great tools out there that we're not aware of.) I am very happy to find your app because I actually needed something like this! I enjoy listening while working and being able to see the transcription alongside it, with word definitions in context - this kind of learning really works for me! It's fantastic how it supports all those languages - you can listen, read, and look up definitions all in one place. Looking at this, the one I shared above looks very basic. You handle transcription, media playback, pronunciation testing, and (I guess) LLM interaction for contextual meanings and examples! The only question I have (sorry if this already exists, but I couldn't find it): is there a way to see a list of words I've encountered and marked as known?

And for the second part, I'm planning to include the SRS features @markvdb pointed out in the comments; combining contextual learning with SRS would be interesting, I guess.

wahnfrieden

I built a popular integrated reader and SRS (with Anki integration as an alternative option) similar to LingQ but focused on Japanese currently

https://reader.manabi.io

claylimo

Similar to LingQ there is Migaku which can do this for YouTube and other sites. It definitely has significantly aided my learning and made it a zero friction and even fun experience to learn another language.

arbayi

Thank you for sharing! Looking at their blog, I saw this post about learning Japanese vocabulary (https://migaku.com/blog/japanese/how-to-learn-japanese-vocab...). They share a Japanese Netflix Frequency List - (https://docs.google.com/spreadsheets/d/15b3j9--RJ1K5hI9vz_2L...)

"To recognize 99% of all the words in Netflix's subtitles, you'd need to know 37,247 words"

Interesting approach! I really don't know how they managed to gather this list, but it's an interesting and clever method.
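
I don't know how Migaku produced that number, but a coverage figure like "99% of all words" can in principle be computed from any frequency list. Here is a minimal sketch, assuming you already have a list of occurrence counts, one per distinct word; the function name is hypothetical.

```python
from itertools import accumulate


def words_needed_for_coverage(frequencies: list[int], target: float) -> int:
    """Return how many of the most frequent words cover `target` of all tokens."""
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    # Learn the most frequent words first.
    counts = sorted(frequencies, reverse=True)
    total = sum(counts)
    for rank, running in enumerate(accumulate(counts), start=1):
        if running >= target * total:
            return rank
    return len(counts)
```

Because word frequencies have a long tail, the last few percentage points of coverage usually require far more words than the first ninety, which fits the large number quoted above.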

rapidpolyglot

You should try Lingopie.com. It's like the Netflix of language learning. It's way more developed and polished than Migaku or Language Reactor ... Migaku is only a Chrome extension, and Lingopie has thousands of TV shows and movies and also works with Netflix and Disney Plus. Plus it has tons of language learning features and tools that Migaku has. You can also watch and learn on iOS and Android, plus on your TV with their Smart TV apps.

Alex-Programs

There's also https://nuenki.app (disclaimer: I made it), which applies the same approach to every single website*. It translates appropriate-difficulty sentences into your target language, and you can hover for definitions, pronunciations, etc.

*other than those blocked for privacy reasons

arbayi

I actually want to learn German, but I want to learn it by reading German texts and starting from zero, even though that makes it challenging. I need to look up definitions and such, but translating the entire page defeats the core purpose. This app in my case is just perfect match! Thank you for sharing!

dpig_

Any plans to add Hindi, being the third-most spoken language in the world?

davidzweig

I'll drop this here: If anyone wants to work on Language Reactor (well compensated), my email is in my profile. I'm planning to start open-sourcing much of it soon.

jkoff

I did a double-take at the description since for a second I thought we were building the same tool. This is really cool, and seems like it'd greatly expand the set of podcasts I can listen to.

What I'm working on is different but similarly aimed at breaking through the intermediate plateau. I'm generating comprehensible input in podcast form, targeting the vocabulary used to fit a specific learning goal (e.g. "I want to be able to watch show X without subtitles") and systematically repeating the words at specific intervals to improve retention.

It works well as a prototype. I've listened to it for ~16 hours so far and it does seem to help me with vocabulary acquisition.

I'm still gauging whether I should polish and release it as a product, and would love some feedback and/or sign-ups:

https://letmeknow.jkoff.ca/infinite-ci?utm_source=hn

learning-tr

Wow this is super cool but how do you ensure the content is useful and correct?

jkoff

To be clear, I haven't shared this with anyone because I'm not yet sure that the content is useful and correct.

As far as where I'm at:

- I've listened to it in my target language for N hours. To my ear, it sounds correct and I've learned some new words that I then heard used consistently in native media.

- Next, I'd like to set it to teach me a language that I already know, so that I can more reliably and easily spot errors. This will require some changes, since my target language is currently hardcoded.

- Longer-term, validation based on languages I speak can't generalize 100% to other languages, nor can validation of version N make assertions about version N + 1. Correctness would benefit from native speakers periodically checking results, and usefulness would benefit from user feedback (even if only in the form of engagement or lack thereof).

arbayi

Thank you so much to everyone contributing to this thread! I learned a lot here just by sharing this - the power of open source, I guess! From all those conversations and recommendations, I've gathered a list of features that will hopefully be built in the coming days:

- A more user-friendly approach to running the app

- LiteLLM integration so we can use any LLM (it's done! thanks to @enessusan00!)

- Running the database locally

- Customizable language preferences (e.g., learning German through Turkish)

- A live version where anyone can easily try the app

- A protection mechanism for LLM responses to ensure getting valid JSON

- Fixing small bugs

- Customizable exercise types (ability to enable/disable specific question formats)
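
For the "valid JSON" item in the list above, one common pattern is to parse, validate, and retry. This is only a sketch under assumptions: `REQUIRED_KEYS` is a made-up schema, and `ask` stands in for whatever function actually calls the LLM and returns its raw text.

```python
import json
from typing import Callable

REQUIRED_KEYS = {"word", "meaning", "example"}  # hypothetical schema


def parse_entry(raw: str) -> dict:
    """Extract and validate one JSON object from a raw LLM reply."""
    # Models often wrap JSON in prose or code fences, so slice out the object.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(raw[start : end + 1])
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


def ask_with_retry(ask: Callable[[], str], attempts: int = 3) -> dict:
    """Call `ask` until it returns a valid entry or attempts run out."""
    last_error = None
    for _ in range(attempts):
        try:
            return parse_entry(ask())
        except ValueError as error:  # json.JSONDecodeError is a ValueError
            last_error = error
    raise RuntimeError(f"no valid reply after {attempts} attempts") from last_error
```

Some providers also offer a structured-output or JSON mode, which reduces malformed replies, but a validation step like this is still a cheap safety net.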

We'll be focusing on improving the app as much as we can, but help would be greatly appreciated! We'll be structuring the repository to make it easier for everyone to contribute together.

I'm truly amazed by all the insights and suggestions shared here. There are so many great ideas. Thank you all again for the support and for making this discussion so enriching. I'll keep sharing updates here! All the amazing suggestions shared here will be added to the roadmap in the README!

arbayi

Here's a list of all the apps (built by them!) mentioned in comments:

- LangTurbo (by @sebnun) - langturbo.com : Learn through podcasts with transcriptions and contextual word definitions

- Nuenki (by @Alex-Programs) - nuenki.app : Browser extension that translates appropriate-difficulty sentences across websites, with hover-for-definitions feature

- Manabi Reader (by @wahnfrieden) - reader.manabi.io : Japanese-focused integrated reader with SRS and Anki integration

- (by @muth02446) - Spanish: appicenter.net/Apps/VocabES/ - English: appicenter.net/Apps/VocabEN/ : Uses spaced repetition and audio for basic vocabulary learning

- Vocabuo (by @kebsup) - vocabuo.com : Combines SRS flashcards with ebook/YouTube/website reader, using AI for content generation

- LingoStories (by @laurentlb) - github.com/laurentlb/lingostories/ : Open-source language learning tool

- Turkish Learning Tool (by @learning-tr) : Browser extension for colloquial translations with audio and pronunciation features

- Language Reactor (by @davidzweig) : Planning to open-source soon, looking for contributors

Note: above list is summarized by Claude 3.5 Sonnet.

laurentlb

Interesting approach! Thanks for making it open-source, I think we need more open-source language learning tools. As I'm also building one (https://github.com/laurentlb/lingostories/), I'm going to take a look at what you did and the technical decisions.

You seem to focus on the English use-case. In my experience, getting exposure to other languages can be much more difficult, especially when you're not fluent yet. It would be interesting to see how to approach it: ideally, questions and answers should be in the target language, but the questions have to be very simple.

As someone else mentioned, having audio would be very useful. At some point, you could consider a hands-free mode: it reads the question out loud, pauses a few seconds, then gives the answer.

nikkwong

This looks neat. If you’re going to add Duolingo-style features, please don’t add fill-in-the-blank or word matching to the question types, or at least make them optional. They are an incredibly frustrating waste of time on Duolingo: they take a long time to solve and don’t actually improve comprehension. My biggest gripe with Duolingo is that half of the questions in a lesson are like this; they have the pretense of helping you learn but don’t actually deliver. I think if you instead came up with some very difficult question types that really challenged someone’s comprehension, it would be stickier than Duolingo, especially for the HN crowd, who are actually trying to learn and are not just here to “play a game” like a large portion of the Duolingo audience.

arbayi

Thank you! I am very interested in this project and want to keep working on it, hopefully getting help from open source contributors.

I actually had this idea of using Duolingo's style exercises, but now with your comment, I realize some might not be appropriate for individual learners with different goals.

The cool thing would be to have customizable exercise types, where users can choose which ones they want and which ones they don't want!

I will add this to the roadmap in the README, pointing out this comment! Thanks again!
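
A rough sketch of what customizable exercise types could look like. The type names and the `pick_exercise` helper are placeholders for illustration, not what the app currently implements.

```python
import random
from enum import Enum


class ExerciseType(Enum):
    # Hypothetical set of question formats.
    MULTIPLE_CHOICE = "multiple_choice"
    FILL_IN_THE_BLANK = "fill_in_the_blank"
    WORD_MATCHING = "word_matching"
    FREE_TYPING = "free_typing"


def pick_exercise(enabled: set[ExerciseType], rng: random.Random) -> ExerciseType:
    """Pick the next exercise type from the ones the user left enabled."""
    if not enabled:
        raise ValueError("at least one exercise type must be enabled")
    # Sort first so the choice is reproducible for a seeded generator.
    return rng.choice(sorted(enabled, key=lambda kind: kind.value))
```

A user who dislikes fill-in-the-blank and word matching would simply pass a set without those two members.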

jghn

Out of curiosity, do you have any citations on how those exercises don't enable learning?

On the latter part, there used to be a hard mode, at least in the browser version, where you could have it force you to hand-type every word. I always really liked that, but then they got rid of it. Of course, with the heart system these days, I wouldn't last 5 minutes if I tried to do it that way, so such is life I suppose.

StevigeHoning

The hand type option is available on browser for me.

German(Dutch) Italian(English)

kebsup

Great app! I've been building something similar, but for less advanced language learners, who wouldn't understand definitions in their target language.

My app [1] is basically a combination of SRS flashcards with an ebook/YouTube/Website reader. Unlike Anki though, AI creates example sentences, definitions, images and audio.

I find it interesting that you want to get inspired by Duolingo. My approach is to have the most efficient grind possible - no gamification. I've found Duolingo was wasting so much of my time with exercises that did not really teach me anything and took a long time to complete + the XP points/levels etc. were quite distracting.

[1] https://vocabuo.com

kalido

With all the (somewhat competing, though aimed and monetized differently) products in this thread, are there any promotions in place for extensive testing and comparison?

(E.g., your vocabuo website prominently points to possible promo codes.)

muth02446

Made something similar for Basic Vocabulary (Spanish and English so far):

* https://www.appicenter.net/Apps/VocabES/

* https://www.appicenter.net/Apps/VocabEN/

Uses spaced repetition and audio but is not personalized, which is less of an issue for basic words. The hard part is getting good example sentences and "cross links". I had thought about using AI for that but have not followed through.

markvdb

Thank you for your contribution to the FOSS learning space.

Here are a few random suggestions:

- Spaced repetition. Again, Anki style.

- Audio. Can you make it easy to record a phrase, Anki style? Or maybe even make AI pronounce them correctly?

I would love something like that.

arbayi

Thank you so much! I will definitely add those ideas to the roadmap in the README (pointing out this comment).

I believe the spaced repetition feature must be prioritized because that's the most important thing in this app. I mean, what's the purpose of seeing the words over and over again if I already have confidence with them?
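
As a starting point, here is a minimal Leitner-style sketch of spaced repetition. It is a simpler scheme than what Anki uses, and the interval values, the `Card` fields, and the function names are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical review intervals in days, one per Leitner box.
INTERVALS_DAYS = [1, 2, 4, 8, 16]


@dataclass
class Card:
    word: str
    box: int = 0
    due: date = date.min  # new cards are due immediately


def review(card: Card, correct: bool, today: date) -> Card:
    """Move the card up one box on success, back to the first box on failure."""
    box = min(card.box + 1, len(INTERVALS_DAYS) - 1) if correct else 0
    return Card(card.word, box, today + timedelta(days=INTERVALS_DAYS[box]))


def due_cards(cards: list[Card], today: date) -> list[Card]:
    """Return only the cards that should be practiced today."""
    return [card for card in cards if card.due <= today]
```

Words you keep getting right drift into longer intervals and stop showing up every session, which addresses the "why see words I'm already confident with" problem.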

For the pronunciation feature, I had similar work before and there are great open source tools and libraries we can build upon that analyze your pronunciation and spot where you made mistakes. We can use open source TTS libraries to pronounce the correct version.

I also would definitely want to see audio questions in exercises similar to Duolingo, and it would be great to work on those features.

learning-tr

I am learning Turkish so I built something like that for me. You can highlight any word online and it will translate colloquially so you can actually use it irl.

It also has audio and pronunciation. It is around the halfway mark in the demo.

demo: https://imgur.com/a/full-demo-so-far-O2fzBJn

arbayi

Looks great! Are you planning to share it open source?

Note: This is one of my favorite songs!!

learning-tr

Thank you! Yes, hopefully I will open-source it.

Mine too!

flemhans

Is there a "Duolingo" that takes a web site as input and makes it into a course? So I could learn by reading e.g. a geeky news site in the language to be learned.

arbayi

Would be cool to have such a tool, maybe an extension using Chrome's built-in AI APIs?

groggo

As someone who's learning a language (French) with Duolingo, and also supplementing that with other methods (podcasts, social media, online chatting, talking to ChatGPT), I've also really wanted a way to get a Duolingo-type experience with my own set of vocabulary that I encounter. So I'll definitely check this out. Also, your English is impressive!