a music model does exactly what you ask, which is the problem. ask for "cool" and you get someone else's idea of cool. the gap isn't the model, it's that you can't name the thing in your head. so we go aspect by aspect. you hear two versions back to back, you feel the difference, you learn the word for it. then you watch that word go into a prompt and change the output. hear it, name it, prompt it.
most people write bad music prompts because they can't hear what they want, so they type "make it cool". TYPE / TEACHER trains your ear one aspect at a time, then hands you the words to steer a model. hear it, name it, prompt it.
Why this exists
three blocks per aspect. "hear it" plays two or more versions where one thing changes and everything else is frozen, so your ear can only blame that one thing. "name it" gives you the words producers actually use. "prompt it" shows a weak prompt rewritten with those words. it all runs client side on the same synth the instrument uses, no signup.
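a rough sketch of the "one thing changes, everything else is frozen" idea, in python. everything named here is made up for illustration: the `Patch` fields, the `ab_pair` helper, and the values are assumptions, not the site's actual synth code.

```python
from dataclasses import dataclass, replace, asdict


# hypothetical synth settings; the field names are assumptions for this sketch
@dataclass(frozen=True)
class Patch:
    tempo_bpm: int = 100
    swing: float = 0.0
    cutoff_hz: int = 2000
    reverb_mix: float = 0.2


def ab_pair(base: Patch, aspect: str, value) -> tuple[Patch, Patch]:
    """Return (A, B) where B is a copy of A with only `aspect` changed."""
    return base, replace(base, **{aspect: value})


def changed_fields(a: Patch, b: Patch) -> list[str]:
    """List the settings that differ between two patches."""
    da, db = asdict(a), asdict(b)
    return [name for name in da if da[name] != db[name]]


# version A is straight, version B swings; nothing else moves
a, b = ab_pair(Patch(), "swing", 0.6)
print(changed_fields(a, b))  # ['swing']
```

if `changed_fields` ever returns more than one name, the comparison is no longer a clean a/b and your ear has two things to blame.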
The aspects
The modules
In a real song
the aspects don't live in a vacuum, they're all in every song at once. pick a track you know and see how each one shows up, in the same words the lessons taught. the audio is a synthetic sketch, not the record, so you hear the feel without the copyright.
pick a module, hear the difference, then go write a prompt that actually means something. the words you just learned are what a music model listens for.