From Words to Pictures: Text to Image Generation

Rahul Agarwal
11 min read · May 17, 2023
Generated Image

Yes, this is not new anymore, but I’m finally getting around to trying it for myself! There are simple apps anyone can use. However, in this article I explore a few models by deploying them myself, generating images, and learning about prompting.

Hugging Face lists numerous text-to-image models, so I picked the top two at the time of writing: runwayml/stable-diffusion-v1-5 and stabilityai/stable-diffusion-2-1-base. For deployment, AWS SageMaker makes things very simple and provides some very nice examples to get started.
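For reference, here is a minimal sketch of querying a deployed model from Python. It is not the author's notebook. It assumes a model has already been deployed to a SageMaker real-time endpoint, for example by following one of the SageMaker examples. The endpoint name `my-sd-endpoint` is hypothetical. The JSON field names (`prompt`, `negative_prompt`, `num_inference_steps`, `guidance_scale`, `seed`) are assumptions modeled on common Stable Diffusion parameters, so check the schema your container actually expects. The network call uses boto3's `sagemaker-runtime` client and its `invoke_endpoint` method.

```python
import json
from typing import Any, Dict, Optional


def build_payload(
    prompt: str,
    negative_prompt: Optional[str] = None,
    num_inference_steps: int = 50,
    guidance_scale: float = 7.5,
    seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble a JSON-serializable request body.

    ASSUMPTION: these field names are illustrative; match them to your endpoint.
    """
    payload: Dict[str, Any] = {
        "prompt": prompt,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
    }
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt
    if seed is not None:
        payload["seed"] = seed
    return payload


def query_endpoint(endpoint_name: str, payload: Dict[str, Any]) -> bytes:
    """Send the payload to a SageMaker real-time endpoint and return the raw body."""
    import boto3  # imported lazily so build_payload works without AWS dependencies

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload).encode("utf-8"),
    )
    # The response format (JSON, base64 images, raw JPEG) depends on the container.
    return response["Body"].read()


if __name__ == "__main__":
    body = build_payload("old rotary phone", negative_prompt="hands, fingers", seed=42)
    print(json.dumps(body))
    # query_endpoint("my-sd-endpoint", body)  # hypothetical endpoint name
```

Keeping payload construction separate from the AWS call makes it easy to try prompt variations without touching the endpoint.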

Following are some images generated by these two models. Output is randomized, so every inference with the same prompt produces different images. The images here are the ones I generally liked after multiple attempts, and my observations are anecdotal. Generally, adding a verb like running or chasing does not go well. Hands and fingers are known limitations, so I included them as negative prompts in some cases.
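Both observations have a simple mechanical explanation. Diffusion sampling starts from random noise. Fixing the seed fixes that noise, so with the same settings you generally get the same image back. In pipelines such as Hugging Face `diffusers`, negative prompts plug into classifier-free guidance: the prediction conditioned on the negative prompt takes the place of the unconditional one, and the sampler is pushed away from it. The toy sketch below shows both ideas with plain Python lists. `initial_noise` and `guided_prediction` are hypothetical helpers, not library functions. Real pipelines apply the same arithmetic to latent tensors at every denoising step.

```python
import random
from typing import List


def initial_noise(seed: int, size: int) -> List[float]:
    """Draw the starting Gaussian noise; the same seed always yields the same values."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


def guided_prediction(
    cond: List[float], negative: List[float], guidance_scale: float
) -> List[float]:
    """Classifier-free guidance: start from the negative-prompt prediction and
    move toward the prompt prediction, scaled by guidance_scale."""
    return [n + guidance_scale * (c - n) for c, n in zip(cond, negative)]


# Same seed -> identical starting noise; a different seed -> a different image.
print(initial_noise(42, 4) == initial_noise(42, 4))  # True

# With guidance_scale = 1 the result is just the prompt prediction;
# larger values push further away from the negative prompt.
print(guided_prediction([1.0, 0.0], [0.0, 0.5], 7.5))  # [7.5, -3.25]
```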

Recursive

Fractals with a hand, Fractals of colorful eyes

Emotions and Gender

Transgender man holding a book, Transgender woman waving a hand, nervous person as a cartoon
Angry man pounding his fist and shouting, happy woman walking on the beach with a dog at sunset

Cultural references and biases

Unfortunately, and I suppose this is to be expected with biased training data, the doctor is always a man. The facial features of men and women are generally European. It does, however, capture cultural differences when asked to. Clearly there are no ugly children in the training datasets!

Doctor reviewing a chart, henna on hand with fine detail
Quinceañera celebration, Indian wedding
Beautiful woman, handsome man
Ugly child, eyes
Beautiful woman, handsome man

Faces

The following faces were generated with stable-diffusion-v1-5 only, which seemed to do better for this section. Even when I asked for a tall person, I only ever got a face.

human face, with four different views, photo realistic
side view, attractive tall man, grey hair, black eyes, young, ultra photo realistic
attractive tall woman, long hair, brown eyes, ultra photo realistic
woman, man
attractive tall man, short hair, black eyes, ultra photo realistic
attractive tall woman, long hair, brown eyes, ultra photo realistic
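The face prompts above follow a common pattern: a subject, a comma-separated list of attributes, and style modifiers such as "ultra photo realistic". A small helper makes it easy to vary one attribute at a time when comparing outputs. `build_prompt` is a hypothetical convenience function. Nothing in the models requires this structure, because the text encoder just sees one string.

```python
from typing import Iterable, Sequence


def build_prompt(subject: str, attributes: Iterable[str] = (), style: Sequence[str] = ()) -> str:
    """Join subject, attributes and style modifiers into one comma-separated
    prompt, trimming whitespace and skipping empty entries."""
    parts = [subject, *attributes, *style]
    return ", ".join(p.strip() for p in parts if p and p.strip())


print(build_prompt(
    "attractive tall woman",
    attributes=["long hair", "brown eyes"],
    style=["ultra photo realistic"],
))  # attractive tall woman, long hair, brown eyes, ultra photo realistic

# Vary a single attribute while keeping everything else fixed.
for hair in ["short hair", "grey hair"]:
    print(build_prompt("attractive tall man", [hair, "black eyes"], ["ultra photo realistic"]))
```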

Tangible objects

Boardgame with cards and 2 dice, old rotary phone, cars on freeway

Intangibles

Software, patent
Music

Famous people and places

Beyonce at Eiffel Tower, Einstein dancing with Cleopatra at the Taj Mahal
George Washington outside the White House

Spanish

I tried Hindi, but it is not supported, so I tried Spanish, which gave pretty good results.

Dreaming of unicorns, Birthday party with balloons and cake, classroom with desk and chairs

6-year-old’s prompts

I let my 6-year-old type in some prompts, and you can see where they went!

Fantasy and Fiction

Cow on the moon, Superman and Darth Vader fist bump
Leaning tower of Pisa in times square, Children in the clouds

Generated Prompts

Finally, the article would not be complete if I did not use ChatGPT, so I got it to create some prompts. When the prompts are too complex, the results don’t match expectations.

# Portraits (a distinct name so the later lists do not overwrite this one)
portrait_prompts = [
"Generate a portrait of an elderly man with a kind smile and wise eyes.",
"Create a realistic image of a young woman with curly hair and piercing green eyes.",
"Generate a portrait of a young man with a rugged look and a slight beard.",
"Create an image of a middle-aged woman with short hair and a friendly expression.",
"Generate a portrait of a child with rosy cheeks and bright eyes, full of wonder and curiosity.",
"Create an image of a person with a serious expression and strong facial features.",
"Generate a portrait of a woman with a unique hairstyle and bold makeup.",
"Create an image of a man with a beard and glasses, looking contemplative.",
"Generate a portrait of a person with piercing blue eyes and a mysterious expression.",
"Create an image of a person with a bright smile and sparkling eyes, radiating happiness and joy."
]
# Fantasy scenes
fantasy_prompts = [
"Imagine you're flying on the back of a giant eagle, soaring over mountains and valleys, feeling the wind rushing through your hair.\n",
"Picture yourself walking through a forest of glowing mushrooms, the light casting a surreal glow over everything around you.\n",
"You find yourself in a magical library, where every book holds a world of its own. You can choose any book and step into its pages, living out the story as if it were real.\n",
"You step through a shimmering portal into a world of floating islands and soaring airships, where the sky is an endless expanse of vibrant colors and you can explore to your heart's content.\n",
"You're a guest at a royal ball, where you dance the night away with elegant and exotic creatures from all corners of the realm, the music carrying you away into a world of enchantment.\n",
"You venture deep into a mysterious cavern, where you find a glowing crystal that shows you visions of distant lands and times, revealing secrets of the universe.\n",
"You awaken in a magical garden, filled with strange and wondrous creatures, each with a unique power or gift to share with you.\n",
"You find yourself on a vast plain, surrounded by a circle of standing stones, and you suddenly realize that you have the power to control the elements of nature - the wind, the rain, the sun, and the earth.\n",
"You're on a quest to find a legendary artifact, journeying through dark forests, treacherous mountains, and ancient ruins, facing challenges and meeting allies along the way.\n",
"You discover a hidden portal that takes you to a realm of dreams, where you can explore the subconscious mind and unlock the secrets of the human psyche.\n"
]
# Famous places and events
place_prompts = [
"Imagine you're standing in the heart of New York City's Times Square, surrounded by the bright lights and buzzing energy of one of the world's most famous destinations. The towering billboards and electronic displays flash advertisements and messages, while street performers and vendors add to the lively atmosphere. You can feel the pulse of the city as people from all over the world rush past you, each on their own mission.",
"Picture yourself walking through the ornate halls of Buckingham Palace, home to the British royal family and a symbol of centuries of history and tradition. You can hear the echo of your footsteps on the marble floors as you take in the opulent furnishings and intricate artwork. You might even catch a glimpse of a member of the royal family, as they move through the palace's private chambers.",
"You find yourself on the beaches of Rio de Janeiro, where you can dance the samba, play soccer on the sand, and soak up the vibrant culture of one of Brazil's most iconic cities. The warm sand and cool ocean breeze invite you to relax and enjoy the sun, while the sounds of music and laughter fill the air. You can taste the delicious local cuisine and join in the lively conversation with the friendly locals.",
"You're sitting in the front row of a concert by your favorite musician, feeling the energy and excitement of the crowd and losing yourself in the music. The lights and sound of the show are intense, and you can feel the bass vibrating through your body. You might even get a chance to meet the artist backstage after the show.",
"You're exploring the winding streets and hidden alleyways of Venice, Italy, admiring the historic architecture and charming canals of this timeless and romantic city. You can smell the aroma of freshly baked bread and hear the sound of church bells ringing in the distance. The gondolas and water taxis glide past you on the shimmering canals, adding to the magical atmosphere.",
"You're at the top of the Eiffel Tower, looking out over the City of Lights and marveling at the breathtaking views of one of the world's most famous landmarks. The city below spreads out before you like a patchwork quilt, with the Seine River winding through the heart of it. You can see the iconic buildings and monuments of Paris from a unique perspective, and feel the breeze as it rushes by you at this lofty height.",
"You find yourself in the heart of Tokyo, surrounded by the neon lights and vibrant culture of one of the world's most bustling and dynamic cities. The streets are packed with people, all moving with purpose and urgency. The food stalls and restaurants offer a dizzying array of options, while the anime shops and karaoke bars are testament to the city's unique blend of old and new.",
"You're at a Hollywood movie premiere, walking the red carpet and rubbing shoulders with some of the biggest names in the entertainment industry. The cameras flash as you strike a pose for the paparazzi, and you can hear the murmur of excited fans gathered behind the velvet ropes. The air is electric with anticipation as you make your way into the theater to watch the latest blockbuster film.",
"You're standing in front of the Great Pyramid of Giza, one of the most famous and awe-inspiring structures in human history, marveling at the scale and craftsmanship of this ancient wonder. The sun beats down on the sandy landscape as you take in the enormity of the pyramid, which has stood for more than 4,500 years. You can feel the weight of history and culture as you stand in its shadow.",
"You're sitting in the stands at the Super Bowl, watching the biggest football game of the year with millions of viewers tuning in around the world. The atmosphere is electric as the two teams take to the field, and you can feel the excitement building with every play. The halftime show is a spectacle of music and dance, while the commercials are some of the most talked-about of the year. You can taste the salty snacks and cold drinks as you cheer on your favorite team to victory."
]
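One plausible reason the longer ChatGPT prompts fall short is a length limit. The CLIP text encoder in Stable Diffusion reads at most 77 tokens, including start and end markers, and `diffusers` truncates anything beyond that, so the tail of a long description never reaches the model. The sketch below flags prompts that are probably over budget. It is only a heuristic: it counts words and punctuation with a regular expression instead of running the real CLIP BPE tokenizer, so treat the numbers as rough estimates. For an exact count, use the pipeline's own tokenizer (in `diffusers`, `pipe.tokenizer`).

```python
import re
from typing import List, Tuple

# 77 tokens minus the start and end markers leaves about 75 for the prompt itself.
MAX_PROMPT_TOKENS = 75

# Rough stand-in for a tokenizer: runs of word characters and single punctuation marks.
# The real CLIP BPE tokenizer splits differently (e.g. rare words into sub-words).
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def approx_token_count(prompt: str) -> int:
    """Estimate how many tokens a prompt will use."""
    return len(_TOKEN_RE.findall(prompt))


def flag_long_prompts(prompts: List[str], limit: int = MAX_PROMPT_TOKENS) -> List[Tuple[int, int]]:
    """Return (index, estimated token count) for prompts likely to be truncated."""
    counts = (approx_token_count(p) for p in prompts)
    return [(i, n) for i, n in enumerate(counts) if n > limit]


short_prompt = "Cow on the moon"
long_prompt = " ".join(["glowing mushrooms"] * 40)  # 80 words, clearly over budget
print(approx_token_count(short_prompt))  # 4
print(flag_long_prompts([short_prompt, long_prompt]))  # [(1, 80)]
```

Applied to the third list above, most entries would likely be flagged. Shorter, keyword-style prompts like the ones earlier in the article stay well within the budget.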

My text-to-image notebook is in this GitHub repo. As a follow-up, I am looking into the OpenAI APIs, which provide access to the DALL-E model. I also want to follow up on customizing a model with my own images and see what happens.

If these topics interest you, reach out to me; I would appreciate any feedback. If you would like to work on such problems, you will generally find open roles as well! Please refer to LinkedIn.
