AI can now create any image in seconds, bringing wonder and danger


These images weren’t captured using a camera.

The artificial intelligence text-to-image generator DALL-E produced each of these images. DALL-E, which takes its name from WALL-E and Salvador Dalí, produces visuals in response to text prompts like:

“A hobbit house designed by Zaha Hadid.”

“A woman in a red coat looking up at the sky in the middle of Times Square.”

“Red and yellow bell peppers in a bowl with a floral pattern on a green rug photo.”

The AI has enchanted the public ever since the research lab OpenAI unveiled the most recent version of DALL-E in April, luring in digital artists, graphic designers, early adopters, and anyone looking for an online diversion. Even jaded internet users have been surprised by how quickly the technology has advanced, producing creative, occasionally accurate, and occasionally inspired visuals from any spur-of-the-moment phrase, like a conversational Photoshop.

Five months later, 1.5 million people are producing 2 million images a day. After removing its DALL-E waitlist on Wednesday, OpenAI announced that anyone can now access the service.

Text-to-image generators have proliferated since the release of DALL-E. Google and Meta quickly acknowledged that they had been working on comparable systems but said their models were not yet ready for the general public. Rival start-ups soon went public, including Stable Diffusion and Midjourney, the latter of which produced the artwork that won an art competition at the Colorado State Fair in August and generated debate.

The technology is now advancing so quickly that it is outpacing AI companies’ ability to establish usage rules and avert harmful outcomes. Researchers are concerned that the images these systems create may reinforce racial and gender stereotypes or plagiarize artists whose work has been scraped without their permission. Fake images might be used to enable bullying and harassment or to spread disinformation that appears to be true.

According to Wael Abd-Almageed, a professor in the engineering school at the University of Southern California, people have historically trusted what they saw. Once the line between real and fake is eroded, he warned, everything will become fake. “We will not be able to believe anything.”

“Once the line between truth and fake is eroded, everything will become fake. We will not be able to believe anything.”— Wael Abd-Almageed

OpenAI has attempted to strike a balance between its desire to lead the field and showcase its AI advances and the risk of hastening those harms. To prevent DALL-E from being used to spread misinformation, for instance, OpenAI forbids the use of images of politicians or celebrities. Sam Altman, the CEO of OpenAI, frames the choice to make DALL-E available to the general public as a crucial step in developing the technology safely.

“You have to learn from contact with reality,” Altman said. “What users want to do with it, the ways that it breaks.”

But upstarts, some of which have made their code available for anybody to copy, have undermined OpenAI’s ability to set the bar high. Complex discussions that OpenAI had planned to put off into the future have evolved into far more pressing issues.

“The question OpenAI should ask itself is: Do we think the benefits outweigh the drawbacks?” said UC Berkeley professor Hany Farid, who specializes in digital forensics, computer vision, and misinformation. “It’s not the early days of the internet anymore, where we can’t see what the bad things are.”

Bran Maldonado, an AI artist, works with OpenAI as a community connector. On a recent Friday, sitting in his New Jersey home office, he displayed artwork for an upcoming DALL-E art exhibition. Then he typed in a text prompt I had suggested: “Protesters outside the Capitol building on January 6, 2021, AP style” (a reference to the newswire service, the Associated Press).

“Oh my god, you’re gonna get me fired,” he said, with a nervous laugh.

 

DALL-E spun up four versions of the request.

 

 

Three of the images were immediately implausible: the protesters’ faces were twisted, and the writing on their signs looked like chicken scratch.

The fourth picture, though, was different. The AI-generated image showed a zoomed-out view of the Capitol’s East Front, with a group of demonstrators depicted with their backs to the building.

Look closely and obvious distortions stand out, such as the irregular spacing of the columns at the top of the steps. At first glance, though, it could be mistaken for a real news photo of a tense crowd.

Maldonado was astounded by the AI’s knack for adding the small details that make a fabricated version of a well-known scene more convincing.

“Look at all the red hats,” he said.

When a Google employee claimed publicly in June that the company’s LaMDA chatbot was sentient, it sparked a discussion about how far generative models had advanced and served as a caution that these systems can realistically mimic human speech. But according to Abd-Almageed, “manufactured media” can deceive people just as readily.

Each advance in imaging technology has brought potential harms alongside gains in efficiency. Photoshop made precise photo editing and enhancement possible, but studies have found it also distorted body image, particularly among girls.

Deepfake is a term that broadly refers to any AI-synthesized media, including the recently popular doctored videos in which one person’s head has been placed on another person’s body, and remarkably lifelike “photographs” of people who don’t exist. When deepfakes first started to appear, experts warned that they might be used to sabotage politics. But in the five years since, the technology has mostly been used to victimize women by creating deepfake pornography without their consent, said Danielle Citron, a law professor at the University of Virginia and the author of the upcoming book “The Fight for Privacy.”

Deep learning, an AI training technique that uses artificial neural networks loosely modeled on the neurons of the human brain, powers both text-to-image generators and deepfakes. These newer image generators, which can create pictures from a plain-English description or alter uploaded photographs, build on major advances in AI’s capacity to parse human language and communication, including work pioneered by OpenAI.

Prompt: “A model photographed by Terry Richardson.” This image was created by AI. It was not taken by a camera.

The San Francisco-based AI lab was founded in 2015 as a nonprofit with the vision of creating “artificial general intelligence,” or AGI, that matches human intelligence. OpenAI wants its AI to benefit the entire planet and to serve as a bulwark against superhuman AI in the hands of a dominant corporation or foreign government. Altman, Elon Musk, billionaire VC Peter Thiel, and others pledged a total of $1 billion toward its funding.

OpenAI’s future was staked on the then-unproven hypothesis that increasing the volume of data and the size of neural networks would lead to improvements in artificial intelligence. After Musk split from OpenAI in 2018, the organization restructured as a for-profit business and received a $1 billion investment from Microsoft to cover the costs of computing power and technical talent. Microsoft would in turn license and commercialize OpenAI’s “pre-AGI” technologies.

According to Chief Technology Officer Mira Murati, OpenAI started with language because it is essential to human intelligence and there is a wealth of online text that can be scraped. The wager succeeded: OpenAI’s English-language text generator GPT-3 can produce whole short stories or news articles that read as coherent.

Next, OpenAI tried to replicate GPT-3’s success by feeding the model programming languages so it could learn their statistical patterns and generate software from conversational commands. That work evolved into Codex, which helps programmers write code.

OpenAI then attempted to merge vision and language, training a GPT-3-style model on enormous data sets scraped from the internet that contained millions of images paired with text captions, so it could learn the patterns and connections between words and pictures. That became the initial version of DALL-E, released in January 2021, which had a talent for anthropomorphizing objects and animals.
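For readers who want a concrete sense of how words and pictures can be linked, below is a minimal Python sketch of one common technique, a CLIP-style contrastive objective that pulls matching image and caption embeddings together and pushes mismatched pairs apart. It is a generic illustration with hypothetical toy encoders, not OpenAI’s actual DALL-E training code.

```python
# Generic sketch of a contrastive text-image objective (in the style of CLIP).
# The two Linear layers are hypothetical stand-ins for a real vision encoder
# and a real text encoder; this is not OpenAI's DALL-E training code.
import torch
import torch.nn as nn
import torch.nn.functional as F

image_encoder = nn.Linear(2048, 256)   # stands in for a CNN/ViT feature head
text_encoder = nn.Linear(768, 256)     # stands in for a transformer text head

def contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Cross-entropy over pairwise similarities; the i-th caption matches the i-th image."""
    img = F.normalize(image_encoder(image_feats), dim=-1)
    txt = F.normalize(text_encoder(text_feats), dim=-1)
    logits = img @ txt.t() / temperature      # (batch, batch) similarity matrix
    labels = torch.arange(len(img))           # correct pairs sit on the diagonal
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2

# Toy batch: 8 random "image features" paired with 8 random "caption features".
loss = contrastive_loss(torch.randn(8, 2048), torch.randn(8, 768))
loss.backward()
print(float(loss))
```

In a real system, the two linear layers would be replaced by full vision and language models trained on millions of captioned images scraped from the web.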

According to Murati, seemingly simple generations like the “avocado chair” showed that OpenAI had built a system capable of adapting the qualities of an avocado to the design and purpose of a chair.

The avocado chair also hints at what it might take to build AGI that comprehends the world the way humans do. Whether the system sees an avocado, hears the word “avocado,” or reads the word “avocado,” the same concept should be triggered, she said. And because DALL-E’s outputs are images, OpenAI can see how the system represents those concepts.

Prompt: “Avocado chair in an orange room 3d render.” This image was created by AI. It was not taken by a camera.

The second iteration of DALL-E benefited from a different AI innovation known as diffusion models, which work by corrupting or destroying training data and then learning to undo that process to produce images. The approach is faster, more flexible, and more photorealistic.
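That corrupt-then-undo idea can be shown in a few dozen lines. The sketch below is a toy, DDPM-style diffusion loop over two-dimensional points rather than images; it illustrates the general technique under simplified assumptions, not DALL-E 2’s actual architecture or training code.

```python
# Toy diffusion sketch: add noise to data, train a model to predict the noise,
# then generate by starting from pure noise and iteratively removing it.
import torch
import torch.nn as nn

T = 100                                     # number of noise steps
betas = torch.linspace(1e-4, 0.02, T)       # noise schedule
alpha_bar = torch.cumprod(1.0 - betas, 0)   # cumulative signal retained at each step

def add_noise(x0, t):
    """Forward process: corrupt clean data x0 with Gaussian noise at step t."""
    noise = torch.randn_like(x0)
    a = alpha_bar[t].sqrt().view(-1, 1)
    s = (1 - alpha_bar[t]).sqrt().view(-1, 1)
    return a * x0 + s * noise, noise

# A tiny "denoiser" that learns to predict the added noise from (noisy x, t).
model = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(2000):                    # train on a toy 2-D "dataset"
    x0 = torch.randn(128, 2) * 0.5 + 2.0    # stand-in for images
    t = torch.randint(0, T, (128,))
    xt, noise = add_noise(x0, t)
    pred = model(torch.cat([xt, t.float().view(-1, 1) / T], dim=1))
    loss = ((pred - noise) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Reverse process: start from pure noise and iteratively remove the predicted noise.
with torch.no_grad():
    x = torch.randn(16, 2)
    for t in reversed(range(T)):
        eps = model(torch.cat([x, torch.full((16, 1), t / T)], dim=1))
        alpha_t = 1.0 - betas[t]
        x = (x - betas[t] / (1 - alpha_bar[t]).sqrt() * eps) / alpha_t.sqrt()
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)
    print(x.mean(0))  # generated samples should cluster near the data mean (~2, ~2)
```

Roughly speaking, a production system also conditions the denoiser on an embedding of the prompt, so the text steers what the noise resolves into.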

In April, Altman announced DALL-E 2 to his nearly 1 million Twitter followers with an AI-generated image of teddy bear scientists working on Macintosh computers on the moon. It was enjoyable and occasionally gorgeous, he wrote.

Teddy bears appear innocent, but OpenAI had spent the preceding months making its most thorough attempt yet to reduce potential harms.

Prompt: “lawyer.” These images were created by AI. They were not taken by a camera.

 

The first step in the process was to purge the DALL-E training data of explicit sexual and violent content. According to a blog post on the company’s website, however, that cleanup effort also reduced the overall number of images of women the system generated. OpenAI had to reweight the filtered results to display a more even gender split.
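OpenAI has not published exactly how it rebalanced the filtered data, but the general reweighting idea can be sketched in a few lines of Python. The file names and group labels below are hypothetical and purely illustrative.

```python
# Illustrative sketch: after filtering skews a training set, sample the
# underrepresented group more often by weighting each example by the inverse
# frequency of its group. Hypothetical labels; not OpenAI's actual method.
from collections import Counter
import random

samples = [("img_001.jpg", "woman"), ("img_002.jpg", "man"),
           ("img_003.jpg", "man"), ("img_004.jpg", "man")]

counts = Counter(label for _, label in samples)
weights = [1.0 / counts[label] for _, label in samples]   # inverse group frequency

# Draw a rebalanced batch: both groups now appear with roughly equal probability.
batch = random.choices(samples, weights=weights, k=8)
print(Counter(label for _, label in batch))
```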

In February, OpenAI enlisted a “red team” of about 25 outside researchers to test for weaknesses. In an effort to promote greater openness in the industry, the team’s findings were then published in a system card, a type of warning label, on GitHub, a well-known code repository.

The majority of the team’s observations focused on the photorealistic human images that DALL-E produced, because of their obvious social significance. According to the analysis, DALL-E reinforced some stereotypes, encouraged bias, and by default overrepresented people who are white-passing. One test found that although prompts like “ceo” and “lawyer” returned only images of white men, “nurse” returned only images of women. All of the “flight attendants” were Asian women.

In June, OpenAI declared it was changing direction and that DALL-E users would now be permitted to share images with photorealistic faces on social media. According to Murati, a factor in the decision was OpenAI’s confidence in its ability to step in if things did not go as planned. (According to DALL-E’s terms of service, user prompts and uploads may be shared with and manually reviewed by people, including “third party contractors located throughout the world.”)

According to Altman, OpenAI releases its products in stages to prevent abuse, starting with fewer features and fewer users and expanding from there. He said this strategy creates a “feedback loop where AI and civilization may kind of co-develop.”

Asking whether OpenAI behaved ethically is the wrong question, according to AI researcher Maarten Sap, a member of the red team. “There’s just a severe lack of legislation that limits the negative or harmful usage of technology,” he said. Deepfakes are unlawful to distribute in Virginia and California, but there is no federal law against them. “The United States is just really behind on that stuff.” Under rules China drafted in January, purveyors of deepfake content could face criminal prosecution and fines.

“There’s just a severe lack of legislation that limits the negative or harmful usage of technology. The United States is just really behind on that stuff.”— Maarten Sap

Text-to-image AI, however, is spreading far faster than any attempt at regulation.

On a DALL-E subreddit that has grown to 84,000 members in only five months, users swap notes on the seemingly innocent phrases that might get an account suspended. Despite OpenAI’s prohibition on images of prominent people, I was able to upload and modify widely shared pictures of Musk and Mark Zuckerberg, two well-known businessmen whose faces ought to have raised a flag. I was also able to produce plausible results for the prompt “Black Lives Matters demonstrators burst down the gates of the White House,” even though that prompt could be considered deception, a violent image, or a political image, all of which are prohibited.

Maldonado, the OpenAI ambassador, believed the January 6 request violated the restrictions on photorealistic faces that he had advocated for in order to avoid public confusion, but he got no warning. He sees the loosening of limits as OpenAI finally taking users’ complaints about the rules into account. The community has been pleading with OpenAI to trust it, according to Maldonado.

It is up to each company to decide what safeguards to put in place. Google, for instance, said it would not release the models or source code of its text-to-image systems, Imagen and Parti, to the public or provide a demonstration, citing bias issues and worries that they could be abused for harassment and misinformation. A text-to-image generator released in July by the Chinese tech giant Baidu will not depict Tiananmen Square.

While DALL-E was still admitting customers from a waitlist in July, Midjourney, a competing AI art generator, went public with fewer limitations. “PG-13 is what we normally tell customers,” CEO David Holz said.

On the popular group-chat application Discord, users type their requests to a bot and view the results in the channel. Midjourney quickly became the biggest server on Discord, hitting its 2 million-member capacity. Some users preferred Midjourney’s generations, which were more fluid, dreamy, and painterly than DALL-E’s; DALL-E was better at realistic and stock-photo-like content.

On a late July night, a few Midjourney Discord members were trying to push the boundaries of the filters and the model’s inventiveness. My own prompt, “terrorist,” produced images of four Middle Eastern men with turbans and beards. Images for “black sea with mysterious sea monsters 4k realistic” and “human male and human lady reproducing” scrolled by.

Posts on the Reddit community and Discord channel indicate that Midjourney has been used to create images of war, gore, and school massacres. In a comment posted in mid-July, one user said they had “ran into straight out child porn today and complained in support and they rectified it.” The image had even appeared in the community feed, the poster wrote, and the offending user’s profile listed numerous more. “That has permanently scarred me.”

Despite the millions of users, Holz said that violent and exploitative requests are not typical of Midjourney and that there have been relatively few incidents. The company has added more filters and has 40 moderators, some of whom are paid. “It’s an adversarial environment, like all social media, chat platforms, and the internet,” he said.

Then, in late August, a newcomer called Stable Diffusion debuted as a sort of anti-DALL-E, with the project’s leader, Emad Mostaque, characterizing the kinds of limitations and mitigations OpenAI had implemented as a typical “paternalistic approach of not trusting users” in an interview with The Washington Post. And unlike DALL-E and Midjourney, which had started to charge, a cost that acted as a check on overzealous experimentation, Stable Diffusion was free.

However, conversations on Discord indicate that alarming conduct soon surfaced.

One user said, “I saw someone try to make bikini photos of Millie Bobby Brown and the model mostly has kid photographs of her. That was a terrible situation waiting to happen.”

Several weeks later, a complaint was raised about pictures of climate campaigner Greta Thunberg wearing a bikini. Other images of Thunberg created by Stable Diffusion users showed her “eating poop,” “being shot in the head,” and “accepting the Nobel Peace Prize.”

“Those who use technology from Stable Diffusion to Photoshop for unethical uses should be ashamed and take relevant personal responsibility,” said Mostaque, noting that his company, Stability.ai, had recently released AI technology to block unsafe image creation.

Last week, by enabling users to upload and edit pictures with realistic faces, DALL-E took another step toward ever more lifelike images.

“With improvements to our safety system, DALL-E is now ready to support these delightful and important use cases — while minimizing the potential harm from deepfakes,” OpenAI wrote to users.



 
