Computer Science Faculty Lead Misinformation Panel

By Tiffany Whitfield and Muntabir Choudhury

Can you spot fake news on social media? Is the internet easily accessible to people with visual impairments or other disabilities? Have you wondered how reliable digital preservation is, or how data mining affects your web experience? These are only a few of the topics covered by Old Dominion University (ODU) computer science faculty at a recent panel, “How are Misinformation and Disinformation Related to You?” The lively panel was held on February 23, 2023, at the Perry Library as a hybrid event, with viewers tuning in from all over the world.

The six ODU Computer Science panelists brought a wide range of expertise:

Assistant Professor Jian Wu, Ph.D., is an expert in natural language processing and understanding.

Assistant Professor Sampath Jayarathna, Ph.D., has expertise in data science and neuro-information retrieval.

Professor Michele Weigle, Ph.D., studies web science, digital preservation and social media.

Lecturer Faryaneh Poursardar, Ph.D., does research on human-computer interaction.

Assistant Professor Vikas Ashok, Ph.D., is an expert in accessible computing.

Assistant Professor Yi He, Ph.D., has expertise in data mining and machine learning.

ODU students and viewers from around the globe took part in this one-of-a-kind discussion.

Research on Fake News

The first presenter, Assistant Professor Wu, talked about spotting fake news and what can be done to stop the spread of misinformation and disinformation. “Fake news has been broadly defined as deceptive content presented under the disguise of legitimate journalism,” said Wu. He described it as a worldwide information accuracy and integrity problem that started on social media such as Facebook and Twitter and has now been found on mainstream platforms. Wu identified seven types of fake news: “satire or parody, misleading content, imposter content, fabricated content, false connection, false context and manipulated content.”

As the digital world continues to expand, studies have found that social media is the catalyst for the spread of fake news, and the speed at which it is disseminated is alarming as well. To combat this, Wu has been studying various ways to slow the spread. “You can have computers to identify potentially fake stories, and then double-check by humans,” said Wu.

According to Wu, there are some simple steps people can take to spot fake news before sharing it. “First is to check the date and make sure you don’t share something that is old news from previous years made to look like it just happened,” said Wu. Second, Wu encourages web users to check the source of the news and verify that it is not merely posing as an authoritative media outlet. The third step is a reverse image search. “If you go to Google image search, you can check where this image originally came from and if the image has been misused in other contexts,” said Wu.

Assistant Professor Jian Wu, Ph.D.

Research Experiences for Undergraduates (REU) program on Disinformation Detection and Analytics

The next presenter, Assistant Professor Jayarathna, told students about the importance of research and the opportunities available in his lab and other ODU computer science labs. His lab has secured National Science Foundation (NSF) funding to run a research experience for undergraduates. “I'm the Director of disinformation detection and analytics program here at ODU,” said Jayarathna. “To qualify to be in the next cohort NSF research summer experience, you need to be a U.S. citizen, national or permanent resident and you must be a student enrolled in college right now,” said Jayarathna. Sophomores, juniors and seniors are strongly encouraged to apply.

The 2022 summer cohort included eight students from ODU, the University of California at Berkeley, the University of Virginia, Norfolk State University and Christopher Newport University. “We did and will do one-on-one matching with students, and that means you are coming to our cohort focusing on a certain topic, then you're going to be engaging with faculty throughout that summer,” said Jayarathna. “We highly encourage women, underrepresented minorities and also students with a disability to apply for the summer of 2024 because this summer’s registration has closed.”

Some projects from the summer 2022 cohort have been published or are pending journal submission, and others have been submitted as conference talks that are still under review. Jayarathna is working to extend the summer undergraduate research program and secure more funding.

Assistant Professor Sampath Jayarathna, Ph.D.

Detecting Review Manipulations

ODU Computer Science Lecturer Poursardar talked about how much weight web users give to reviews on online platforms, and how reviews can also be the target of cyberattacks and manipulation. Poursardar asked the audience if they had “read any reviews in any other online platforms.” Nearly everyone in the audience raised a hand. “So you know that these days, whatever we want to do, we go and read reviews, right?” said Poursardar. “It doesn't matter if you want to buy an item on Amazon or anywhere else, or even if you want to download an app on your phone.”

Reviews appear on nearly every online platform, and the attention consumers pay to them has created an opening for scammers. “Reviews are vulnerable to manipulation,” said Poursardar. Some companies will pay people to write reviews. “It’s very important for us (as consumers) to uncover these kinds of fraudulent reviews.” Most consumers read reviews before purchasing an item, and 93% of those users make purchases on the internet. “Fake internet reviews have a $152-billion direct impact on worldwide online purchases,” said Poursardar. Fraudulent reviews have certain characteristics, such as “short reviews, unverified purchases, and grammatical errors.”

During the summer REU with Assistant Professors Jayarathna and Wu, Poursardar worked with undergraduates on models that identify fake news and on machine learning algorithms that detect fraudulent reviews. The students and mentors gathered data, carefully identified subsets of it, and extracted data that improved the algorithms along the way. In conclusion, Poursardar’s tips for consumers who purchase items on the web are to be informed about reviews and to verify them. With more time and technological advances, she believes deep learning models will be able to combat fraudulent reviews.
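The three warning signs quoted above can be turned into a simple checklist. The following is only a minimal sketch and not the models built in the REU project: the `Review` class, the eight-word threshold and the small misspelling list are all hypothetical choices made for illustration, and a misspelling check is a crude stand-in for real grammar checking.

```python
import re
from dataclasses import dataclass


@dataclass
class Review:
    text: str
    verified_purchase: bool


# Hypothetical threshold and word list, chosen only for illustration.
SHORT_REVIEW_WORDS = 8
COMMON_MISSPELLINGS = {"recieve", "definately", "alot", "teh", "wich"}


def warning_signs(review: Review) -> list[str]:
    """Return which of the three quoted warning signs a review shows."""
    words = re.findall(r"[A-Za-z']+", review.text.lower())
    signs = []
    if len(words) < SHORT_REVIEW_WORDS:
        signs.append("short review")
    if not review.verified_purchase:
        signs.append("unverified purchase")
    # Crude proxy: a real system would use a grammar or spelling checker.
    if any(word in COMMON_MISSPELLINGS for word in words):
        signs.append("possible grammatical errors")
    return signs


if __name__ == "__main__":
    sample = Review("Best product ever, definately buy!", verified_purchase=False)
    print(warning_signs(sample))
```

A review that shows one or more signs is not necessarily fake; a checklist like this only suggests which reviews deserve a closer look.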

Lecturer Faryaneh Poursardar, Ph.D.

Exploring Banned Instagram Accounts using Web Archives

Next to talk was ODU Computer Science Professor Michele Weigle, who discussed the challenges of studying banned Instagram accounts using web archives. A doctoral student at ODU and an undergraduate student at CNU, who took part in the summer 2022 REU, both worked on this research on the social media giant. “In the investigation into the 2016 US presidential election, it was discovered that the Russian Internet Research Agency, or IRA, used social media to manipulate our opinions during the election,” said Weigle. This was not a small operation. Senate hearings were held in 2017, and an analysis was commissioned by the U.S. Senate Select Committee on Intelligence. “And what they found was that the IRA employed over 1,000 people with a budget of over $25 million to influence us (Americans) to affect the outcome of the 2016 presidential election,” said Weigle. “People knew that there were fake posts on Facebook and fake posts on Twitter, but one of the surprising things that they found was that a lot of the manipulation was being done on Instagram.”

Many of the accounts spreading disinformation were banned by Instagram. Once an account is banned, it is no longer available on the live web, so researchers must use web archives, such as the Internet Archive’s Wayback Machine, to study such content. Weigle and her students studied the archived Instagram account pages of the “Disinformation Dozen.” Unfortunately, they found that most of the archived pages, or mementos, redirected to the Instagram login page, so the content the users had posted was not available for study. Going forward, Weigle and her students will study when changes in the Instagram UI began to hinder archiving and will investigate ways to improve the archiving of Instagram.

“Much of the focus in trying to protect against disinformation, is really media literacy and inoculating the public against disinformation and misinformation,” said Weigle. Disinformation topics will continue to change, and refuting each false claim point by point can be difficult. “The truth is much slower to spread than a falsehood, so the important thing is to educate people about what are the tactics that people who are spreading disinformation are using so that we can be aware,” said Weigle. Web archives will play an important role in helping researchers determine the tactics of those who spread disinformation, especially for suspended or banned accounts.
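One way to picture the login-redirect problem is to look at where each memento ends up after redirects are followed. The sketch below is an illustration, not the method used by Weigle's group: it assumes that memento URLs follow the `web.archive.org/web/<timestamp>/<original URL>` shape, that the Instagram login page lives under `/accounts/login`, and that the final URLs have already been collected. The account name in the example is made up.

```python
import re
from urllib.parse import urlparse

# Assumed shape of a Wayback Machine memento URL:
#   https://web.archive.org/web/<14-digit timestamp>/<original URL>
MEMENTO_PATTERN = re.compile(
    r"^https?://web\.archive\.org/web/(\d{14})[a-z_]*/(.+)$"
)

# Assumed path of the Instagram login page.
LOGIN_PATH_PREFIX = "/accounts/login"


def is_login_redirect(final_memento_url: str) -> bool:
    """True if the memento reached after redirects captures the login page."""
    match = MEMENTO_PATTERN.match(final_memento_url)
    if match is None:
        return False
    original = urlparse(match.group(2))
    host = original.hostname or ""
    on_instagram = host == "instagram.com" or host.endswith(".instagram.com")
    return on_instagram and original.path.startswith(LOGIN_PATH_PREFIX)


def share_unusable(final_memento_urls: list[str]) -> float:
    """Fraction of mementos that ended on the login page instead of content."""
    if not final_memento_urls:
        return 0.0
    hits = sum(is_login_redirect(url) for url in final_memento_urls)
    return hits / len(final_memento_urls)


if __name__ == "__main__":
    # Hypothetical final URLs for one made-up account.
    urls = [
        "https://web.archive.org/web/20200101000000/https://www.instagram.com/example_account/",
        "https://web.archive.org/web/20220101000000/https://www.instagram.com/accounts/login/?next=/example_account/",
    ]
    print(share_unusable(urls))
```

Tallying this share by capture date would be one simple way to see when archived pages stopped holding usable content.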

Professor Michele Weigle, Ph.D.

Can Blind People Easily Identify Deceptive Web Content with Present Assistive Technologies?

Next to present was Assistant Professor Vikas Ashok, an expert in accessible computing. He began his presentation with how deceptive content can affect blind people. “Generally, the definition of deception or misinformation is slightly different for blind people when comparing it to sighted people,” said Ashok. The problem arises because the way blind people interact with computers and web pages differs from that of sighted people. During the presentation, Ashok described the scale of this problem. “For example, there are one million blind people in the US and 49.1 million worldwide; so, for these blind people to interact with web pages, they use ‘screen readers,’” said Ashok. Screen readers read the content of web pages aloud using a synthesized voice. Blind users listen through speakers or headphones and navigate the content using keyboard shortcuts.

Although screen readers have some advantages, accessibility and usability are still significant problems. For example, because many websites have been designed for sighted interaction, blind people may encounter poor page structure, keyboard-only navigation, inaccessible images, unclear link text, lack of feedback, tedious and frustrating content navigation, and misleading or deceptive content. Ashok noted that, among these problems, misleading or deceptive content on web pages has received less attention. To illustrate, Ashok played a demo video showing that blind people can easily be deceived because they cannot see the content and can only hear it. For example, a web page may contain irrelevant advertisements or malware, which sighted people can easily avoid. Blind people, however, have to listen to the entire ad to determine whether it is irrelevant content. They can also be deceived when they use keyboard shortcuts to activate content and do not hear any indication that it is a virus or malware, so they may download and install viruses on their computers without realizing it.

Many kinds of deceptive content can be found on web pages, including fraudulent online advertisements, phishing websites, social media posts with false news, and clickbait and promotions. Ashok provided many examples to emphasize that the problem is significant and needs more research attention. Lastly, he introduced assistive technologies that use AI techniques to identify misleading content, discussed the challenges, and concluded his talk with a few remarks.

Assistant Professor Vikas Ashok, Ph.D.

Will Hallucination in ChatGPT Pollute Public Knowledgebase?

Assistant Professor Yi He, an expert in data mining and machine learning, gave a talk on whether ChatGPT may pollute public knowledge bases. He reviewed roughly 60 years of chatbot history and then asked what sets ChatGPT apart from earlier chatbots. It may have become popular because of the well-known GPT (Generative Pretrained Transformer) model in artificial intelligence (A.I.). He said he explored ChatGPT more closely while working on a new security initiative to deploy a ChatGPT competitor on a welfare information forum where customers interact with chatbots. “ChatGPT is A.I.-powered, and how it learns is from the data without expert help,” said He. He walked through multiple examples of its capabilities and limitations. “ChatGPT is trained on a 570 GB corpus of text data that includes existing literature, websites, Wikipedia, and all the online forums.” It uses a GPT transformer with 175 billion parameters. Training at that scale cost OpenAI about 12 million dollars in resources. “If we want to build a cost-effective A.I. system, we would be hesitant to make such a system as a researcher or scientist with a limited budget.”

ChatGPT can perform multi-round dialogue and information retrieval, document summarization, multilingual translation, essay writing, and coding. ChatGPT also keeps track of context within a dialogue and returns answers in a more formatted form, and the version discussed (GPT-3) will not provide answers to malicious inputs. One ethical issue is that asking ChatGPT to write code and essays may harm academic integrity, since students will tend to copy the output without actually learning the material. He said that despite these remarkable capabilities, it is worth asking critically whether ChatGPT will pollute the public knowledge base.

He also explained hallucination, in which text generated by a language model such as ChatGPT is “nonsensical or unfaithful to ground truth.” For example, when asked to write a decision tree classifier in Python from scratch without using any existing libraries, ChatGPT provided code, but the code failed to run. “So as a student, if we are not careful, we will most likely be in trouble for copying the code in the assignments without understanding it correctly.” He gave further examples of hallucination. When asked to balance a chemical equation, ChatGPT provided an answer with steps but made mistakes in the balancing. When asked for references to academic papers in a medical domain, ChatGPT produced fake references. “These hallucinations happen because the transformer model remembers what it has learned and produces the result by calculating the probability of the most possible combination of the words.” As a result, it has a low probability of reasoning logically about domain knowledge and has no access to the current research frontier.

Moreover, ChatGPT learns its content from the internet. The question then is, what is the probability that the answers it draws from the internet are not misinformation or fabricated? ChatGPT itself also creates fabricated answers, which in turn pollute the knowledge base. To overcome these problems, He pointed to the importance of understanding the “distinction between human-generated or AI-generated text.” To conclude the talk, He further suggested that “we should not rely on the outputs of chatbots” and should actively search for credible data sources to prevent misinformation.
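The idea of producing text by picking the most probable word combination can be illustrated with a toy example. The sketch below is not a transformer and not how ChatGPT is implemented: it is a simple word-pair (bigram) counter trained on a made-up four-sentence corpus, and every name in it is hypothetical. It only shows how choosing the likeliest next word can give a fluent but false answer.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus; a real model is trained on vastly more text.
CORPUS = [
    "the capital of france is paris",
    "the capital of italy is rome",
    "the capital of spain is madrid",
    "the capital of france is paris",
]


def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def most_probable_next(counts, word):
    """Return (next word, probability) for the likeliest follower, or None."""
    followers = counts.get(word)
    if not followers:
        return None
    total = sum(followers.values())
    best, seen = followers.most_common(1)[0]
    return best, seen / total


def complete(counts, prompt, max_words=5):
    """Greedily extend the prompt one most-probable word at a time."""
    words = prompt.split()
    for _ in range(max_words):
        step = most_probable_next(counts, words[-1])
        if step is None:
            break
        words.append(step[0])
    return " ".join(words)


COUNTS = train_bigrams(CORPUS)

if __name__ == "__main__":
    print(complete(COUNTS, "the capital of spain is"))
```

Because "paris" follows "is" more often than any other word in this corpus, the toy model completes the Spain prompt with "paris". The sentence reads naturally and is wrong, which is the pattern He described.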
