
Generative AI: New Data Application with Old Data Problems

Our guest contributor this month is Christie Fuller, PhD, professor of information technology management at Boise State University’s College of Business and Economics. Read more about Professor Fuller.

In the two years since ChatGPT was released, it has grown to over 200 million weekly users, according to OpenAI. This tool, along with others such as Google Gemini and Microsoft Copilot, can improve productivity, enhance creativity and provide learning opportunities, among many other capabilities that are already having a major impact on business and education. However, there are also downsides to its use, including the hidden or “black box” nature of how these tools produce results, the quality and sources of the data fed into their models, and the accuracy of their results. While the emergence of ChatGPT and related tools has brought these issues to the attention of the general public, they are not new to the world of data.

Underlying the large language models (LLMs) behind ChatGPT are neural networks. Neural networks were first conceived by researchers in the 1940s, decades before technology allowed them to be practically implemented. Because they are designed to mimic the complex way humans solve problems, they often provide more accurate results than simpler algorithms. However, how they reach their decisions, and how inputs connect to outputs, is opaque. In 2016, Cathy O’Neil published the book Weapons of Math Destruction, which highlighted this opacity and identified potential problems with the use of black box algorithms in areas such as personnel decisions, policing, criminal sentencing and college admissions. In the recent past, a neural network’s black box might have contained only hundreds of weights. In Co-Intelligence: Living and Working with AI, Ethan Mollick states that the original ChatGPT had 175 billion weights, further obscuring how what goes into the algorithm is related to what comes out.
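
To make the “black box” idea concrete, here is a minimal sketch, in Python, of training a tiny neural network and then inspecting what it has “learned.” The data, network size and library choice (scikit-learn) are illustrative assumptions, not anything from the article:

```python
# A minimal sketch of why even a tiny neural network is a "black box":
# after training, everything it has learned lives in numeric weight
# matrices with no human-readable meaning.
# Assumes scikit-learn and NumPy are installed; the data is synthetic.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Toy data: approve/deny decisions actually driven by two of four features.
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)

model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
model.fit(X, y)

# The only "explanation" the model offers for any prediction is these numbers.
for i, layer in enumerate(model.coefs_):
    print(f"layer {i} weight matrix, shape {layer.shape}:")
    print(np.round(layer, 2))
```

Even this network, with a single hidden layer and a few dozen weights, resists interpretation by inspection; scaling the same structure up to 175 billion weights is what makes modern LLMs so opaque.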

O’Neil also describes how these algorithms can propagate bias present in the data used to build decision-making models. For example, she suggests that a widely used model intended to predict recidivism actually increases it. In the past, the data used as inputs to algorithms was typically clearly defined. Today’s models, however, ingest vast amounts of data of varying quality, and the sources of that data are largely opaque, as Mollick points out. Generative AI tools appear to perpetuate race and gender stereotypes (see the Bloomberg article “Humans Are Biased. Generative AI Is Even Worse”). There is also concern that these models use the intellectual property of others without permission. In September 2023, several authors filed suit against OpenAI for copyright infringement (read more about the suit here).
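
Returning to the bias point above: the following fully synthetic sketch (not O’Neil’s actual recidivism model, and all data fabricated for illustration) shows the mechanism by which a model trained on biased historical labels reproduces that bias:

```python
# A synthetic sketch of bias propagation: if past decisions penalized one
# group beyond its true risk, a model fit to those decisions learns to do
# the same. Data, group sizes and coefficients are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5000

group = rng.integers(0, 2, size=n)   # 0 or 1: a demographic proxy feature
risk = rng.normal(size=n)            # the factor decisions should depend on

# Historical labels: past outcomes penalized group 1 beyond true risk.
biased_label = (risk + 0.8 * group + rng.normal(scale=0.5, size=n) > 0.5).astype(int)

X = np.column_stack([risk, group])
model = LogisticRegression().fit(X, biased_label)

# Two individuals with identical risk, differing only in group membership:
same_risk = np.array([[0.0, 0], [0.0, 1]])
print(model.predict_proba(same_risk)[:, 1])  # group 1 is scored as higher risk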

A key part of working with data has always been human decision making and critical analysis of results. The data analyst must carefully select and format a model’s inputs, make decisions about how the model is implemented, and carefully assess whether the results are both legitimate and useful. Mollick points out that traditional models are also predictable and reliable. While AI tools such as ChatGPT hold the potential to make data analysis accessible to nearly everyone, regardless of data training, accessibility must not be confused with accuracy. These tools are known to hallucinate, as in the recent example of a lawyer who presented a brief to the court citing over half a dozen decisions invented by ChatGPT (read The New York Times story, “Here’s What Happens When Your Lawyer Uses ChatGPT”). When users without data analysis knowledge or subject matter expertise rely on opaque, unpredictable models, the result can be solutions built on faulty problem solving.

AI is already changing our world and will continue to do so for the foreseeable future. Used properly, it can bring great benefits to society, such as advancing healthcare and expanding access to education. However, as its use spreads, we must be aware of the ways it repeats and exaggerates long-standing data issues.

Have any questions or want to know more about generative AI? Please reach out to Prof. Fuller at christiefuller@boisestate.edu or the COBE Ethics Chair at COBEEthics@boisestate.edu. Learn more and explore the programs offered by Boise State’s College of Business and Economics (COBE).