AI Chatbot ChatGPT's Dwindling Performance
Researchers Lingjiao Chen, Matei Zaharia, and James Zou ran rigorous tests on two of the models behind ChatGPT, GPT-3.5 and GPT-4. They evaluated the models on tasks including mathematical problem-solving, code generation, and handling sensitive prompts.
The study found that GPT-4's accuracy in identifying prime numbers fell from 97.6% in March to just 2.4% in June. Its predecessor, GPT-3.5, improved on the same task over the same period.
Both models also became worse at generating new code between March and June. Their handling of sensitive questions changed as well. Where the bots had earlier explained why they could not answer certain sensitive queries related to ethnicity and gender, by June they had become curt, merely apologizing and declining to answer.
The researchers noted that "The behavior of the 'same' [large language model] service can change significantly within a relatively brief period." They stressed the need for continuous oversight of AI model quality.
For individuals and companies that rely heavily on these LLM services, the researchers recommended ongoing monitoring to check that quality stays consistent.
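What such monitoring might look like can be sketched in a few lines of Python. This is a minimal illustration and not the researchers' own method: the names `accuracy` and `drifted` and the 5-point `tolerance` are assumptions made for the example, and the "model" is any function that answers whether a number is prime, standing in for a call to a real LLM service.

```python
from typing import Callable, Iterable


def is_prime(n: int) -> bool:
    """Ground truth: trial division, fine for small benchmark numbers."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def accuracy(model: Callable[[int], bool], numbers: Iterable[int]) -> float:
    """Fraction of benchmark numbers the model classifies correctly."""
    items = list(numbers)
    if not items:
        raise ValueError("benchmark must not be empty")
    correct = sum(model(n) == is_prime(n) for n in items)
    return correct / len(items)


def drifted(baseline: float, current: float, tolerance: float = 0.05) -> bool:
    """True if accuracy fell by more than `tolerance` (an arbitrary threshold)."""
    return baseline - current > tolerance


if __name__ == "__main__":
    benchmark = range(2, 200)
    # Hypothetical stand-ins for two snapshots of the same service.
    earlier = accuracy(is_prime, benchmark)          # answers correctly
    later = accuracy(lambda n: False, benchmark)     # always says "not prime"
    print(f"earlier={earlier:.3f} later={later:.3f} drifted={drifted(earlier, later)}")
```

Running the same fixed benchmark on a schedule and comparing each score with a stored baseline is what lets a user notice a shift like the one reported for GPT-4, where `drifted(0.976, 0.024)` would raise a flag.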
AI technologies are akin to a roller coaster ride, with thrilling peaks and surprising lows. As AI models continue to evolve, issues like these present critical opportunities to understand, address and build even more reliable systems. It's all part of the ride.