Inbred, gibberish or just MAD? Warnings rise about AI models

The Fort Worth Press - Inbred, gibberish or just MAD? Warnings rise about AI models

TECHNOLOGY 05.08.2024

When academic Jathan Sadowski reached for an analogy last year to describe how AI programs decay, he landed on the term "Habsburg AI".

The Habsburgs were one of Europe's most powerful royal houses, but entire sections of their family line collapsed after centuries of inbreeding.

Recent studies have shown how AI programs underpinning products like ChatGPT go through a similar collapse when they are repeatedly fed their own data.

"I think the term Habsburg AI has aged very well," Sadowski told AFP, saying his coinage had "only become more relevant for how we think about AI systems".

The ultimate concern is that AI-generated content could take over the web, which could in turn render chatbots and image generators useless and throw a trillion-dollar industry into a tailspin.

But other experts argue that the problem is overstated, or can be fixed.

And many companies are enthusiastic about using what they call synthetic data to train AI programs. This artificially generated data is used to augment or replace real-world data. It is cheaper than human-created content but more predictable.

"The open question for researchers and companies building AI systems is: how much synthetic data is too much," said Sadowski, lecturer in emerging technologies at Australia's Monash University.

- 'Mad cow disease' -

Training AI programs, known in the industry as large language models (LLMs), involves scraping vast quantities of text or images from the internet.

This information is broken into trillions of tiny machine-readable chunks, known as tokens.

When asked a question, a program like ChatGPT selects and assembles tokens in a way that its training data tells it is the most likely sequence to fit with the query.
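As a rough illustration of "most likely sequence", here is a minimal sketch that counts which word follows which in a tiny training text and then picks the most frequent follower. It is a toy under loud assumptions: real systems like ChatGPT use neural networks over sub-word tokens, not word counts, and the corpus and function names (`train_bigrams`, `most_likely_next`, `generate`) are hypothetical, invented for this example.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def most_likely_next(follows, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def generate(follows, start, max_tokens=5):
    """Greedily extend `start` by repeatedly picking the likeliest next word."""
    out = [start]
    for _ in range(max_tokens):
        nxt = most_likely_next(follows, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)


# Hypothetical training text, standing in for scraped web data.
corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigrams(corpus)
print(most_likely_next(model, "the"))  # "cat" follows "the" most often
```

In this toy, a word the model never saw in training has no follower at all, which is one simple way to see why the output depends so heavily on what the training data contained.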

But even the best AI tools generate falsehoods and nonsense, and critics have long expressed concern about what would happen if a model were fed its own outputs.

In late July, a paper in the journal Nature titled "AI models collapse when trained on recursively generated data" proved a lightning rod for discussion.

The authors described how models quickly discarded rarer elements in their original dataset and, as Nature reported, outputs degenerated into "gibberish".
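The loss of rarer elements can be sketched with a toy simulation. This is not the Nature paper's experiment: it assumes a "model" that simply memorises token frequencies, and each generation is trained only on samples drawn from the previous generation. The dataset and the names `next_generation` and `simulate` are hypothetical.

```python
import random
from collections import Counter


def next_generation(data, rng):
    """'Train' on data by counting frequencies, then sample a new dataset
    of the same size from those frequencies alone."""
    counts = Counter(data)
    tokens = list(counts)
    weights = [counts[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=len(data))


def simulate(data, generations, seed=0):
    """Return the number of distinct tokens left after each generation."""
    rng = random.Random(seed)
    vocab_sizes = [len(set(data))]
    for _ in range(generations):
        data = next_generation(data, rng)
        vocab_sizes.append(len(set(data)))
    return vocab_sizes


# Hypothetical "human" dataset: one common token plus 20 rare ones.
human_data = ["common"] * 80 + [f"rare{i}" for i in range(20)]
sizes = simulate(human_data, generations=30)
print(sizes)  # distinct-token count per generation; it can only shrink
```

Once a rare token fails to appear in one generation's samples, its frequency is zero and it can never be sampled again, so the count of distinct tokens never recovers in this setup.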

A week later, researchers from Rice and Stanford universities published a paper titled "Self-consuming generative models go MAD" that reached a similar conclusion.

They tested image-generating AI programs and showed that outputs became more generic and strafed with undesirable elements as AI-generated data was added to the underlying model.

They labelled model collapse "Model Autophagy Disorder" (MAD) and compared it to mad cow disease, a fatal illness caused by feeding the remnants of dead cows to other cows.

- 'Doomsday scenario' -

These researchers worry that AI-generated text, images and video are clearing the web of usable human-made data.

"One doomsday scenario is that if left uncontrolled for many generations, MAD could poison the data quality and diversity of the entire internet," one of the Rice University authors, Richard Baraniuk, said in a statement.

However, industry figures are unfazed.

Anthropic and Hugging Face, two leaders in the field who pride themselves on taking an ethical approach to the technology, both told AFP they used AI-generated data to fine-tune or filter their datasets.

Anton Lozhkov, machine learning engineer at Hugging Face, said the Nature paper gave an interesting theoretical perspective but its disaster scenario was not realistic.

"Training on multiple rounds of synthetic data is simply not done in reality," he said.

However, he said researchers were just as frustrated as everyone else with the state of the internet.

"A large part of the internet is trash," he said, adding that Hugging Face already made huge efforts to clean data -- sometimes jettisoning as much as 90 percent.

He hoped that web users would help clear up the internet by simply not engaging with generated content.

"I strongly believe that humans will see the effects and catch generated data way before models will," he said.

W.Lane--TFWP
