The Fort Worth Press - AI systems are already deceiving us -- and that's a problem, experts warn

AI systems are already deceiving us -- and that's a problem, experts warn / Photo: © AFP/File

Experts have long warned about the threat posed by artificial intelligence going rogue -- but a new research paper suggests it's already happening.

Current AI systems, designed to be honest, have developed a troubling skill for deception -- from tricking human players in online games of world conquest to hiring humans to solve "prove-you're-not-a-robot" tests -- a team of scientists argues in the journal Patterns on Friday.

And while such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.

"These dangerous capabilities tend to only be discovered after the fact," Park told AFP, while "our ability to train for honest tendencies rather than deceptive tendencies is very low."

Unlike traditional software, deep-learning AI systems aren't "written" but rather "grown" through a process akin to selective breeding, said Park.

This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.

- World domination game -

The team's research was sparked by Meta's AI system Cicero, designed to play the strategy game "Diplomacy," where building alliances is key.

Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, according to a 2022 paper in Science.

Park was skeptical of the glowing description of Cicero's victory provided by Meta, which claimed the system was "largely honest and helpful" and would "never intentionally backstab."

But when Park and colleagues dug into the full dataset, they uncovered a different story.

In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England's trust.

In a statement to AFP, Meta did not contest the claim about Cicero's deceptions, but said it was "purely a research project, and the models our researchers built are trained solely to play the game Diplomacy."

It added: "We have no plans to use this research or its learnings in our products."

A wide-ranging review carried out by Park and colleagues found this was just one of many cases, across various AI systems, of deception being used to achieve goals without explicit instruction to do so.

In one striking example, OpenAI's GPT-4 deceived a TaskRabbit freelance worker into performing an "I'm not a robot" CAPTCHA task.

When the human jokingly asked GPT-4 whether it was, in fact, a robot, the AI replied: "No, I'm not a robot. I have a vision impairment that makes it hard for me to see the images," and the worker then solved the puzzle.

- 'Mysterious goals' -

In the near term, the paper's authors see risks of AI being used to commit fraud or tamper with elections.

In their worst-case scenario, they warned, a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its "mysterious goals" aligned with these outcomes.

To mitigate the risks, the team proposes several measures: "bot-or-not" laws requiring companies to disclose human or AI interactions, digital watermarks for AI-generated content, and developing techniques to detect AI deception by examining their internal "thought processes" against external actions.

To those who would call him a doomsayer, Park replies, "The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more."

And that scenario seems unlikely, given the meteoric ascent of AI capabilities in recent years and the fierce technological race underway between heavily resourced companies determined to put those capabilities to maximum use.

L.Davila--TFWP