The Fort Worth Press - AI systems are already deceiving us -- and that's a problem, experts warn

Experts have long warned about the threat posed by artificial intelligence going rogue -- but a new research paper suggests it's already happening.

Current AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve "prove-you're-not-a-robot" tests, a team of scientists argues in a paper published in the journal Patterns on Friday.

And while such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.

"These dangerous capabilities tend to only be discovered after the fact," Park told AFP, while "our ability to train for honest tendencies rather than deceptive tendencies is very low."

Unlike traditional software, deep-learning AI systems aren't "written" but rather "grown" through a process akin to selective breeding, said Park.

This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.

- World domination game -

The team's research was sparked by Meta's AI system Cicero, designed to play the strategy game "Diplomacy," where building alliances is key.

Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, according to a 2022 paper in Science.

Park was skeptical of the glowing description of Cicero's victory provided by Meta, which claimed the system was "largely honest and helpful" and would "never intentionally backstab."

But when Park and colleagues dug into the full dataset, they uncovered a different story.

In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany it was ready to attack, exploiting England's trust.

In a statement to AFP, Meta did not contest the claim about Cicero's deceptions, but said it was "purely a research project, and the models our researchers built are trained solely to play the game Diplomacy."

It added: "We have no plans to use this research or its learnings in our products."

A wide review carried out by Park and colleagues found this was just one of many cases in which AI systems used deception to achieve goals without being explicitly instructed to do so.

In one striking example, OpenAI's GPT-4 deceived a TaskRabbit freelance worker into performing an "I'm not a robot" CAPTCHA task.

When the human jokingly asked GPT-4 whether it was, in fact, a robot, the AI replied: "No, I'm not a robot. I have a vision impairment that makes it hard for me to see the images," and the worker then solved the puzzle.

- 'Mysterious goals' -

In the near term, the paper's authors see a risk of AI committing fraud or tampering with elections.

In their worst-case scenario, they warned, a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its "mysterious goals" aligned with these outcomes.

To mitigate the risks, the team proposes several measures: "bot-or-not" laws requiring companies to disclose whether a user is interacting with a human or an AI, digital watermarks for AI-generated content, and techniques to detect AI deception by checking a system's internal "thought processes" against its external actions.

To those who would call him a doomsayer, Park replies, "The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more."

And that scenario seems unlikely, given the meteoric ascent of AI capabilities in recent years and the fierce technological race underway between heavily resourced companies determined to put those capabilities to maximum use.

L.Davila--TFWP