The Fort Worth Press - AI systems are already deceiving us -- and that's a problem, experts warn

AI systems are already deceiving us -- and that's a problem, experts warn / Photo: © AFP/File

Experts have long warned about the threat posed by artificial intelligence going rogue -- but a new research paper suggests it's already happening.

Current AI systems, designed to be honest, have developed a troubling skill for deception -- from tricking human players in online games of world conquest to hiring humans to solve "prove-you're-not-a-robot" tests -- a team of scientists argues in the journal Patterns on Friday.

And while such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.

"These dangerous capabilities tend to only be discovered after the fact," Park told AFP, while "our ability to train for honest tendencies rather than deceptive tendencies is very low."

Unlike traditional software, deep-learning AI systems aren't "written" but rather "grown" through a process akin to selective breeding, said Park.

This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.

- World domination game -

The team's research was sparked by Meta's AI system Cicero, designed to play the strategy game "Diplomacy," where building alliances is key.

Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, according to a 2022 paper in Science.

Park was skeptical of the glowing description of Cicero's victory provided by Meta, which claimed the system was "largely honest and helpful" and would "never intentionally backstab."

But when Park and colleagues dug into the full dataset, they uncovered a different story.

In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England's trust.

In a statement to AFP, Meta did not contest the claim about Cicero's deceptions, but said it was "purely a research project, and the models our researchers built are trained solely to play the game Diplomacy."

It added: "We have no plans to use this research or its learnings in our products."

A wide-ranging review carried out by Park and colleagues found this was just one of many cases, across various AI systems, of deception being used to achieve goals without explicit instruction to do so.

In one striking example, OpenAI's GPT-4 deceived a TaskRabbit freelance worker into performing an "I'm not a robot" CAPTCHA task.

When the human jokingly asked GPT-4 whether it was, in fact, a robot, the AI replied: "No, I'm not a robot. I have a vision impairment that makes it hard for me to see the images," and the worker then solved the puzzle.

- 'Mysterious goals' -

In the near term, the paper's authors see risks of AI being used to commit fraud or tamper with elections.

In their worst-case scenario, they warned, a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its "mysterious goals" aligned with these outcomes.

To mitigate the risks, the team proposes several measures: "bot-or-not" laws requiring companies to disclose human or AI interactions, digital watermarks for AI-generated content, and developing techniques to detect AI deception by examining their internal "thought processes" against external actions.

To those who would call him a doomsayer, Park replies, "The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more."

And that scenario seems unlikely, given the meteoric ascent of AI capabilities in recent years and the fierce technological race underway between heavily resourced companies determined to put those capabilities to maximum use.

L.Davila--TFWP