no. | title | author | journal | year | vol. | issue | page(s) | type
1 | A deep reinforcement learning framework for continuous intraday market bidding | Boukas, Ioannis | | | 110 | 9 | p. 2335-2387 | article
2 | Air Learning: a deep reinforcement learning gym for autonomous aerial robot visual navigation | Krishnan, Srivatsan | | | 110 | 9 | p. 2501-2540 | article
3 | Automatic discovery of interpretable planning strategies | Skirzyński, Julian | | | 110 | 9 | p. 2641-2683 | article
4 | Bandit algorithms to personalize educational chatbots | Cai, William | | | 110 | 9 | p. 2389-2418 | article
5 | Challenges of real-world reinforcement learning: definitions, benchmarks and analysis | Dulac-Arnold, Gabriel | | | 110 | 9 | p. 2419-2468 | article
6 | Dealing with multiple experts and non-stationarity in inverse reinforcement learning: an application to real-life problems | Likmeta, Amarildo | | | 110 | 9 | p. 2541-2576 | article
7 | Grounded action transformation for sim-to-real reinforcement learning | Hanna, Josiah P. | | | 110 | 9 | p. 2469-2499 | article
8 | Guest editorial: special issue on reinforcement learning for real life | Li, Yuxi | | | 110 | 9 | p. 2291-2293 | article
9 | IntelligentPooling: practical Thompson sampling for mHealth | Tomkins, Sabina | | | 110 | 9 | p. 2685-2727 | article
10 | Inverse reinforcement learning in contextual MDPs | Belogolovsky, Stav | | | 110 | 9 | p. 2295-2334 | article
11 | Lessons on off-policy methods from a notification component of a chatbot | Rome, Scott | | | 110 | 9 | p. 2577-2602 | article
12 | Partially observable environment estimation with uplift inference for reinforcement learning based recommendation | Shang, Wenjie | | | 110 | 9 | p. 2603-2640 | article