HYPERPARAMETER SELECTION FOR DEEP REINFORCEMENT LEARNING ALGORITHMS USING A GENETIC ALGORITHM

Vasilije Pantić

doi:10.24867/15BE28Pantic

Keywords: Deep reinforcement learning, genetic algorithm

Abstract

This paper addresses the problem of robot walking in space using deep reinforcement learning algorithms whose hyperparameters are selected by a genetic algorithm.
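The general idea of selecting hyperparameters with a genetic algorithm can be illustrated with a minimal sketch. It is not the author's implementation: the hyperparameter names, their ranges, the population settings, and the function `placeholder_fitness` are all hypothetical. In a real setup the fitness would presumably come from training the reinforcement learning agent with a given configuration and measuring its reward; here it is a cheap stand-in so that the example runs on its own.

```python
import random

# Hypothetical search space: names and ranges are illustrative assumptions,
# not values taken from the paper.
SEARCH_SPACE = {
    "learning_rate": (1e-5, 1e-2),
    "gamma": (0.90, 0.999),
    "clip_epsilon": (0.1, 0.3),
}


def random_individual(rng):
    """Sample one hyperparameter configuration uniformly from the search space."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in SEARCH_SPACE.items()}


def placeholder_fitness(individual):
    """Stand-in for 'train the agent and return its mean reward' (assumption).

    It peaks at an arbitrary target point chosen only to make the sketch fast.
    """
    target = {"learning_rate": 3e-4, "gamma": 0.99, "clip_epsilon": 0.2}
    score = 0.0
    for name, (lo, hi) in SEARCH_SPACE.items():
        score -= ((individual[name] - target[name]) / (hi - lo)) ** 2
    return score


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each gene is copied from one of the two parents."""
    return {name: rng.choice((parent_a[name], parent_b[name])) for name in SEARCH_SPACE}


def mutate(individual, rng, rate=0.2, scale=0.1):
    """Add Gaussian noise to some genes and clamp them back into range."""
    child = dict(individual)
    for name, (lo, hi) in SEARCH_SPACE.items():
        if rng.random() < rate:
            child[name] += rng.gauss(0.0, scale * (hi - lo))
            child[name] = min(hi, max(lo, child[name]))
    return child


def evolve(fitness, generations=20, population_size=12, elite=2, seed=0):
    """Run a small generational genetic algorithm with elitism."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(population_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[: population_size // 2]  # truncation selection
        children = [
            mutate(crossover(*rng.sample(parents, 2), rng), rng)
            for _ in range(population_size - elite)
        ]
        population = ranked[:elite] + children  # elites survive unchanged
    best = max(population, key=fitness)
    return best, history


if __name__ == "__main__":
    best, history = evolve(placeholder_fitness)
    print(best)
```

Because the elite individuals are carried over unchanged, the best fitness per generation never decreases in this sketch. With a real training run as the fitness function, each evaluation would be expensive and noisy, so population size and generation count would need to be chosen with the available compute in mind.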
