
How to train an artificial neural network to play Diablo 2 using visual input?



I'm currently trying to get an ANN to play a video game, and I was hoping to get some help from the wonderful community here.

I've settled on Diablo 2. Gameplay is in real-time and viewed from an isometric perspective, with the player controlling a single avatar on whom the camera is centered.

To make things concrete, the task is to get your character x experience points without letting its health drop to 0, where experience points are gained by killing monsters. Here is an example of the gameplay:

here

Now, since I want the net to operate based solely on the information it gets from the pixels on the screen, it must learn a very rich representation in order to play efficiently, since this would presumably require it to know (implicitly at least) how to divide the game world up into objects and how to interact with them.

And all of this information must be taught to the net... somehow. I can't for the life of me think of how to train this thing. My only idea is to have a separate program visually extract something innately good/bad in the game (e.g. health, gold, experience) from the screen, and then use that stat in a reinforcement learning procedure. I think that will be part of the answer, but I don't think it'll be enough; there are just too many levels of abstraction between raw visual input and goal-oriented behavior for such limited feedback to train a net within my lifetime.
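The idea above — extracting on-screen stats and turning their changes into a reward signal — can be sketched as follows. The `reward_from_stats` function and its weights are illustrative assumptions, not part of the original question; the vision routine that actually reads health/gold/experience off the screen is left out.

```python
# Hypothetical sketch: derive a scalar reward for reinforcement learning from
# stats (health, gold, experience) extracted from the screen by a separate
# vision routine. The weights and stat names are assumptions for illustration.

def reward_from_stats(prev, curr, w_xp=1.0, w_gold=0.01, w_hp=0.5):
    """Reward = weighted gain in experience and gold, penalised by HP loss."""
    d_xp = curr["xp"] - prev["xp"]
    d_gold = curr["gold"] - prev["gold"]
    d_hp = curr["hp"] - prev["hp"]
    r = w_xp * d_xp + w_gold * d_gold + w_hp * min(d_hp, 0)
    if curr["hp"] <= 0:          # death: large negative terminal reward
        r -= 1000.0
    return r

prev = {"xp": 100, "gold": 50, "hp": 80}
curr = {"xp": 130, "gold": 50, "hp": 70}
print(reward_from_stats(prev, curr))  # 30*1.0 + 0 + 0.5*(-10) = 25.0
```

Such a shaped reward is dense compared to a bare win/lose signal, which is exactly why the question proposes it; the worry in the paragraph above is that even this is too sparse relative to raw pixels.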

So, my question: what other ways can you think of to train a net to do at least some part of this task? Preferably without making thousands of labeled examples...

Just for a little more direction: I'm looking for some other sources of reinforcement learning and/or any unsupervised methods for extracting useful information in this setting. Or a supervised algorithm if you can think of a way of getting labeled data out of a game world without having to manually label it.

UPDATE(04/27/12):

Strangely, I'm still working on this and seem to be making progress. The biggest secret to getting an ANN controller to work is to use the most advanced ANN architecture appropriate to the task. Hence I've been using a deep belief net composed of factored conditional restricted Boltzmann machines, trained in an unsupervised manner (on video of me playing the game) before fine-tuning with temporal-difference back-propagation (i.e. reinforcement learning with standard feedforward ANNs).
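For readers unfamiliar with the temporal-difference step mentioned above, here is a minimal tabular TD(0) sketch. With a network, the TD error would be back-propagated through the value function rather than applied to a lookup table; the states, reward, and hyper-parameters below are purely illustrative.

```python
# Tabular TD(0) sketch of the temporal-difference target used in fine-tuning.
# Update rule: V[s] <- V[s] + alpha * (r + gamma * V[s'] - V[s])

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    """Apply one TD(0) update in place and return the TD error."""
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error
    return td_error

# Toy example: taking damage ("low_hp") then reaching a valued "safe" state.
V = {"low_hp": 0.0, "safe": 1.0}
err = td0_update(V, "low_hp", -1.0, "safe")
# td_error = -1.0 + 0.99 * 1.0 - 0.0 = -0.01, so V["low_hp"] becomes -0.001
```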

Still looking for more valuable input though, especially on the problem of real-time action selection and how to encode color images for ANN processing :-)
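On the color-encoding question, one common scheme (an assumption here, not something the question settles on) is simply to scale each 8-bit RGB channel to [0, 1] and flatten the frame into an input vector:

```python
# Sketch of one way to encode colour pixels for a neural net: normalise each
# 8-bit RGB channel to [0, 1] and flatten, channel-interleaved. In practice a
# library like NumPy would do this vectorised; plain Python keeps it explicit.

def encode_rgb(pixels):
    """pixels: list of rows, each row a list of (r, g, b) tuples in 0..255."""
    vec = []
    for row in pixels:
        for (r, g, b) in row:
            vec.extend((r / 255.0, g / 255.0, b / 255.0))
    return vec

frame = [[(255, 0, 0), (0, 255, 0)]]   # a toy 1x2 "image": red, green
print(encode_rgb(frame))  # [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
```

Other options include converting to grayscale or to planar per-channel maps for convolutional inputs; which works best depends on the architecture.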

UPDATE(10/21/15):

Just remembered I asked this question back in the day, and thought I should mention that it is no longer a crazy idea. Since my last update, DeepMind published their Nature paper on getting neural networks to play Atari games from visual input. Indeed, the only thing preventing me from using their architecture to play a limited subset of Diablo 2 is the lack of access to the underlying game engine. Rendering to the screen and then redirecting it to the network is just far too slow to train in a reasonable amount of time. Thus we probably won't see this sort of bot playing Diablo 2 anytime soon, but only because it'll be playing something either open-source or with API access to the rendering target. (Quake perhaps?)


Original question: https://stackoverflow.com/questions/6542274
Updated: 2023-12-15 18:12

Best answer


You can use assign (doc) to change the value of perf.a1:

> assign(paste("perf.a", "1", sep=""),5)
> perf.a1
[1] 5
