Convert a pandas dataframe column of type object to a numpy array

I have a pandas dataframe that holds the image id, image class and image data:

img_train.head(5)

   ID  index  class                                               data
0  10472  10472      0  [[[255, 255, 255, 0], [255, 255, 255, 0], [255...
1   7655   7655      0  [[[255, 255, 255, 0], [255, 255, 255, 0], [255...
2   6197   6197      0  [[[255, 255, 255, 0], [255, 255, 255, 0], [255...
3   9741   9741      0  [[[255, 255, 255, 0], [255, 255, 255, 0], [255...
4   9169   9169      0  [[[255, 255, 255, 0], [255, 255, 255, 0], [255...

I am trying to convert each of these columns to a numpy array:

train_img_array = np.array([])
train_id_array = np.array([])
train_lab_array = np.array([])
count = 0
for index, row in img_train.iterrows():
    imgid = row['ID']
    imgclass = row['class']
    imgdata = row['data']
    #print(imgdata)
    train_img_array = np.append(train_img_array, imgdata )
    train_lab_array = np.append(train_lab_array, imgclass )
    train_id_array = np.append(train_id_array, imgid )

However, the column that holds the image data, which is of type 'object', is not translated into one corresponding row per image in the numpy array. For instance, these are the shapes of the numpy arrays after processing 58 rows from the original dataframe:

train_img_array.shape
train_lab_array.shape
train_id_array.shape
(93615200,)
(58,)
(58,)

How do I fix this?
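For context, a minimal sketch of what the shapes above point to: np.append flattens both of its arguments when no axis is given, so every image is unrolled into one long vector. The total of 93615200 is also not a multiple of 58, which suggests (as an assumption, not something the question states) that the images differ in size. The tiny dataframe and the `images_to_array` helper below are hypothetical stand-ins for illustration, not part of the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for img_train: three tiny 2x2 RGBA images.
img_train = pd.DataFrame({
    "ID": [10472, 7655, 6197],
    "class": [0, 0, 0],
    "data": [np.full((2, 2, 4), 255, dtype=np.uint8) for _ in range(3)],
})


def images_to_array(images):
    """Hypothetical helper: keep one entry per image instead of one flat vector."""
    images = [np.asarray(img) for img in images]
    if len({img.shape for img in images}) == 1:
        # All images share a shape: stack along a new leading axis.
        return np.stack(images)
    # Shapes differ: fall back to a 1-D object array holding one image per slot.
    out = np.empty(len(images), dtype=object)
    for i, img in enumerate(images):
        out[i] = img
    return out


# Reproduce the behavior from the question: np.append without axis flattens.
flat = np.array([])
for img in img_train["data"]:
    flat = np.append(flat, img)

# Scalar columns convert directly, with no loop needed.
train_id_array = img_train["ID"].to_numpy()
train_lab_array = img_train["class"].to_numpy()
train_img_array = images_to_array(img_train["data"])

print(flat.shape)             # (48,)
print(train_img_array.shape)  # (3, 2, 2, 4)
```

If the images really do have different sizes, the object-array branch keeps one image per row, but most training code would still need the images resized or padded to a common shape before stacking.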


Source: https://stackoverflow.com/questions/50587297
Updated: 2022-05-07 06:05

Best answer

What exactly do you want the progress bar to reflect: downloading the file (because it's big) or processing the file?

Progress bar reflecting processing file

Your progress bar doesn't change because your method is synchronous: nothing else will happen until it ends. The BackgroundWorker class is designed for exactly this kind of problem. It does the main work asynchronously and is able to report that progress has changed. Here is how to change your method to use it:

private void button1_Click(object sender, EventArgs e)
{
    string text =  textBox1.Text;
    string url = "http://api.bing.net/xml.aspx?AppId=XXX&Query=" + text + "&Sources=Translation&Version=2.2&Market=en-us&Translation.SourceLanguage=en&Translation.TargetLanguage=De";

    XmlDocument xml = new XmlDocument();
    xml.Load(url);
    XmlNodeList node = xml.GetElementsByTagName("tra:TranslatedTerm");

    BackgroundWorker worker = new BackgroundWorker();

    // tell the background worker it can report progress
    worker.WorkerReportsProgress = true;

    // add our event handlers
    worker.RunWorkerCompleted += new RunWorkerCompletedEventHandler(this.RunWorkerCompleted);
    worker.ProgressChanged += new ProgressChangedEventHandler(this.ProgressChanged);
    worker.DoWork += new DoWorkEventHandler(this.DoWork);

    // start the worker thread
    worker.RunWorkerAsync(node);
}

Now, the main part:

private void DoWork(object sender, DoWorkEventArgs e)
{
    // get a reference to the worker that started this request
    BackgroundWorker workerSender = sender as BackgroundWorker;

    // get the node list from the argument passed to RunWorkerAsync
    XmlNodeList node = e.Argument as XmlNodeList;

    for (int i = 0; i < node.Count; i++)
    {
        // DoWork runs on a background thread, so do not touch controls here;
        // hand the text to ProgressChanged through the user state instead
        int percent = (i + 1) * 100 / node.Count;
        workerSender.ReportProgress(percent, node[i].InnerText);
    }
}

private void RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
{
    // do something after work is completed
}

public void ProgressChanged(object sender, ProgressChangedEventArgs e)
{
    // raised on the UI thread, so it is safe to update controls here
    progressBar.Value = e.ProgressPercentage;
    textBox2.Text = e.UserState as string;
}

Progress bar reflecting downloading file

Try using HttpWebRequest to get the file as a stream.

// Create a 'WebRequest' object with the specified url. 
WebRequest myWebRequest = WebRequest.Create(url); 

// Send the 'WebRequest' and wait for response.
WebResponse myWebResponse = myWebRequest.GetResponse(); 

// Obtain a 'Stream' object associated with the response object.
Stream myStream = myWebResponse.GetResponseStream();

long myStreamLength = myWebResponse.ContentLength;

So now you know the length of this XML file. Then you have to read the content from the stream asynchronously (a BackgroundWorker with a StreamReader is a good idea). Use myStream.Position and myStreamLength to calculate the progress; if the response stream does not support Position, keep a running count of the bytes read instead.
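The progress calculation itself can be sketched in a few lines. The example below is in Python rather than C#, purely to keep it short and runnable. It reads a stream in fixed-size chunks and reports bytes read against the known total. The `read_with_progress` helper and the in-memory stream are hypothetical stand-ins, not part of the original answer.

```python
import io


def read_with_progress(stream, total_length, chunk_size=4096, on_progress=print):
    """Read `stream` to the end, calling on_progress(percent) after each chunk."""
    chunks = []
    bytes_read = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of stream
        chunks.append(chunk)
        bytes_read += len(chunk)
        if total_length > 0:
            # Integer percentage of the known total length.
            on_progress(bytes_read * 100 // total_length)
    return b"".join(chunks)


# An in-memory stream stands in for the HTTP response stream.
payload = b"x" * 10000
reported = []
data = read_with_progress(io.BytesIO(payload), len(payload),
                          on_progress=reported.append)
print(reported)  # [40, 81, 100]
```

In the C# setting, the callback would correspond to calling ReportProgress from inside DoWork, so that the progress bar is updated on the UI thread.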

I know that I'm not being very specific, but I just wanted to point you in the right direction. I don't think it makes sense to write all of that out here. Here are links that will help you work with Stream and BackgroundWorker:
