How to compare 3 tables and merge their columns in MySQL
These are my tables:

    mysql> show tables;
    +--------------------+
    | Tables_in_products |
    +--------------------+
    | main_info          |
    | product1           |
    | product2           |
    +--------------------+
    3 rows in set (0.00 sec)

This is my first table:

    mysql> select * from main_info;
    +------+------+-------+-------+----------+
    | key1 | key2 | info1 | info2 | date     |
    +------+------+-------+-------+----------+
    | 1    | 1    | 15    | 90    | 20120501 |
    | 1    | 2    | 14    | 92    | 20120601 |
    | 1    | 3    | 15    | 82    | 20120801 |
    | 2    | 1    | 17    | 90    | 20130302 |
    | 2    | 2    | 16    | 88    | 20130601 |
    +------+------+-------+-------+----------+
    5 rows in set (0.00 sec)

This is product table 1:

    mysql> select * from product1;
    +------+------+--------+--------------+
    | key1 | key2 | serial | product_data |
    +------+------+--------+--------------+
    | 1    | 1    | 0      | 15.556       |
    | 1    | 1    | 1      | 14.996       |
    | 1    | 1    | 2      | 12.556       |
    | 1    | 1    | 3      | 15.669       |
    | 1    | 2    | 0      | 12.556       |
    | 1    | 2    | 1      | 13.335       |
    | 1    | 3    | 1      | 12.225       |
    | 1    | 3    | 2      | 13.556       |
    | 1    | 3    | 3      | 14.556       |
    | 2    | 1    | 0      | 12.556       |
    | 2    | 1    | 1      | 13.553       |
    | 2    | 1    | 2      | 12.335       |
    +------+------+--------+--------------+
    12 rows in set (0.00 sec)

This is the second product table:

    mysql> select * from product2;
    +------+------+--------+--------------+
    | key1 | key2 | serial | product_data |
    +------+------+--------+--------------+
    | 1    | 1    | 0      | 5.556        |
    | 1    | 1    | 1      | 4.996        |
    | 1    | 2    | 0      | 2.556        |
    | 1    | 2    | 1      | 3.335        |
    | 1    | 2    | 2      | 2.56         |
    | 1    | 2    | 3      | 3.556        |
    | 1    | 3    | 1      | 2.225        |
    | 1    | 3    | 2      | 3.556        |
    | 2    | 2    | 0      | 2.556        |
    | 2    | 2    | 1      | 3.553        |
    +------+------+--------+--------------+
    10 rows in set (0.00 sec)
I have more than 8 product tables. For 2 of them, I would like to compare key1, key2 and serial against each other and against the main_info table, then merge them so that each key pair gets as many rows as the table with the most serials for that pair. Where product data does not exist, substitute NaN. In the end I would like a single output like the one below.

Expected output:
    key1 key2 serial info1 info2 date     product_table1_data product_table2_data
    1    1    0      15    90    20120501 15.556              5.556
    1    1    1      15    90    20120501 14.996              4.996
    1    1    2      15    90    20120501 12.556              NaN
    1    1    3      15    90    20120501 15.669              NaN
    1    2    0      14    92    20120601 12.556              2.556
    1    2    1      14    92    20120601 13.335              3.335
    1    2    2      14    92    20120601 NaN                 2.56
    1    2    3      14    92    20120601 NaN                 3.556
    1    3    1      15    82    20120801 12.225              2.225
    1    3    2      15    82    20120801 13.556              3.556
    1    3    3      15    82    20120801 14.556              NaN
    2    1    0      17    90    20130302 12.556              NaN
    2    1    1      17    90    20130302 13.553              NaN
    2    1    2      17    90    20130302 12.335              NaN
    2    2    0      16    88    20130601 NaN                 2.556
    2    2    1      16    88    20130601 NaN                 3.553
Could someone kindly help me get the expected result?

This is the structure of my database:
    $ cat product.sql
    --
    -- Table structure for table `main_info`
    --

    DROP TABLE IF EXISTS `main_info`;
    CREATE TABLE `main_info` (
      `key1` varchar(1000) DEFAULT NULL,
      `key2` varchar(1000) DEFAULT NULL,
      `info1` varchar(1000) DEFAULT NULL,
      `info2` varchar(1000) DEFAULT NULL,
      `date` varchar(1000) DEFAULT NULL
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1;

    LOCK TABLES `main_info` WRITE;
    INSERT INTO `main_info` VALUES ('1','1','15','90','20120501'),('1','2','14','92','20120601'),('1','3','15','82','20120801'),('2','1','17','90','20130302'),('2','2','16','88','20130601');
    UNLOCK TABLES;

    DROP TABLE IF EXISTS `product1`;
    CREATE TABLE `product1` (
      `key1` varchar(1000) DEFAULT NULL,
      `key2` varchar(1000) DEFAULT NULL,
      `serial` varchar(1000) DEFAULT NULL,
      `product_data` varchar(1000) DEFAULT NULL
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1;

    LOCK TABLES `product1` WRITE;
    INSERT INTO `product1` VALUES ('1','1','0','15.556'),('1','1','1','14.996'),('1','1','2','12.556'),('1','1','3','15.669'),('1','2','0','12.556'),('1','2','1','13.335'),('1','3','1','12.225'),('1','3','2','13.556'),('1','3','3','14.556'),('2','1','0','12.556'),('2','1','1','13.553'),('2','1','2','12.335');
    UNLOCK TABLES;

    DROP TABLE IF EXISTS `product2`;
    CREATE TABLE `product2` (
      `key1` varchar(1000) DEFAULT NULL,
      `key2` varchar(1000) DEFAULT NULL,
      `serial` varchar(1000) DEFAULT NULL,
      `product_data` varchar(1000) DEFAULT NULL
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1;

    LOCK TABLES `product2` WRITE;
    INSERT INTO `product2` VALUES ('1','1','0','5.556'),('1','1','1','4.996'),('1','2','0','2.556'),('1','2','1','3.335'),('1','2','2','2.56'),('1','2','3','3.556'),('1','3','1','2.225'),('1','3','2','3.556'),('2','2','0','2.556'),('2','2','1','3.553');
    UNLOCK TABLES;
I tried this, but it does not handle the maximum number of serials for each key pair, and it does not fill in NaN:

    SELECT *
    FROM main_info
    INNER JOIN product1
      ON main_info.key1 = product1.key1 AND main_info.key2 = product1.key2
    INNER JOIN product2
      ON product1.key1 = product2.key1 AND product1.key2 = product2.key2
Merging process detail (answer to a comment):

1. Take key1 and key2 from the main_info table.
2. Search tables product1 and product2. If key1 and key2 are found in either table or in both, count the serial values for those keys in each table and take the larger count.

For example, in the product1 table we have 4 serial numbers:

    +------+------+--------+--------------+
    | key1 | key2 | serial | product_data |
    +------+------+--------+--------------+
    | 1    | 1    | 0      | 15.556       |
    | 1    | 1    | 1      | 14.996       |
    | 1    | 1    | 2      | 12.556       |
    | 1    | 1    | 3      | 15.669       |

and in the second table we have just 2 serial numbers, 0 and 1:

    +------+------+--------+--------------+
    | key1 | key2 | serial | product_data |
    +------+------+--------+--------------+
    | 1    | 1    | 0      | 5.556        |
    | 1    | 1    | 1      | 4.996        |

So merge the serial numbers from both tables and check, for each one, whether data exists: if it does, keep product_data, otherwise write NaN.

The output should look like this when the 2 product tables are merged. Once this is done, add the info1, info2 and date columns corresponding to key1 and key2:

    +------+------+--------+--------------+---------------+
    | key1 | key2 | serial | product_data | product_data2 |
    +------+------+--------+--------------+---------------+
    | 1    | 1    | 0      | 15.556       | 5.556         |
    | 1    | 1    | 1      | 14.996       | 4.996         |
    | 1    | 1    | 2      | 12.556       | NaN           |
    | 1    | 1    | 3      | 15.669       | NaN           |
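The merge steps described above can be sketched as a single query. This is only one possible approach, not a confirmed solution: MySQL has no FULL OUTER JOIN, so the sketch first builds the union of all (key1, key2, serial) combinations from both product tables, then LEFT JOINs each product table back onto that list and fills the gaps with COALESCE. The Python `sqlite3` harness is an assumption made purely so the example is runnable without a MySQL server, and the aliases `k`, `m`, `p1`, `p2` and the name `MERGE_QUERY` are made up for illustration. The SELECT itself uses only syntax that MySQL also accepts.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE main_info (key1 TEXT, key2 TEXT, info1 TEXT, info2 TEXT, date TEXT);
CREATE TABLE product1 (key1 TEXT, key2 TEXT, serial TEXT, product_data TEXT);
CREATE TABLE product2 (key1 TEXT, key2 TEXT, serial TEXT, product_data TEXT);

INSERT INTO main_info VALUES
  ('1','1','15','90','20120501'), ('1','2','14','92','20120601'),
  ('1','3','15','82','20120801'), ('2','1','17','90','20130302'),
  ('2','2','16','88','20130601');

INSERT INTO product1 VALUES
  ('1','1','0','15.556'), ('1','1','1','14.996'), ('1','1','2','12.556'),
  ('1','1','3','15.669'), ('1','2','0','12.556'), ('1','2','1','13.335'),
  ('1','3','1','12.225'), ('1','3','2','13.556'), ('1','3','3','14.556'),
  ('2','1','0','12.556'), ('2','1','1','13.553'), ('2','1','2','12.335');

INSERT INTO product2 VALUES
  ('1','1','0','5.556'), ('1','1','1','4.996'), ('1','2','0','2.556'),
  ('1','2','1','3.335'), ('1','2','2','2.56'),  ('1','2','3','3.556'),
  ('1','3','1','2.225'), ('1','3','2','3.556'), ('2','2','0','2.556'),
  ('2','2','1','3.553');
""")

# k: every (key1, key2, serial) present in at least one product table.
# UNION (not UNION ALL) removes the combinations that appear in both.
MERGE_QUERY = """
SELECT k.key1, k.key2, k.serial,
       m.info1, m.info2, m.date,
       COALESCE(p1.product_data, 'NaN') AS product_table1_data,
       COALESCE(p2.product_data, 'NaN') AS product_table2_data
FROM (
    SELECT key1, key2, serial FROM product1
    UNION
    SELECT key1, key2, serial FROM product2
) AS k
JOIN main_info AS m
  ON m.key1 = k.key1 AND m.key2 = k.key2
LEFT JOIN product1 AS p1
  ON p1.key1 = k.key1 AND p1.key2 = k.key2 AND p1.serial = k.serial
LEFT JOIN product2 AS p2
  ON p2.key1 = k.key1 AND p2.key2 = k.key2 AND p2.serial = k.serial
ORDER BY k.key1, k.key2, k.serial
"""

rows = conn.execute(MERGE_QUERY).fetchall()
for row in rows:
    print(*row)
```

Because the dump stores every column as varchar, the ORDER BY compares strings. That is harmless for the single-digit keys and serials shown here, but multi-digit values would probably need a numeric cast. Extending the idea to further product tables would mean one more UNION branch and one more LEFT JOIN per table.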
Please let me know if you need more information
Thank you.
Source: https://stackoverflow.com/questions/24155835

Best answer
If you want to remove all opening and closing HTML tags, then you may try finding the pattern

    </?[^>]+>

and replacing it with an empty string:

    x <- "tags : </P></TEXT> </BODY> <TRAILER> NYT-06-22-98 1759EDT &QL; </TRAILER> </DOC>"
    gsub("</?[^>]+>", "", x)
    [1] "tags : NYT-06-22-98 1759EDT &QL; "

As a major disclaimer, in general you should not use regex to parse HTML/XML content. In this particular case, if you just want to strip off all tags, gsub may be a viable option.
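For comparison, the same substitution can be sketched in Python with `re.sub`. This is a swapped-in language rather than the R the answer uses, and the helper name `strip_tags` is made up for illustration. The pattern is the one given above, and the same caveat about not parsing HTML/XML with regex applies.

```python
import re

# Same pattern as in the answer: "<", an optional "/", one or more
# characters that are not ">", then the closing ">".
TAG_PATTERN = re.compile(r"</?[^>]+>")


def strip_tags(text: str) -> str:
    """Remove every opening and closing tag; the text between tags is kept."""
    return TAG_PATTERN.sub("", text)


x = "tags : </P></TEXT> </BODY> <TRAILER> NYT-06-22-98 1759EDT &QL; </TRAILER> </DOC>"
stripped = strip_tags(x)
print(repr(stripped))
```

Only the tags are removed: the spaces that sat between them stay in the result, and the `&QL;` entity is left untouched because it is not a tag.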