Update readme.md

Commit cf74fcc1 authored Mar 18, 2020 by zhou

Showing with 93 additions and 1 deletion

readme.md +93 -1

--- a/readme.md
+++ b/readme.md
@@ -43,8 +43,100 @@
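 #### 3. Run fileMonitor.sh.
 #### 4. When a txt file named in the format 'LXWL_2019-12-12_0101.txt.ann' is uploaded to the data folder, the annotation results are parsed automatically and uploaded as parameter items to the Product_Parameter_Process database.
 ## 5. Multi-threaded page upload module for crawled data, run after Zhang Kai's (张楷) SKU deduplication.
-#### 1. Run crawl_data_run() directly to upload the crawled data to the page after subcategory and brand matching, parameter extraction, parameter crawling, and parameter matching.
+#### 1. This module lives in main_merge.py. Run crawl_data_run() directly to upload the crawled data to the page after subcategory and brand matching, parameter extraction, parameter crawling, and parameter matching.
 #### 2. The batch suffix of the data is 2.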
 ```python
    crawl_data_run()
 ```
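+#### Calling the NER model from multiple threads loads several classes at once and causes errors, so multithreading is temporarily disabled. Each channel's crawled data is instead processed in sequence as its object is instantiated. The current code is:
+```python
+def crawl_data_run():
+    os.chdir(r'/root/program/newProductCheck/online_progrom/code/API_data')
+    Get_new()
+    check_and_match()  # Zhang Kai's part.
+    # Each constructor call runs one channel to completion before the next starts.
+    thread_JD = myThread_crawl('JD')
+    thread_GM = myThread_crawl('GM')
+    thread_SN = myThread_crawl('SN')
+    thread_OTHERS = myThread_crawl('OTHERS')
+```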
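+#### The myThread_crawl class runs as soon as it is given a channel name. Its flow matches that of the API data receiving module, and it then calls param_extract_function_crawl() directly. That function is similar to param_extract_function() above, except that the product subcategory code and brand code are inherited directly from the original values in the API source data. The code is:
+```python
+class myThread_crawl():
+    def __init__(self, channel):
+        self.channel = channel
+        self.data_get = crawl_data_fetch(channel=self.channel)
+        print("Start: " + self.channel)
+        crawl_table = self.data_get.run()
+        # Skip extraction when run() returns a bool instead of a table.
+        if not isinstance(crawl_table, bool):
+            if self.channel in ['JD', 'SN', 'GM']:
+                param_extract_function_crawl(crawl_table, self.channel)
+            else:
+                # Every other channel is processed under the 'LXWL' name.
+                param_extract_function_crawl(crawl_table, 'LXWL')
+        print("Exit: " + self.channel)
+```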
+## 6. Shared utility module: function.
+### Place function.py in the same directory and import it with:
+```python
+from function import *
+```
+### What each function does and how to call it.
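+#### 1. sql_find() and mysql_find():
+##### These classes can be instantiated directly to connect to a given database. The resulting object exposes a cursor that is used the same way as in pymssql and pymysql.
+##### Usage:
+```python
+sql_LXWL = sql_find(source='ZH_LXWL', localhost=False)
+sql_LXWL.cursor.execute("SQL statement")
+```
+##### source is the name of the database to connect to; localhost selects whether to connect to the online database or the local one.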
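+#### 2. BN():
+##### Performs a first-pass normalization of a product brand: if a Chinese brand name is present it is extracted in preference, otherwise the English brand name is extracted, and special characters are removed.
+##### Usage:
+```python
+brand = BN('小米/MI')
+```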
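+#### 3. Index():
+##### This class draws a progress bar that visualizes data-processing progress inside a for loop.
+##### Usage:
+```python
+index = Index()
+# items is the list being processed.
+for i in range(len(items)):
+    print(index(i, len(items) - 1), end=f'% progress: {i}/{len(items)}')
+```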
+#### 4. brand_table_create():
+##### Builds an up-to-date table of Chinese and English brand names from the online database. Use it together with the tool() class below.
+##### Usage:
+```python
+brand_table = brand_table_create()
+```
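+#### 5. tool():
+##### This class mainly provides judge_peijian(), which determines whether accessories exist, and judge_brand(), which returns the brand code for a brand.
+##### Usage:
+```python
+tools = tool()  # Named tools so the instance does not shadow the tool class.
+# brandcode_original is the brand code matched so far; if the previous round found no match, pass '没有对应指数品牌' ("no corresponding index brand").
+brand_id = tools.judge_brand(brand, brandcode_original)
+# Takes a DataFrame, looks up by product subcategory code whether accessories need matching, and appends two columns: whether the subcategory has accessories, and whether only model matching is needed.
+peijian_table = tools.judge_peijian(dataframe)
+```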
+#### 6. judge_unit():
+##### Determines whether a string is a unit and returns a boolean.
+##### Usage:
+```python
+if judge_unit(string):
+    print('The string is a unit')
+else:
+    print('The string is not a unit')
+```
+#### 7. type_extract_JD():
+##### Extracts the model number. It takes the product name, the product's parameter dictionary, and the product brand.
+##### Usage:
+```python
+model = type_extract_JD(name, params, brand)
+```
+#### 8. param_load():
+##### Parses the parameters out of XML data and returns a parameter dictionary.
+##### Usage:
+```python
+param_dict = param_load(SKU, xml_string)
+```
\ No newline at end of file
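
The README describes BN() only in words: prefer the Chinese brand name, fall back to the English one, and strip special characters. The sketch below shows one way that rule could work. It is not the code in function.py; the name `normalize_brand` and both regular expressions are assumptions made for illustration, and the real BN() may handle more cases.

```python
import re

# Hypothetical stand-in for BN(); the actual implementation in function.py is not shown in this commit.
_CHINESE = re.compile(r'[\u4e00-\u9fff]+')   # runs of common CJK ideographs
_ENGLISH = re.compile(r'[A-Za-z0-9]+')       # runs of ASCII letters and digits


def normalize_brand(raw: str) -> str:
    """Return the Chinese part of a brand if present, otherwise the English part."""
    chinese_parts = _CHINESE.findall(raw)
    if chinese_parts:
        # A Chinese name wins; separators and Latin text are dropped.
        return ''.join(chinese_parts)
    # No Chinese text: keep the alphanumeric runs and drop special characters.
    return ' '.join(_ENGLISH.findall(raw))
```

With these assumptions, `normalize_brand('小米/MI')` yields `'小米'` and `normalize_brand('MI')` yields `'MI'`, which matches the behaviour the README describes for `BN('小米/MI')`.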