Optimize the code:

    try:
        self.df['sendtime'] = pd.to_datetime(self.df['sendtime'], errors='coerce', format='%Y-%m-%d %H:%M:%S')
    except:
        raise Exception('Vehicle data: date error!')
Time: 2024-04-16 12:25:48 Views: 105
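1. Exception handling: the code uses a bare `except` clause that catches every exception and replaces it with a custom error message. This hides the specific error type and makes problems harder to diagnose and fix. Handle specific exception types instead, such as `ValueError` or `TypeError`.
2. Error reporting: when date conversion fails, the code raises a plain `Exception`. Consider raising a more specific exception type, or including more context in the error message, so the problem is easier to trace.
3. Error handling strategy: depending on your needs, you can handle invalid dates in other ways. Besides raising an exception, you can set invalid dates to a specific default value or drop the rows that contain them. Note that `errors='coerce'` already converts unparseable values to `NaT` instead of raising, so invalid dates have to be detected by checking the result for `NaT`.
4. Simplification: if you are certain the data contains only valid date formats, you can omit the error handling; otherwise, add more validation steps to ensure the data is valid.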
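The points above can be sketched as follows. This is a minimal illustration rather than the original class: `parse_sendtime` is a hypothetical helper, and it assumes that invalid timestamps should be reported with context instead of being kept silently. Because `errors='coerce'` yields `NaT` rather than raising, the check is made on the parsed result.

```python
import pandas as pd


def parse_sendtime(df: pd.DataFrame, column: str = "sendtime") -> pd.DataFrame:
    """Parse a timestamp column and report values that fail to parse.

    Hypothetical helper for illustration; not part of the original class.
    """
    if column not in df.columns:
        raise KeyError(f"Vehicle data: column {column!r} is missing")

    parsed = pd.to_datetime(
        df[column], errors="coerce", format="%Y-%m-%d %H:%M:%S"
    )

    # Values that were present but could not be parsed have become NaT.
    bad = parsed.isna() & df[column].notna()
    if bad.any():
        examples = df.loc[bad, column].head(3).tolist()
        raise ValueError(
            f"Vehicle data: {int(bad.sum())} invalid value(s) "
            f"in {column!r}, e.g. {examples}"
        )

    out = df.copy()
    out[column] = parsed
    return out
```

If dropping bad rows is preferable to failing, the `raise` could be replaced with `df = df.loc[~bad]` before the assignment; which behavior is right depends on the data pipeline.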
Optimize the code:

    def cluster_format(self, start_time, end_time, save_on=True, data_clean=False, data_name=None):
        """
        local format function is to format data from beihang.
        :param start_time:
        :param end_time:
        :return:
        """
        # Household cluster-level data cleaning
        if data_clean:
            unused_index_col = [i for i in self.df.columns if 'Unnamed' in i]
            self.df.drop(columns=unused_index_col, inplace=True)
            self.df.drop_duplicates(inplace=True, ignore_index=True)
            self.df.reset_index(drop=True, inplace=True)
            dupli_header_lines = np.where(self.df['sendtime'] == 'sendtime')[0]
            self.df.drop(index=dupli_header_lines, inplace=True)
            self.df = self.df.apply(pd.to_numeric, errors='ignore')
            self.df['sendtime'] = pd.to_datetime(self.df['sendtime'])
            self.df.sort_values(by='sendtime', inplace=True, ignore_index=True)
            self.df.to_csv(data_name, index=False)

        # Call the base formatting routine
        self.df = super().format(start_time, end_time)
        module_number_register = np.unique(self.df['bat_module_num'])
        # if registered m_num is 0 and not changed, there is no module data
        if not np.any(module_number_register):
            logger.logger.warning("No module data!")
            sys.exit()

        if 'bat_module_voltage_00' in self.df.columns:
            volt_ref = 'bat_module_voltage_00'
        elif 'bat_module_voltage_01' in self.df.columns:
            volt_ref = 'bat_module_voltage_01'
        elif 'bat_module_voltage_02' in self.df.columns:
            volt_ref = 'bat_module_voltage_02'
        else:
            logger.logger.warning("No module data!")
            sys.exit()

        self.df.dropna(axis=0, subset=[volt_ref], inplace=True)
        self.df.reset_index(drop=True, inplace=True)
        self.headers = list(self.df.columns)
        # time duration of a cluster
        self.length = len(self.df)
        if self.length == 0:
            logger.logger.warning("After cluster data clean, no effective data!")
            raise ValueError("No effective data after cluster data clean.")
        self.cluster_stats(save_on)
        for m in range(self.mod_num):
            print(self.clusterid, self.mod_num)
            self.module_list.append(np.unique(self.df[f'bat_module_sn_{str(m).zfill(2)}'].dropna())[0])
Here are some possible optimizations for the given code:
1. The list comprehension that finds columns with 'Unnamed' in their names is already idiomatic. `filter()` with a lambda function is an equivalent alternative, although it is neither shorter nor faster:
unused_index_col = list(filter(lambda x: 'Unnamed' in x, self.df.columns))
2. `drop_duplicates()` is already called with `ignore_index=True`, which renumbers the index in the same step, so the separate `reset_index(drop=True, inplace=True)` call that follows it is redundant and can be removed. Only this line is needed:
self.df.drop_duplicates(inplace=True, ignore_index=True)
3. Instead of using `sys.exit()` to terminate the program when there is no module data, you can raise a `ValueError` with an appropriate error message:
raise ValueError("No module data!")
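4. Instead of a chain of `if`/`elif` statements to find the voltage reference column, you can index `self.df.columns` with a boolean mask and take the first column whose name starts with 'bat_module_voltage'. Note that this matches any module number rather than only `00` to `02`, and that the `[0]` raises `IndexError` when no column matches, so the no-data case still needs explicit handling: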
volt_ref_col = self.df.columns[self.df.columns.str.startswith('bat_module_voltage')][0]
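5. The final loop appends one serial number per module, so it can be written as a single list comprehension passed to `self.module_list.extend()`. The `print` call inside the loop looks like leftover debugging output and can be removed.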
By applying these optimizations, the code can become more concise and efficient.
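As a rough sketch of points 3 to 5, the column lookup and the serial-number collection could be pulled into small functions. `find_volt_ref` and `first_serial_numbers` are hypothetical names introduced here for illustration. The sketch keeps the original priority order of the three candidate columns and raises `ValueError` instead of calling `sys.exit()`.

```python
import numpy as np
import pandas as pd

# Candidate columns in the priority order used by the original if/elif chain.
VOLT_CANDIDATES = (
    "bat_module_voltage_00",
    "bat_module_voltage_01",
    "bat_module_voltage_02",
)


def find_volt_ref(df: pd.DataFrame, candidates=VOLT_CANDIDATES) -> str:
    """Return the first candidate column present in df."""
    for name in candidates:
        if name in df.columns:
            return name
    # Raising lets the caller decide how to react, unlike sys.exit().
    raise ValueError("No module data!")


def first_serial_numbers(df: pd.DataFrame, mod_num: int) -> list:
    """Collect the first (sorted) non-null serial number of each module.

    Like the original loop, this raises IndexError if a module's
    serial-number column contains only missing values.
    """
    return [
        np.unique(df[f"bat_module_sn_{m:02d}"].dropna())[0]
        for m in range(mod_num)
    ]
```

Inside the method, the results would be used as `volt_ref = find_volt_ref(self.df)` and `self.module_list.extend(first_serial_numbers(self.df, self.mod_num))`.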
Optimize the code with GPU acceleration:

    def temp_condtion(df, temp_upper, temp_low):
        return ((df['max_temp']<=temp_upper) & (df['min_temp']>=temp_low))

    def soc_condtion(df, soc_upper, soc_low):
        # Compare against the SOC bounds (the original used temp_upper/temp_low here)
        return ((df['bat_module_soc_00']<=soc_upper) & (df['bat_module_soc_00']>=soc_low))

    def current_condtion(df, min_curr, batt_state):
        if batt_state=='charge':
            return (df['bat_module_current_00'].abs()>=min_curr) & (df['bat_module_current_00']>=0)
        elif batt_state=="discharge":
            return (df['bat_module_current_00'].abs()>=min_curr) & (df['bat_module_current_00']<=0)

    # Board-side run logic
    data = {'realtime':[], 'cell_volt':[], 'total_current':[]}
    index = []
    # (total_current[j]<=0)
    for i in tqdm(df.index[temp_condtion(df, temp_upper, temp_low) & soc_condtion(df, soc_upper, soc_low) & current_condtion(df, min_curr, 'discharge')]):
        n = 0
        k = i
        while (n <= data_point) & (i <= len(df)-100):
            idx_list = []
            idx_list.append(i)
            for j in np.arange(i+1, len(df)):
                if ((sendtime.iloc[j]-sendtime.iloc[k]).total_seconds()>=time_interval):
                    break
                elif (df['max_temp'].iloc[j]<=temp_upper) & (df['min_temp'].iloc[j]>=temp_low) & \
                        (df['bat_module_soc_00'].iloc[j]>=soc_low) & (df['bat_module_soc_00'].iloc[j]<=soc_upper) & \
                        ((sendtime[j]-sendtime[i]).total_seconds()>=sample_interval) & \
                        ((sendtime.iloc[j]-sendtime.iloc[k]).total_seconds()<=time_interval) & \
                        (np.abs(total_current[j]-total_current[i])>=curr_interval) & (np.abs(soc[j]-soc[i])<=soc_interval) & \
                        (np.abs(total_current[j])>=min_curr):
                    n+=1
                    idx_list.append(j)
                    i = j
            if ((sendtime.iloc[j]-sendtime.iloc[k]).total_seconds()>=time_interval):
                break
        if len(idx_list) >= data_point:
            print(idx_list)
            index.append(idx_list)
There are a few ways to optimize this code and potentially utilize GPU acceleration:
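1. Use Numba: Numba is a just-in-time compiler for numerical Python code. Its `@jit` and `@njit` decorators compile functions to CPU machine code, which can speed up the nested loops considerably. Running on a GPU is a separate step that requires writing explicit kernels with `numba.cuda`. Numba operates on NumPy arrays rather than pandas DataFrames, so extract the columns with `.to_numpy()` before passing them to a compiled function.
2. Use Pandas' `query` method: rather than using boolean indexing, you can filter rows with the `query` method of a Pandas DataFrame. This is mainly a readability change. `query` returns the filtered DataFrame rather than a boolean mask, so `df.query("max_temp <= @temp_upper and min_temp >= @temp_low").index` takes the place of `df.index[temp_condtion(df, temp_upper, temp_low)]`.
3. Use vectorized operations: the three condition helpers are already vectorized, and `df['bat_module_current_00'].abs()>=min_curr` is equivalent to `np.abs(df['bat_module_current_00'])>=min_curr`. The expensive part is the nested Python loop with per-element `.iloc[j]` lookups. Extract the columns to NumPy arrays once before the loop, and precompute whatever does not depend on the loop state, such as the row-wise temperature and SOC checks.
4. Use Dask: Dask is a parallel computing library that can distribute computations across multiple CPU cores or machines; GPU execution needs additional libraries on top of it. You can use Dask to parallelize your code and potentially speed up execution. However, this may require significant changes to your code structure, and the inner search is sequential, so only independent starting points can run in parallel.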
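A minimal CPU-side sketch of the vectorization advice, using only NumPy and pandas: the row-wise conditions are evaluated once as a boolean mask, and `np.searchsorted` finds where each row's time window ends, so the inner loop no longer tests the time limit row by row. `candidate_mask` and `window_ends` are hypothetical helper names. The sketch assumes that `sendtime` is sorted in ascending order and that the SOC filter is meant to compare against `soc_upper` and `soc_low`.

```python
import numpy as np
import pandas as pd


def candidate_mask(df, temp_upper, temp_low, soc_upper, soc_low, min_curr):
    """Boolean mask of discharge rows that satisfy the row-wise conditions."""
    curr = df["bat_module_current_00"].to_numpy()
    soc = df["bat_module_soc_00"].to_numpy()
    return (
        (df["max_temp"].to_numpy() <= temp_upper)
        & (df["min_temp"].to_numpy() >= temp_low)
        & (soc <= soc_upper)
        & (soc >= soc_low)
        & (np.abs(curr) >= min_curr)
        & (curr <= 0)  # discharge: current is non-positive
    )


def window_ends(sendtime: pd.Series, time_interval: float) -> np.ndarray:
    """For each row i, index of the first row at least time_interval seconds later.

    Assumes sendtime is a datetime column sorted in ascending order.
    """
    # Work in integer nanoseconds to avoid floating-point comparisons.
    t = sendtime.to_numpy().astype("datetime64[ns]").astype("int64")
    return np.searchsorted(t, t + int(time_interval * 1e9), side="left")
```

With these in place, the outer loop would iterate over `np.flatnonzero(mask)` and the inner loop over `range(i + 1, ends[k])`. Plain arrays like these are also the form a Numba-compiled loop would accept, should the remaining Python loop still be too slow.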