pandas drop_duplicates

Time: 2023-10-09 14:08:24

The `drop_duplicates()` method in pandas removes duplicate rows from a DataFrame. By default, it compares all columns and treats two rows as duplicates only when every column value matches. You can also pass a subset of columns to use when identifying duplicates.

Syntax:

```
DataFrame.drop_duplicates(subset=None, keep='first', inplace=False, ignore_index=False)
```

Parameters:

- `subset`: A column label or list of column labels to consider when identifying duplicates. The default, `None`, uses all columns.
- `keep`: Specifies which duplicates to keep. Possible values are `'first'` (keep the first occurrence, the default), `'last'` (keep the last occurrence), and `False` (drop every row that has a duplicate, keeping none of them).
- `inplace`: If `True`, the original DataFrame is modified in place and the method returns `None`. If `False` (default), a new DataFrame with duplicates removed is returned and the original is left unchanged.
- `ignore_index`: If `True`, the resulting DataFrame gets a new index from 0 to n-1, where n is the number of remaining rows. If `False` (default), the remaining rows keep their original index labels.

Example:

```
import pandas as pd

# Creating a DataFrame with duplicate rows
data = {'Name': ['John', 'John', 'Mary', 'Tom', 'Mary'],
        'Age': [25, 25, 30, 35, 30],
        'City': ['New York', 'New York', 'Los Angeles', 'Chicago', 'Los Angeles']}
df = pd.DataFrame(data)

# Removing duplicate rows in place (the call itself returns None)
df.drop_duplicates(inplace=True)
print(df)
```

Output:

```
   Name  Age         City
0  John   25     New York
2  Mary   30  Los Angeles
3   Tom   35      Chicago
```

In the above example, `drop_duplicates()` compares all columns, so the second John row (index 1) and the second Mary row (index 4) are removed. Because `inplace=True` is used, the original DataFrame is modified directly and nothing is returned. Without `inplace=True`, `df` would stay unchanged and the method would return a new DataFrame. The remaining rows keep their original index labels (0, 2, 3) because `ignore_index` defaults to `False`.
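The example above only exercises the defaults, so here is a minimal sketch of how `subset`, `keep`, and `ignore_index` change the result. The data is a made-up variation of the table above (the second John row has a different age and city, so it is a duplicate by `Name` only), and the variable names are illustrative. `ignore_index` assumes pandas 1.0 or later.

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 share a Name but differ in Age and City,
# rows 2 and 4 are identical in every column.
df = pd.DataFrame({
    'Name': ['John', 'John', 'Mary', 'Tom', 'Mary'],
    'Age': [25, 26, 30, 35, 30],
    'City': ['New York', 'Boston', 'Los Angeles', 'Chicago', 'Los Angeles'],
})

# subset: only the 'Name' column decides what counts as a duplicate.
by_name_first = df.drop_duplicates(subset=['Name'])

# keep='last': keep the last occurrence of each name instead of the first.
by_name_last = df.drop_duplicates(subset=['Name'], keep='last')

# keep=False: drop every row whose name appears more than once.
no_repeats = df.drop_duplicates(subset=['Name'], keep=False)

# ignore_index=True: relabel the surviving rows 0..n-1.
renumbered = df.drop_duplicates(subset=['Name'], ignore_index=True)

print(by_name_first.index.tolist())   # [0, 2, 3]
print(by_name_last.index.tolist())    # [1, 3, 4]
print(no_repeats['Name'].tolist())    # ['Tom']
print(renumbered.index.tolist())      # [0, 1, 2]
```

None of these calls use `inplace=True`, so each one returns a new DataFrame and `df` itself still has all five rows.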
