Pandas Tutorial Series (1)

import pandas as pd

Define the following dictionary:

person = {
    "name":"kesi",
    "first_name":"ma",
    "email":"123@163.com"
}
person
{'name': 'kesi', 'first_name': 'ma', 'email': '123@163.com'}
people = {
    "last": ["ma", "hu", "ma", "ma"],
    "first": ["kevin", "anna", "david", "obama"],
    "email": ["1@163.com", "2@163.com", "3@163.com", "4@163.com"],
    "age": ["33", "34", "5", "3"],
    "salary": [10000, 10000, 10000, 10000],
}
people
{'last': ['ma', 'hu', 'ma', 'ma'],
 'first': ['kevin', 'anna', 'david', 'obama'],
 'email': ['1@163.com', '2@163.com', '3@163.com', '4@163.com'],
 'age': ['33', '34', '5', '3'],
 'salary': [10000, 10000, 10000, 10000]}
data = pd.DataFrame(people)
data
   last  first      email  age  salary
0    ma  kevin  1@163.com   33   10000
1    hu   anna  2@163.com   34   10000
2    ma  david  3@163.com    5   10000
3    ma  obama  4@163.com    3   10000
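Each dictionary key became a column label, and each list became that column's values. Because the age values were written as strings, the age column holds text rather than numbers. The following is a minimal sketch that inspects the column types and converts age, assuming you want it numeric. In pandas 1.x/2.x the string columns report dtype object, but newer releases may infer a dedicated string dtype instead.

```python
import pandas as pd

# Same data as the people dictionary above (age values are strings)
people = {
    "last": ["ma", "hu", "ma", "ma"],
    "first": ["kevin", "anna", "david", "obama"],
    "email": ["1@163.com", "2@163.com", "3@163.com", "4@163.com"],
    "age": ["33", "34", "5", "3"],
    "salary": [10000, 10000, 10000, 10000],
}
data = pd.DataFrame(people)

# Dictionary keys become column labels, in insertion order
print(list(data.columns))
# Inspect how each column is stored (age is text here, salary is integer)
print(data.dtypes)

# Convert age to integers so numeric operations such as sum() add numbers
data["age"] = data["age"].astype(int)
print(data["age"].sum())
```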

Select a column by its column label

data["email"]
0    1@163.com
1    2@163.com
2    3@163.com
3    4@163.com
Name: email, dtype: object

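A column label can also be accessed as an attribute. This only works when the label is a valid Python identifier that does not clash with an existing DataFrame attribute or method.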

data.email
0    1@163.com
1    2@163.com
2    3@163.com
3    4@163.com
Name: email, dtype: object
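Here is a small sketch of when attribute access fails and bracket indexing is still needed. The frame and its column names are hypothetical. In the people data above, `first` and `last` deserve care: older pandas versions provide the time-series methods `DataFrame.first` and `DataFrame.last`, and on those versions `data.first` may return the method instead of the column. For that reason, `data["first"]` is the safer spelling.

```python
import pandas as pd

# Hypothetical frame: one label contains a space, one clashes with a DataFrame method
df = pd.DataFrame({"first name": ["kevin", "anna"], "count": [1, 2]})

# Bracket indexing works for any column label
print(df["first name"])
print(df["count"])

# df.count is the DataFrame.count method, not the "count" column
print(callable(df.count))  # True
# df.first name is not valid Python, so brackets are the only option for that label
```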
# Each column of a DataFrame is a Series
type(data.email)
pandas.core.series.Series

Assign new data to a column (an existing label is overwritten; a new label adds a column)

# Replace all values of one column at once
data["salary"] = [10000,20000,40000,50000]
data["salary"]
0    10000
1    20000
2    40000
3    50000
Name: salary, dtype: int64
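The same assignment syntax also creates columns. This is a minimal sketch; the column names `bonus`, `currency` and `level` are hypothetical and used only for illustration. A list must supply exactly one value per row, while a single scalar is repeated for every row.

```python
import pandas as pd

data = pd.DataFrame({
    "last": ["ma", "hu", "ma", "ma"],
    "salary": [10000, 20000, 40000, 50000],
})

# Assigning to a label that does not exist yet appends a new column
data["bonus"] = [500, 800, 1200, 1500]

# A scalar is broadcast to every row
data["currency"] = "CNY"

# A list whose length differs from the number of rows raises ValueError
try:
    data["level"] = [1, 2, 3]
except ValueError as err:
    print("ValueError:", err)

print(data.columns.tolist())
```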
# A list of labels selects several columns; the result is a DataFrame
data[["last", "first"]]
   last  first
0    ma  kevin
1    hu   anna
2    ma  david
3    ma  obama
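Single and double brackets return different types. A single label gives a Series, while a list of labels (even a list with one label) gives a DataFrame. The quick check below uses a trimmed copy of the data.

```python
import pandas as pd

data = pd.DataFrame({
    "last": ["ma", "hu", "ma", "ma"],
    "first": ["kevin", "anna", "david", "obama"],
})

one = data["last"]        # a single label returns a Series
several = data[["last"]]  # a list of labels returns a DataFrame
print(type(one).__name__, type(several).__name__)  # Series DataFrame
```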

View the column labels

data.columns
Index(['last', 'first', 'email', 'age', 'salary'], dtype='object')
# Select the first row by position; the result is a Series indexed by the column labels
data.iloc[0]
last             ma
first         kevin
email     1@163.com
age              33
salary        10000
Name: 0, dtype: object

loc and iloc

loc: works on labels in the index. It selects rows by the actual values stored in the row index.

iloc: works on positions in the index, so it only accepts integers. It selects rows by their integer position, starting at 0.
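In this tutorial the row index is the default RangeIndex 0..3, so labels and positions happen to coincide and `loc[2]` and `iloc[2]` return the same row. The sketch below uses a hypothetical index of 10, 20 and 30, where the two accessors give different results.

```python
import pandas as pd

# Hypothetical index labels 10, 20, 30 so that labels and positions differ
df = pd.DataFrame({"first": ["kevin", "anna", "david"]}, index=[10, 20, 30])

print(df.loc[20, "first"])  # label 20 -> anna
print(df.iloc[1, 0])        # position 1 (second row), column position 0 -> anna
print(df.iloc[0, 0])        # position 0 -> kevin

# No row carries the label 0, so loc raises KeyError
try:
    df.loc[0]
except KeyError:
    print("no row with label 0")
```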

Select two non-adjacent rows

data.index
RangeIndex(start=0, stop=4, step=1)
data.columns
Index(['last', 'first', 'email', 'age', 'salary'], dtype='object')
data.iloc[[0, 2]]
   last  first      email  age  salary
0    ma  kevin  1@163.com   33   10000
2    ma  david  3@163.com    5   40000
Use loc to select a single row by its index label
data.loc[2]
last             ma
first         david
email     3@163.com
age               5
salary        40000
Name: 2, dtype: object
data.loc[1:3, ["email", "salary"]]  # a label slice with loc includes the end label 3
       email  salary
1  2@163.com   20000
2  3@163.com   40000
3  4@163.com   50000
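As the output shows, the loc slice `1:3` returned rows 1, 2 and 3. Label slices with loc include both endpoints, whereas iloc slices by position and exclude the end, like ordinary Python lists. Here is a short comparison on a trimmed copy of the data:

```python
import pandas as pd

data = pd.DataFrame({
    "email": ["1@163.com", "2@163.com", "3@163.com", "4@163.com"],
    "salary": [10000, 20000, 40000, 50000],
})

# loc slices by label and includes the end label 3
print(data.loc[1:3, ["email", "salary"]].index.tolist())  # [1, 2, 3]
# iloc slices by position and excludes the end position 3
print(data.iloc[1:3, [0, 1]].index.tolist())              # [1, 2]
```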
data
   last  first      email  age  salary
0    ma  kevin  1@163.com   33   10000
1    hu   anna  2@163.com   34   20000
2    ma  david  3@163.com    5   40000
3    ma  obama  4@163.com    3   50000

Set a column as the row index

data.set_index("email", inplace=True)  # modify data itself instead of returning a new DataFrame
data
           last  first  age  salary
email
1@163.com    ma  kevin   33   10000
2@163.com    hu   anna   34   20000
3@163.com    ma  david    5   40000
4@163.com    ma  obama    3   50000
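Once email is the row index, loc addresses rows by email address, while iloc still works by position. The former column no longer appears among the columns. The sketch below rebuilds a trimmed copy of the data (the age column is omitted for brevity).

```python
import pandas as pd

data = pd.DataFrame({
    "email": ["1@163.com", "2@163.com", "3@163.com", "4@163.com"],
    "last": ["ma", "hu", "ma", "ma"],
    "first": ["kevin", "anna", "david", "obama"],
    "salary": [10000, 20000, 40000, 50000],
})
data.set_index("email", inplace=True)

# Rows are now addressed by email with loc; iloc still uses positions
print(data.loc["2@163.com", "first"])  # anna
print(data.iloc[1]["first"])           # anna (second row)
# The former column is no longer one of the columns
print("email" in data.columns)         # False
```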
data.reset_index(inplace=True)  # move the index back into a column, modifying data in place
data
       email  last  first  age  salary
0  1@163.com    ma  kevin   33   10000
1  2@163.com    hu   anna   34   20000
2  3@163.com    ma  david    5   40000
3  4@163.com    ma  obama    3   50000
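If you omit `inplace=True`, both methods return a new DataFrame and leave the original untouched, which makes them easy to chain. The `drop=True` option of `reset_index` discards the index instead of turning it back into a column. Here is a small sketch on a two-row copy of the data:

```python
import pandas as pd

data = pd.DataFrame({
    "email": ["1@163.com", "2@163.com"],
    "last": ["ma", "hu"],
})

# Without inplace=True, each call returns a new DataFrame
indexed = data.set_index("email")
restored = indexed.reset_index()

print(data.columns.tolist())      # ['email', 'last'] (data is unchanged)
print(indexed.index.name)         # email
print(restored.columns.tolist())  # ['email', 'last']

# drop=True throws the index away instead of restoring it as a column
print(indexed.reset_index(drop=True).columns.tolist())  # ['last']
```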