とても良く使う、行方向(縦方向)のデータフィルタについて触れます。
今回は、数値の列を対象としたデータフィルタです。

元データ:data1

In [8]: data1.head()
Out[8]:
   age  workclass  fnlwgt     education  educational-num      marital-status  \
0   25    Private  226802          11th                7       Never-married
1   38    Private   89814       HS-grad                9  Married-civ-spouse
2   28  Local-gov  336951    Assoc-acdm               12  Married-civ-spouse
3   44    Private  160323  Some-college               10  Married-civ-spouse
4   18          ?  103497  Some-college               10       Never-married

          occupation relationship   race  gender  capital-gain  capital-loss  \
0  Machine-op-inspct    Own-child  Black    Male             0             0
1    Farming-fishing      Husband  White    Male             0             0
2    Protective-serv      Husband  White    Male             0             0
3  Machine-op-inspct      Husband  Black    Male          7688             0
4                  ?    Own-child  White  Female             0             0

   hours-per-week native-country income
0              40  United-States  <=50K
1              50  United-States  <=50K 2 40 United-States >50K
3              40  United-States   >50K
4              30  United-States  <=50K

前処理

一部の列名内に「-」が入っているため、置換します。(マイナスの演算子と認識され、エラーがでます。)
※詳細はこちら

data1.columns = data1.columns.str.replace("-", "_")

基本的な使い方

query()の中に、文字列でフィルタ条件を指定します。

例1

# ageが20以上に絞る
data1.query("age >= 20").head()

例1の結果

   age  workclass  fnlwgt     education  educational_num      marital_status  \
0   25    Private  226802          11th                7       Never-married
1   38    Private   89814       HS-grad                9  Married-civ-spouse
2   28  Local-gov  336951    Assoc-acdm               12  Married-civ-spouse
3   44    Private  160323  Some-college               10  Married-civ-spouse
5   34    Private  198693          10th                6       Never-married

          occupation   relationship   race gender  capital_gain  capital_loss  \
0  Machine-op-inspct      Own-child  Black   Male             0             0
1    Farming-fishing        Husband  White   Male             0             0
2    Protective-serv        Husband  White   Male             0             0
3  Machine-op-inspct        Husband  Black   Male          7688             0
5      Other-service  Not-in-family  White   Male             0             0

   hours_per_week native_country income
0              40  United-States  <=50K
1              50  United-States  <=50K 2 40 United-States >50K
3              40  United-States   >50K
5              30  United-States  <=50K

例2

# ageが20に一致するものに絞る
data1.query("age == 20").head()

例2の結果

    age    workclass  fnlwgt     education  educational_num marital_status  \
16   20    State-gov  444554  Some-college               10  Never-married
39   20      Private  257509       HS-grad                9  Never-married
44   20    State-gov  138371  Some-college               10  Never-married
79   20      Private   72055  Some-college               10  Never-married
86   20  Federal-gov  244689          11th                7  Never-married

         occupation   relationship   race  gender  capital_gain  capital_loss  \
16    Other-service      Own-child  White    Male             0             0
39     Craft-repair      Own-child  White    Male             0             0
44  Farming-fishing      Own-child  White    Male             0             0
79     Adm-clerical  Not-in-family  White  Female             0             0
86    Other-service      Own-child  White  Female             0             0

    hours_per_week native_country income
16              25  United-States  <=50K
39              40  United-States  <=50K
44              32  United-States  <=50K
79              40  United-States  <=50K
86              10  United-States  <=50K

例3

# ageが20に一致しないものに絞る
data1.query("age != 20").head()

例3の結果

   age  workclass  fnlwgt     education  educational_num      marital_status  \
0   25    Private  226802          11th                7       Never-married
1   38    Private   89814       HS-grad                9  Married-civ-spouse
2   28  Local-gov  336951    Assoc-acdm               12  Married-civ-spouse
3   44    Private  160323  Some-college               10  Married-civ-spouse
4   18          ?  103497  Some-college               10       Never-married

          occupation relationship   race  gender  capital_gain  capital_loss  \
0  Machine-op-inspct    Own-child  Black    Male             0             0
1    Farming-fishing      Husband  White    Male             0             0
2    Protective-serv      Husband  White    Male             0             0
3  Machine-op-inspct      Husband  Black    Male          7688             0
4                  ?    Own-child  White  Female             0             0

   hours_per_week native_country income
0              40  United-States  <=50K
1              50  United-States  <=50K 2 40 United-States >50K
3              40  United-States   >50K
4              30  United-States  <=50K

複数条件でのフィルタ(範囲指定)

前者でもできますが、後者の方が記述量が少なく、おすすめです。

例4

# ageが20以上、30以下に絞る(次の2つのコードは同じ結果を返す)
data1.query("age >= 20 & age <= 30").head()
data1.query("20 <=  age <= 30").head()

例4の結果

    age  workclass  fnlwgt     education  educational_num      marital_status  \
0    25    Private  226802          11th                7       Never-married
2    28  Local-gov  336951    Assoc-acdm               12  Married-civ-spouse
6    29          ?  227026       HS-grad                9       Never-married
8    24    Private  369667  Some-college               10       Never-married
12   26    Private   82091       HS-grad                9       Never-married

           occupation   relationship   race  gender  capital_gain  \
0   Machine-op-inspct      Own-child  Black    Male             0
2     Protective-serv        Husband  White    Male             0
6                   ?      Unmarried  Black    Male             0
8       Other-service      Unmarried  White  Female             0
12       Adm-clerical  Not-in-family  White  Female             0

    capital_loss  hours_per_week native_country income
0              0              40  United-States  <=50K 2 0 40 United-States >50K
6              0              40  United-States  <=50K
8              0              40  United-States  <=50K
12             0              39  United-States  <=50K

複数条件でのフィルタ(OR)

例5

# ageが20以下、または、30以上に絞る
data1.query("age <= 20 | age >= 30").head()

例5の結果

   age         workclass  fnlwgt     education  educational_num  \
1   38           Private   89814       HS-grad                9
3   44           Private  160323  Some-college               10
4   18                 ?  103497  Some-college               10
5   34           Private  198693          10th                6
7   63  Self-emp-not-inc  104626   Prof-school               15

       marital_status         occupation   relationship   race  gender  \
1  Married-civ-spouse    Farming-fishing        Husband  White    Male
3  Married-civ-spouse  Machine-op-inspct        Husband  Black    Male
4       Never-married                  ?      Own-child  White  Female
5       Never-married      Other-service  Not-in-family  White    Male
7  Married-civ-spouse     Prof-specialty        Husband  White    Male

   capital_gain  capital_loss  hours_per_week native_country income
1             0             0              50  United-States  <=50K 3 7688 0 40 United-States >50K
4             0             0              30  United-States  <=50K
5             0             0              30  United-States  <=50K 7 3103 0 32 United-States >50K

数値列間の比較

範囲指定の例を適用すると、後者のようなこともできます。

例6

# lossよりgainの方が多いデータに絞る
data1.query("capital_gain < capital_loss").head()

例6の結果

     age     workclass  fnlwgt     education  educational_num  \
52    21       Private  214399  Some-college               10
82    24       Private   83141  Some-college               10
129   41       Private  239296       Masters               14
143   43  Self-emp-inc  214503       HS-grad                9
154   40     State-gov   67874     Bachelors               13

         marital_status       occupation   relationship   race  gender  \
52        Never-married    Other-service      Own-child  White  Female
82            Separated    Other-service  Not-in-family  White    Male
129  Married-civ-spouse   Prof-specialty        Husband  White    Male
143  Married-civ-spouse  Exec-managerial        Husband  White    Male
154  Married-civ-spouse     Adm-clerical        Husband  White    Male

     capital_gain  capital_loss  hours_per_week native_country income
52              0          1721              24  United-States  <=50K
82              0          1876              40  United-States  <=50K 129 0 2415 50 United-States >50K
143             0          1887              45  United-States   >50K
154             0          1887              45  United-States   >50K

例7

# lossよりgainの方が多く、lossが1000以下のデータに絞る
data1.query("capital_gain < capital_loss <= 1000").head()

例7の結果

      age         workclass  fnlwgt     education  educational_num  \
158    42           Private  113732  Some-college               10
1269   44           Private   67874  Some-college               10
3057   33           Private  164864     Bachelors               13
7005   30  Self-emp-not-inc  164461          11th                7
9389   42           Private  167357  Some-college               10

     marital_status         occupation relationship   race  gender  \
158   Never-married  Handlers-cleaners    Unmarried  Black  Female
1269       Divorced              Sales    Unmarried  White    Male
3057       Divorced     Prof-specialty    Unmarried  White  Female
7005       Divorced              Sales    Unmarried  White    Male
9389       Divorced       Adm-clerical    Unmarried  White  Female

      capital_gain  capital_loss  hours_per_week native_country income
158              0           625              40  United-States  <=50K
1269             0           625              50  United-States  <=50K
3057             0           323              40  United-States  <=50K
7005             0           653              40  United-States  <=50K
9389             0           213              40  United-States  <=50K