Three Pandas Methods You May Not Know About

The author of the article we are translating today says that when he discovered the Pandas methods he describes here, he felt like a completely incompetent programmer. Why? Because when he wrote code in the past, he was too lazy to search the web for efficient ways to solve certain problems. As a result, he did not even know that a number of very useful Pandas methods existed. He still managed to implement the logic he needed without them, but it took him several hours of frustrating work, and along the way he wrote a lot of unnecessary code. He prepared this article for anyone who would rather not end up in his situation.



idxmin() and idxmax() methods


I have already written about the idxmin() and idxmax() methods, but if I don't cover them here, it will be hard to follow what comes next.

In a nutshell, these methods return the index label of the minimum or maximum value. Suppose I create the following Pandas Series object:

  import pandas as pd
  x = pd.Series([1, 3, 2, 8, 124, 4, 2, 1])

I need to find the index of the minimum and the maximum element. Of course, here that is easy to see just by looking at the data, but in real projects you will never (and I mean never) encounter data sets this small.

What can you do? Use the idxmin() and idxmax() methods.
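Applied to the Series x defined above, the two calls look like this. This is a minimal sketch; the printed values follow directly from the data: the smallest value, 1, first appears at label 0, and the largest, 124, sits at label 4.

```python
import pandas as pd

# The same Series as above: 1 occurs twice (labels 0 and 7), 124 once (label 4)
x = pd.Series([1, 3, 2, 8, 124, 4, 2, 1])

min_label = x.idxmin()  # index label of the smallest value
max_label = x.idxmax()  # index label of the largest value

print(min_label, max_label)  # 0 4
```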



When using these methods, keep in mind that they return the index of the first occurrence of the minimum or maximum value.
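To make the "first occurrence" rule and the label-versus-position distinction concrete, here is a small sketch on a hypothetical Series with string labels. It assumes pandas 1.0 or later, where Series.argmin() returns an integer position rather than a label.

```python
import pandas as pd

# Hypothetical Series with string labels; the minimum (1) appears twice
s = pd.Series([5, 1, 7, 1, 9], index=["a", "b", "c", "d", "e"])

first_min_label = s.idxmin()  # 'b': the first occurrence wins, 'd' is ignored
first_min_pos = s.argmin()    # 1: integer position instead of a label

# If you need every label holding the minimum, filter explicitly
all_min_labels = s[s == s.min()].index.tolist()  # ['b', 'd']
```

With the default RangeIndex, labels and positions coincide, which is why the distinction is easy to miss.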

ne() method


The ne() method was a huge discovery for me. Some time ago, I was working with time series data and ran into a problem: the first n observations were 0.

Imagine that you bought something but did not use it for a while. You own it, but since you are not using it, your consumption of it on those dates is 0. I was only interested in the consumption data from the moment the purchased item actually started being used, and the ne() method turned out to be exactly what I needed.

Consider the following scenario. We have a Pandas DataFrame object containing the results of some observations, and the first values at the top of it are 0.
 df = pd.DataFrame({'X': [0, 0, 0, 0, 0, 0, 3, 5, 0, 2]})  # hypothetical values: six leading zeros



The ne() method returns True for each value that is not equal to the value passed to it (here, 0) and False otherwise:
 df['X'].ne(0) 
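Here is a runnable sketch of that call on hypothetical data with six leading zeros (the values are illustrative, not the author's). ne(0) is the method form of the != operator, so the two produce the same boolean mask.

```python
import pandas as pd

# Hypothetical observations: six leading zeros, then usage begins at index 6
df = pd.DataFrame({"X": [0, 0, 0, 0, 0, 0, 3, 5, 0, 2]})

mask = df["X"].ne(0)  # element-wise "not equal to 0", same as df["X"] != 0
print(mask.tolist())
# [False, False, False, False, False, False, True, True, False, True]
```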


On its own, this method is not particularly useful. Now recall that at the beginning of the article I said you need to be familiar with the idxmax() method to follow along. I was not joking. You can chain an idxmax() call onto the ne() call above. The result is the following:
 df['X'].ne(0).idxmax() 
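Why this works: in a boolean Series, True compares greater than False, so idxmax() returns the label of the first True, that is, the first non-zero observation. A minimal sketch on the same hypothetical data:

```python
import pandas as pd

# Hypothetical data: six leading zeros before the first non-zero value
df = pd.DataFrame({"X": [0, 0, 0, 0, 0, 0, 3, 5, 0, 2]})

# idxmax() on a boolean mask returns the label of the first True
first_nonzero = df["X"].ne(0).idxmax()
print(first_nonzero)  # 6
```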


This tells us that the first non-zero observation is at index 6. Again, this may not seem like much of a find. What matters is that this information can be used to select a subset of the DataFrame containing only the rows from that position onward:
 df.loc[df['X'].ne(0).idxmax():] 
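A sketch of the slice on the same hypothetical data. Label-based slicing with .loc includes the start label, so row 6 is kept; note that only the leading run of zeros is dropped, while a later zero (label 8) stays in the result.

```python
import pandas as pd

df = pd.DataFrame({"X": [0, 0, 0, 0, 0, 0, 3, 5, 0, 2]})

# Keep everything from the first non-zero observation onward
trimmed = df.loc[df["X"].ne(0).idxmax():]
print(trimmed["X"].tolist())  # [3, 5, 0, 2]
```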


This technique is useful in many situations involving time series data.
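For time series, the same pattern works on a DatetimeIndex because idxmax() returns a label (here a Timestamp) and .loc slices by label. The sketch below wraps it in a hypothetical helper, drop_leading_zeros (not a pandas function), and adds one guard the one-liner lacks: if every value is 0, the mask is all False and idxmax() still returns the first label, so the unguarded slice would silently keep the whole frame. Returning an empty frame in that case is an illustrative choice.

```python
import pandas as pd


def drop_leading_zeros(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Return the rows of df from the first non-zero value in `col` onward.

    Hypothetical helper for illustration; not part of pandas.
    """
    mask = df[col].ne(0)
    if not mask.any():
        # All zeros: idxmax() would return the first label and keep everything
        return df.iloc[0:0]
    return df.loc[mask.idxmax():]


# Hypothetical daily consumption indexed by date
dates = pd.date_range("2024-01-01", periods=8, freq="D")
usage = pd.DataFrame({"X": [0, 0, 0, 4, 2, 0, 5, 1]}, index=dates)

trimmed = drop_leading_zeros(usage, "X")
print(trimmed.index[0].date())  # 2024-01-04
print(len(trimmed))             # 5
```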

nsmallest() and nlargest() methods


I suspect that the names alone are enough for you to guess what these methods do. Suppose I create the following DataFrame:
 df = pd.DataFrame({'Name': ['Bob', 'Mark', 'Steph', 'Jess', 'Becky'], 'Points': [55, 98, 46, 77, 81]})


To make it more interesting, suppose these are the results of a test taken by some students. We want to find the three students who did worst on the test:
 df.nsmallest(3, 'Points') 
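A runnable version of the call. With these scores, the three lowest are Steph (46), Bob (55) and Jess (77), returned in ascending order of Points.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bob", "Mark", "Steph", "Jess", "Becky"],
    "Points": [55, 98, 46, 77, 81],
})

# Three rows with the smallest Points, lowest first
worst = df.nsmallest(3, "Points")
print(worst["Name"].tolist())  # ['Steph', 'Bob', 'Jess']
```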


Or, to find out who is in the top three:
 df.nlargest(3, 'Points') 
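The top three here are Mark, Becky and Jess. The sketch below also shows the keep parameter, which controls ties at the cutoff: by default (keep="first") the earliest row wins, while keep="all" returns every tied row even if that exceeds n. The sixth student, Alex, is hypothetical and added only to create a tie.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bob", "Mark", "Steph", "Jess", "Becky"],
    "Points": [55, 98, 46, 77, 81],
})

top = df.nlargest(3, "Points")
print(top["Name"].tolist())  # ['Mark', 'Becky', 'Jess']

# Hypothetical tie: Alex also scores 77
tied = pd.concat(
    [df, pd.DataFrame({"Name": ["Alex"], "Points": [77]})], ignore_index=True
)
print(tied.nlargest(3, "Points")["Name"].tolist())  # ['Mark', 'Becky', 'Jess']
print(len(tied.nlargest(3, "Points", keep="all")))  # 4: both 77s are kept
```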


These methods are a concise alternative to sorting with sort_values() and then taking the first n rows.
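To check the equivalence on the example data, the sketch below compares nlargest() with a descending sort followed by head(). The pandas documentation describes nlargest() as equivalent to sort_values(columns, ascending=False).head(n) but more performant. With ties the two may order tied rows differently, which is why this example uses data without ties.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bob", "Mark", "Steph", "Jess", "Becky"],
    "Points": [55, 98, 46, 77, 81],
})

# Same selection two ways: partial selection vs. full sort then head()
via_nlargest = df.nlargest(3, "Points")
via_sort = df.sort_values("Points", ascending=False).head(3)

print(via_nlargest.equals(via_sort))  # True
```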

Summary


In this article we looked at a few useful Pandas methods. To those who already know them, using them may seem perfectly natural, but to those who have just learned about them, they may be a real find. We hope they serve you well.

Dear readers! Do you know of any useful Pandas methods that others may not know about?


Source: https://habr.com/ru/post/479276/

