What does Big Data do at MegaFon, and how do you get a job there?

MegaFon is not just a telecom company that provides mobile communications; it is a digital company whose products form an ecosystem for a customer's everyday life: “Own card”, “Own cashback”, “MegaFon.TV”, “MegaFon.Music” and many others. The MegaFon Big Data Analytics Department tailors offers to the needs of each customer.

A MegaFon Big Data analyst speaking at the Data Fest conference, spring 2019

MegaFon data scientists work on retaining the subscriber base, one of the company's priorities as growth in the telecom services market slows. For example, a few years ago a new tariff line, “Turn on”, was developed on the basis of big data. It is built around the real interests of digital users: talking, chatting in messengers, listening to music, using social networks and watching videos. Each tariff's name reflects what it includes, and unlimited use of familiar apps means customers do not have to keep track of the traffic they consume. As we build the ecosystem, our task is to make an individual offer to each customer.

Big Data also solves retail problems. For example, machine learning models help us decide where to relocate underperforming stores and where to open new ones; working with geodata is central to this.

Big data analytics is also used to develop the network infrastructure: by analyzing cell towers and the traffic they carry, we determine optimal coverage and predict promising locations for new construction.

What technologies are used?

We work with data on millions of subscribers, with billions of records generated for them every day. Big Data is not just databases such as Oracle, MySQL or MongoDB; it is a whole stack of software for working with data at this scale. To work with big data, you need to understand how Hadoop works and know the specifics of Spark, Hive and HDFS. Data analysts who join us often have not used these tools before. In that case, we teach them the skills they are missing.

Big data skills come with experience, so MegaFon is interested in talented analysts who are ready to learn all the necessary tools and apply them to the company's real tasks.

BigDataCamp at the MegaFon office, 2019

How do MegaFon's Big Data experts develop models?

MegaFon's Big Data experts are divided into analysts (data scientists) and engineers. Analysts test hypotheses and build machine learning models. Engineers help analysts build data marts, optimize ETL processes, and are responsible for deploying models to production.

Model development goes like this. First, we collect the necessary data in Hadoop or Oracle. Then the model is trained on dedicated servers with plenty of memory and CPU cores; for training neural networks, we use GPU servers.
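To make the "collect the data" step more concrete, here is a minimal sketch of what building a per-subscriber feature table (a data mart) does conceptually: raw event rows are aggregated into one row of features per subscriber, which a model can then be trained on. In practice this would be a Hive or PySpark query over Hadoop tables; the event schema, field names and categories below are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical raw event record: one row per call, session or SMS.
@dataclass(frozen=True)
class Event:
    subscriber_id: str
    kind: str      # "voice", "data" or "sms" (illustrative categories)
    amount: float  # minutes for voice, megabytes for data, 1.0 per SMS


def build_feature_mart(events):
    """Aggregate raw events into one feature row per subscriber."""
    mart = defaultdict(
        lambda: {"voice_min": 0.0, "data_mb": 0.0, "sms_count": 0.0, "events": 0}
    )
    column = {"voice": "voice_min", "data": "data_mb", "sms": "sms_count"}
    for e in events:
        row = mart[e.subscriber_id]
        row[column[e.kind]] += e.amount  # sum usage into the matching feature
        row["events"] += 1               # count all events per subscriber
    return dict(mart)


events = [
    Event("A", "voice", 3.5),
    Event("A", "data", 120.0),
    Event("B", "sms", 1.0),
    Event("A", "data", 30.0),
]
mart = build_feature_mart(events)
print(mart["A"])  # {'voice_min': 3.5, 'data_mb': 150.0, 'sms_count': 0.0, 'events': 3}
```

The output is small and fixed-width per subscriber, which is what makes training on a single large-memory server feasible even when the raw logs are far too big for one machine.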

BigDataCamp at the MegaFon office, 2019

The main language for developing models is Python. Data processing in Python usually relies on the standard libraries pandas, NumPy and scikit-learn. For computations in Hadoop we use PySpark and Hive; for modeling, scikit-learn, XGBoost, LightGBM, PyTorch and others. The exact list depends on the task. Why Python? Its main advantage is how easily models go into production: we can build a solution that integrates straight into the common infrastructure. Sometimes, though, the libraries we need exist only in other languages; R, for example, has statistical libraries that Python lacks.

What if you don't know Hadoop?

Hadoop skills are desirable but not a prerequisite for joining our team. Not every company has as much data as MegaFon, so many candidates simply have not had the chance to work with Hadoop in previous jobs.

Mastering the basic commands for working with a Hadoop cluster is not very difficult, but more complex tasks require a deep understanding of big data algorithms, MapReduce and query optimization techniques. For example, the Hadoop ecosystem includes Hive, which lets you write SQL-like queries that run on top of Hadoop; it was originally developed at Facebook. Remember, though, that despite writing SQL, you are not working with a relational database. Simple queries are easy to write, but to make them efficient, that is, fast and light on cluster resources, you need to understand how queries are optimized when they run as MapReduce jobs.
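To illustrate why query efficiency depends on MapReduce, below is a minimal plain-Python model of how a GROUP BY aggregation can run as map, shuffle and reduce phases. It is a conceptual sketch, not Hive's actual execution engine, and the table and column names are hypothetical. The optional combiner pre-aggregates on the map side, which in Hive corresponds roughly to map-side aggregation (the `hive.map.aggr` setting).

```python
from collections import defaultdict
from itertools import chain

# Conceptual model of executing a query such as
#   SELECT subscriber_id, SUM(mb) FROM traffic GROUP BY subscriber_id
# as MapReduce. Table and column names are hypothetical.


def map_phase(split):
    """Emit (key, value) pairs from one input split."""
    for subscriber_id, mb in split:
        yield subscriber_id, mb


def combine(pairs):
    """Optional map-side pre-aggregation: fewer pairs cross the network."""
    partial = defaultdict(float)
    for key, value in pairs:
        partial[key] += value
    return list(partial.items())


def shuffle(mapped_outputs):
    """Group all values by key across mappers (done by the framework)."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapped_outputs):
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Apply the aggregate (SUM) to each key's values."""
    return {key: sum(values) for key, values in groups.items()}


def run(splits, use_combiner):
    mapped = [list(map_phase(s)) for s in splits]
    if use_combiner:
        mapped = [combine(m) for m in mapped]
    shuffled_pairs = sum(len(m) for m in mapped)  # pairs sent to reducers
    return reduce_phase(shuffle(mapped)), shuffled_pairs


splits = [
    [("A", 10.0), ("B", 5.0), ("A", 2.0)],
    [("A", 1.0), ("B", 4.0), ("B", 6.0)],
]
plain, n_plain = run(splits, use_combiner=False)
combined, n_combined = run(splits, use_combiner=True)
print(plain == combined, n_plain, n_combined)  # True 6 4
```

Both runs give A = 13.0 and B = 15.0, but the combiner cuts the shuffled pairs from 6 to 4. On billions of daily records, reducing shuffle volume like this is one of the main ways a query becomes cheaper on the cluster.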

Internships are a chance to grow and gain business experience. Are there internships in MegaFon's Big Data team?

In today's digital world, it seems that even a stool collects data about the person sitting on it, not to mention the Internet of Things and the many services we all use.

Demand for specialists is growing, and there are plenty of analyses and forecasts of how many will be needed in the near future. Any company that collects data understands that this data can be valuable and hold many insights. That is why data analysts are in such demand now.

BigDataCamp at the MegaFon office, 2019

We are always glad to hire strong specialists, but the market is small and few candidates are a good fit for us. That is why MegaFon is developing internship programs. We mainly invite senior students and recent graduates with a background in programming and mathematics. There are exceptions: for example, we have had a successful experience with students from geography departments. It is important to us that an intern can combine work and study, keep growing in the company, and eventually move into an analyst or engineer position.

How do you recruit the team?

Interviews with interns differ from interviews with experienced professionals. For interns, the recruiter first holds a short phone interview to find out whether the candidate is interested in our tasks and what knowledge and experience they currently have. We want to know whether the candidate can program in Python, knows the basic machine learning libraries, has solved practice problems involving big data analysis, has built mathematical models before, and which algorithms they used.

Based on the phone interviews, we select 5-10 candidates who come to our office together for 2-3 hours to meet the team and solve a technical task. The task is as close as possible to real telecom work: build a model that classifies our subscribers. We then compare the results and invite the best candidates to a final interview to discuss an individual work schedule, tasks and other conditions.

The internship lasts three months, and interns work on real business tasks. Most tasks are already well defined, so the intern has a clear idea of what needs to be done; if not, they can always turn to their mentor.

Besides business tasks, our interns regularly take offline and online training. We work with New Pro Lab, Big Data Team, Geek Brains, Data Gym and others, and our experts have access to Coursera.

Experience shows that three months is enough to decide whether we want to keep working together. If the intern performs well, we hire them as a junior data scientist and continue developing their skills.

Egor, MegaFon Big Data Analyst, at the Data Fest conference in spring 2019.

Hiring experienced professionals works like this:

1. The team leaders and the recruiter review the candidate's resume or profile.

2. An in-person interview with a team leader, covering technical and non-technical questions: probability theory, statistics, machine learning, experience with various tools, and the candidate's own expectations.

3. If the interview went well for both sides, we ask for the candidate's portfolio (personal projects and code) or ask them to solve our technical task, so that we can see their code and how they approach problems. The technical task is also telecom-related: predict whether a subscriber has several SIM cards. The candidate sets the deadline, but it is usually no more than a week. One of our employees solved the task the same evening and joined us a week later. Hi Artem ;)

4. A meeting with the director of big data analytics to discuss tasks and conditions.
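The telecom task in step 3, predicting whether a subscriber has several SIM cards, is a binary classification problem. Below is a minimal sketch of one possible baseline, logistic regression trained with batch gradient descent, written in plain Python so it runs without extra packages. The features and toy data are entirely hypothetical; this is not MegaFon's task data or reference solution, and in practice one would reach for scikit-learn, XGBoost or LightGBM.

```python
import math

# Hypothetical features per subscriber, scaled to [0, 1]:
#   [share of days with no outgoing calls, share of traffic in messengers]
# Hypothetical label: 1 = subscriber assumed to use more than one SIM card.
X = [
    [0.9, 0.8], [0.8, 0.9], [0.7, 0.7], [0.85, 0.6],
    [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.15, 0.4],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Probability of class 1 under the logistic model."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log-loss)/d(logit)
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


w, b = train_logreg(X, y)
preds = [int(predict_proba(w, b, xi) >= 0.5) for xi in X]
print(preds)
```

A linear baseline like this is quick to interpret; in a real solution, a gradient-boosting model over richer usage features would be the natural next step.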

Is there a lot of bureaucracy in a large corporation?

Most of our team works at the head office in Moscow, but we also have teams in Nizhny Novgorod and Yekaterinburg. Colleagues from different cities can be involved in the same projects, depending on the tasks and the employees' skills.

Our department is young and dynamic, and from the start we set up our processes for working with other departments correctly: we do not need to request data through colleagues; we mostly use our own databases, Oracle or Hadoop, and build models directly.

At work in the MegaFon office

Our workflow is organized as follows. First, the manager discusses the requirements with a representative of the business customer. As a rule, the goal is to improve a business process with machine learning and data analysis; for example, we might optimize smartphone sales in our retail stores. Then the manager, team leader and analyst together agree on the timeline and development stages. Agreements are tracked in Jira, and we also use Confluence as our internal wiki. And, of course, we use GitLab.

This year we introduced code review at all key stages of a data science project and are already seeing results: the code quality of many team members has improved significantly. Next, to improve the development process further, we plan to adopt DVC (Data Version Control), which will let us version the entire project, including datasets.
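As far as we understand DVC's design, it versions data by content hash: a small pointer file recording the MD5 hash of each tracked file is committed to Git, while the data itself lives in a cache or remote storage. The sketch below shows only that core idea in plain Python (hash a dataset file in chunks; any change in content yields a new version identifier). It does not use DVC's API, and the file name `train.csv` is hypothetical.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=1 << 20):
    """Hash file contents in chunks so large datasets need not fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train.csv")  # hypothetical dataset file
    with open(path, "w") as f:
        f.write("subscriber_id,label\nA,1\n")
    v1 = file_md5(path)  # "version 1" of the dataset
    with open(path, "a") as f:
        f.write("B,0\n")
    v2 = file_md5(path)  # appending a row changes the content hash

print(v1 != v2)  # True: any change to the data yields a new version id
```

Because the identifier depends only on content, the same hash in two commits means the model was trained on exactly the same data, which is the property that makes experiments reproducible.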

Projects last from a couple of months to half a year. The analyst is involved at every stage, from formalizing requirements and defining the model's target event to monitoring the stability of results in production.

We are very result-oriented: we never start development without a clear understanding of the benefits we can bring to MegaFon.

After building a model, we run test campaigns based on its output. If they succeed, we roll the solution out to millions of MegaFon subscribers. Afterwards, we evaluate the results not only with model metrics such as precision and recall on the target segment, but also with a careful analysis of business indicators. Our business analysts help us with this.
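Precision and recall on the target segment follow directly from confusion-matrix counts: precision = TP / (TP + FP) is the share of subscribers selected by the model who really belong to the target group, and recall = TP / (TP + FN) is the share of the target group the model found. A minimal sketch with hypothetical campaign labels:

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (1 = target segment)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0  # guard empty selection
    recall = tp / (tp + fn) if tp + fn else 0.0     # guard empty target group
    return precision, recall


# Hypothetical data: 1 in y_true = subscriber responded to the offer,
# 1 in y_pred = the model placed the subscriber in the target segment.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0]
p, r = precision_recall(y_true, y_pred)
print(p, r)  # 0.6 0.75
```

With these toy labels there are 3 true positives, 2 false positives and 1 false negative, so precision is 0.6 and recall is 0.75. These numbers say how well the model ranks subscribers, not how much money a campaign made, which is why the business indicators are analyzed separately.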

Team and Development

The biggest advantages of working in this department are a team of really smart people and interesting tasks. The office, the shopping center inside it, bonuses and compensation are of course good too, but they come third. For analysts, MegaFon is a real treasure trove of data. Few people get to work with data of this kind and volume, where analysis lets you uncover insights and make decisions that ultimately bring in a lot of money. That is what analysts find most interesting. You studied at university, came up with a new algorithm, coded it, applied scientific methods, and the algorithm started working and delivering real value. That is what is most exciting.

We are numbers people surrounded by business people, and when our insights make money, it's great!

The interview was prepared jointly with the My Circle career service.

Source: https://habr.com/ru/post/479384/

