A user U1 moves through the zones Z1, Z2, Z3 at times t1, t2, t3.

A user U1 goes back and forth through the zones Z1, Z2 at times t1, t2, t3, t4. This is what I call a user "oscillating".

This is also considered an oscillation: the user U1 goes from Z1 to Z2 and then back to Z1. The user visits Z1 more than once, even though he visited Z2 only once.

The user U1 goes from Z1 to Z3, then to Z2, Z3 and Z1, at times t1, t2, t3, t4, t5 respectively. The user is oscillating between the 3 zones. As in the previous example, we consider this movement an oscillation because the user visits Z1 and Z3 more than once, even though he visited Z2 only once.

For ease of computation, we can cap the number of zones a user oscillates between at 5.

I would like to create a column that tracks the oscillation. For a given user, if he is oscillating, give the rows the same oscillation ID. If there is no oscillation, set it to NULL or to 0. Because I am working with billions of records, I need an efficient solution. I accept both Scala and Python (PySpark) solutions.

Example data to copy/paste: Zone, time, person, Oscillation_ID

Here's a solution using a window function with a Pandas UDF to assign an oscillation ID to each row after grouping by person. I haven't limited the maximum number of zones in an oscillation, as that raises a number of further business-logic questions.
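The answer's original code is not reproduced here, so what follows is only a minimal plain-Python sketch of one possible labelling rule, not the Pandas UDF itself. It assumes that an oscillation is any stretch of a person's time-ordered visits between two visits to the same zone, and that overlapping stretches merge into a single oscillation. The function name `assign_oscillation_ids`, the tuple layout, and the integer timestamps are all hypothetical. IDs restart at 1 for each person, rows outside any oscillation get `None`, and, like the answer, the sketch does not cap the number of zones. Consecutive rows in the same zone are not treated specially.

```python
from collections import defaultdict


def assign_oscillation_ids(rows):
    """Label rows with a per-person oscillation ID (hypothetical helper).

    rows: iterable of (zone, time, person) tuples.
    Returns a list of (zone, time, person, oscillation_id) tuples, where
    oscillation_id is None for rows that are not part of an oscillation.
    """
    # Group visits by person, mirroring a "group by person" step.
    by_person = defaultdict(list)
    for zone, time, person in rows:
        by_person[person].append((time, zone))

    result = []
    for person, visits in by_person.items():
        visits.sort()  # order each person's visits by time

        # Assumed rule: every revisit of a zone spans an interval from the
        # previous visit to that zone up to the current row.
        last_seen = {}
        intervals = []
        for idx, (_, zone) in enumerate(visits):
            if zone in last_seen:
                intervals.append((last_seen[zone], idx))
            last_seen[zone] = idx

        # Merge overlapping or touching intervals into single oscillations.
        merged = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])

        # Number the merged intervals 1, 2, ... and label the covered rows.
        ids = [None] * len(visits)
        for osc_id, (start, end) in enumerate(merged, start=1):
            for idx in range(start, end + 1):
                ids[idx] = osc_id

        result.extend(
            (zone, time, person, osc_id)
            for (time, zone), osc_id in zip(visits, ids)
        )
    return result


if __name__ == "__main__":
    sample = [
        ("Z1", 1, "U1"), ("Z3", 2, "U1"), ("Z2", 3, "U1"),
        ("Z3", 4, "U1"), ("Z1", 5, "U1"),
        ("Z1", 1, "U2"), ("Z2", 2, "U2"), ("Z3", 3, "U2"),
    ]
    for row in assign_oscillation_ids(sample):
        print(row)
```

With billions of records, the same per-person logic would need to run inside a grouped Spark operation rather than on a driver-side list. The sketch is meant only to pin down the labelling rule before worrying about scale.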