a SoftwareMill Group Company
Knowledge sharing is a big part of our culture, and we’re deeply rooted in the tech community. That’s always been true for SoftwareMill, our founder company, and we’re happy to cultivate these values now that we’ve embarked on a journey as a standalone company. That’s why we were very happy to have the opportunity to join a technical conference as a sponsor - and that was our first time ever!
In late June, we saved our slides, packed our bags, and hit the road to attend Data Science Summit ML Edition, a hybrid (online & onsite) tech conference dedicated to machine learning practitioners at various levels of experience.
Overall, we think the conference was a great experience: our team found some interesting talks to attend, we met so many wonderful participants, and our engineers gave talks of their own, too. Here’s how our team found the event:
All the conversations I've had! I'm more of a stick-to-my-ML-cellar (that's what I call my office) type so I was a bit reluctant to leave my burrow and leave my machine learning models alone. But showing up at DSS ML was worth it. I met amazing people who share my passion for machine learning. It was fun!
- says our CSO, Maciej Adamiak
Data Science Summit 2022 gathered many great researchers, like Tomasz Szczepański, an AI Researcher and a PhD student at Sano Centre for Computational Personalized Medicine. I really enjoyed his talk about data bias, a topic that is critical to the development of deep learning models for medical imaging. In his daily research, he focuses a lot on explaining decisions made by deep learning models. As he says, it is extremely important to carefully design and prepare a dataset. However, even a carefully developed dataset can turn out to be highly biased. For example, he presented a case of COVID-19 diagnosis using chest X-rays. A huge dataset of X-rays from patients with and without COVID-19 was collected. At first glance, the dataset looked well-designed. However, after training the model and applying explainable AI, it turned out that there was a strong correlation between the presence of COVID-19 and the cables and electrodes used to help patients with a severe course of the disease. As a result, the model was basing its predictions on the electrodes rather than on the lungs. For that reason, when developing a model, it is extremely useful to apply explainable AI techniques to validate which features the model uses to predict the output. Otherwise, a model trained on a highly biased dataset may appear to perform well while making its decisions based on the wrong features. In his talk, Tomasz Szczepański also presented pre-processing steps and model architectures that can be used to successfully train a model even on a highly biased dataset.
- Kamil Rzechowski, our Senior ML Engineer, shares his thoughts.
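To make the idea of checking which features a model relies on a bit more concrete, here is a minimal sketch of one simple explainability technique, occlusion sensitivity: mask each part of the image in turn and see how much the prediction drops. This is not the method from Tomasz's talk, which this post does not describe in detail. The names `biased_model` and `occlusion_map` and the tiny 3x3 "X-ray" are hypothetical stand-ins made up for illustration, with the top-left pixel playing the role of an electrode and the centre pixel playing the role of the lungs.

```python
from typing import Callable, List

Image = List[List[float]]


def biased_model(image: Image) -> float:
    """Toy stand-in for a trained classifier (hypothetical, not a real model).

    The score depends mostly on the top-left pixel (the "electrode")
    and only weakly on the centre pixel (the "lungs").
    """
    electrode = image[0][0]
    lungs = image[1][1]
    return 0.9 * electrode + 0.1 * lungs


def occlusion_map(model: Callable[[Image], float], image: Image, fill: float = 0.0) -> Image:
    """Return, for every pixel, how much the score drops when that pixel is masked."""
    baseline = model(image)
    heatmap = []
    for r, row in enumerate(image):
        heat_row = []
        for c, _ in enumerate(row):
            occluded = [list(line) for line in image]  # copy so the input stays intact
            occluded[r][c] = fill
            heat_row.append(baseline - model(occluded))
        heatmap.append(heat_row)
    return heatmap


# Made-up 3x3 "X-ray": bright electrode in the corner, bright lungs in the centre.
xray = [
    [1.0, 0.2, 0.2],
    [0.2, 1.0, 0.2],
    [0.2, 0.2, 0.2],
]

heatmap = occlusion_map(biased_model, xray)
most_important = max(
    ((r, c) for r in range(3) for c in range(3)),
    key=lambda rc: heatmap[rc[0]][rc[1]],
)
print(most_important)  # (0, 0): the "electrode" pixel, not the "lungs"
```

In a real project, the toy function would be replaced by the trained network and the single pixels by larger image patches, but the reading of the result is the same: if the hottest regions sit on cables and electrodes rather than on the lungs, the dataset is probably biased.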
Data Science Summit ML edition was an intense day with many interesting talks & meetings with attendees. From the business perspective, I found the presentation of Wit Jukuczun from f33.ai to be spot-on: Wit showed that when working with a client, the 'how' (how to deliver a project) isn't the most important thing, but the 'why' is essential. That means taking into account the business aspects of the project and focusing on the benefits ML adoption will bring to the client's organization.
Another interesting talk was Piotr Rybak's "Classifying LEGO Bricks with ML". It was a pleasure to hear such an enthusiastic talk about using computer vision to classify LEGO bricks. And even though it was an 'entertaining' topic, the project was very complex. I'm impressed with the author's idea and implementation!
- says Marcin Głasek, Senior Business Development Manager at SoftwareMill (our founder company), who supported us at our ReasonField Lab booth.
As you can see - we did enjoy it 😉 And since we went there for the knowledge and, more importantly, the people, we made sure to prepare some attractions and gifts for our stand visitors.
For us, our work is also our passion and we have fun doing it (well, at least most of the time ;) ). Our engineers have that spark in their eye when they talk about machine learning and our research & development projects - and that could be seen in every conversation we had at Data Science Summit. But just to make the visit to our booth even more interesting, we prepared some attractions.
Started by our founder company, SoftwareMill, the Wheel of Fortune has become one of our greatest hits at all tech conferences. The wheel was built by SoftwareMill’s engineers and now has sets of questions related to Java, Scala, our company, and machine learning. We’ve built a base of ML-focused questions on various levels of difficulty. To play, all you have to do is spin the wheel and wait for your question to appear on the computer screen.
To win, you have to answer a number of questions in a row, which leads us to…
Having played the Wheel of ML Fortune (which is already fun on its own, without prizes ;) ), our guests collected some cool prizes. What did they win?
The tapir is a cute exotic animal and a mascot of SoftwareMill. It is also the name of the company’s open source project, which recently celebrated the release of its stable version. With tapir, you can describe HTTP API endpoints as immutable Scala values. Each endpoint can contain multiple input and output parameters. An endpoint specification can be interpreted as a server, a client or documentation.
You can read more about Tapir here:
This giant mug is a must-have in every coffee & tea fan’s kitchen. Or by their laptop. It holds up to half a liter of that precious, life-giving, awakening liquid - so it’s ideal to kick off every workday. And it’s stylish as can be - decorated with SoftwareMill’s ‘Type safe, party hard’ Party Parrot! 🦜
As I already mentioned, knowledge sharing is a big part of our culture. Even though we’re now a company of our own, we stay close with the SoftwareMill team, working on the same projects, meeting at company getaways, and learning together. To do the latter, we organize reading clubs and internal workshops to talk about battle-tested solutions to software development challenges and exchange thoughts to inspire each other to keep learning.
This time, our engineers selected 2 popular titles as our Wheel of ML Fortune prizes:
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann is a comprehensive guide addressing the issues of modern system design. It covers topics related to system scalability, consistency, reliability, efficiency, and maintainability.
Why did we choose it? It’s an important read not only for machine learning engineers, but also for anyone whose work touches upon big data. As Bartłomiej Żyliński, Scala Software Engineer from SoftwareMill says:
In my opinion, Designing Data-Intensive Applications by Martin Kleppmann is a must-read for anyone who works with distributed systems or so-called “Big Data”. It is a great walkthrough of most of the currently known concepts, like replication or partitioning. The author describes all the concepts in depth, lists different approaches to implementing them, and shows their advantages and disadvantages. The book also contains quite a thorough description of the problems we may encounter in distributed systems, like transactions or consensus. Inside, you can also find chapters describing batch and stream processing, so if you are looking for a good comparison, you can find it there. As a side note, I would like to add that the author has started to work on the 2nd edition, so probably in a year or two, we will have a revised version of the book. As I said in the beginning, it is a great book and I can honestly recommend it to everyone.
The other book is Machine Learning Design Patterns: Solutions to Common Challenges in Data Preparation, Model Building, and MLOps by Valliappa Lakshmanan, Sara Robinson, and Michael Munn. This title covers recurring problems in machine learning and provides 30 patterns for data and problem representation, operationalization, repeatability, reproducibility, flexibility, explainability, and fairness. Quite a must-read for ML practitioners!
Our little stand was overflowing with these and more gadgets and prizes and we were happy to see that it attracted quite a crowd. But our booth wasn’t the only way we participated in the conference since we also prepared 2 presentations: an onsite one delivered by Kamil Rzechowski, and a VoD talk by Adam Kaczmarek - our Senior ML Engineers.
In the picture up there, you can see Kamil, our Senior ML Engineer, on stage giving his talk about AI-powered shops. Here’s what Kamil says about his experience as a speaker:
It was a great honor and a pleasure to present at the Data Science Summit ML edition in Warsaw 2022 on behalf of ReasonField Lab. I gave a talk about state-of-the-art technologies in autonomous stores. The speech covered algorithms used to track and charge customers and discussed solutions to common issues with cashier-less stores. It was a remarkable experience to see a room full of people and to get so many questions at the end.
And some more details on the talk:
Cashier-free shops like Amazon Go, Morrisons, Tesco, or Żappka improve customer experience, reduce running costs, and cut business downtime to zero by automating checkout and staying open 24/7. Shopping comes down to entering the store, taking what you want from the shelves, and walking out. The checkout is done automatically by cameras and AI-powered algorithms that track your every move and monitor what you take out with you. Those shops are already live, but with limited functionality or in the late stage of the experimental phase.
In this presentation, Kamil covered the solutions and technology currently used in cashier-free shops, presented a case study and an overview of the algorithms used to track and charge customers, and discussed the challenges of cashier-free shops and how to deal with them.
This wasn’t the only talk that we had, though. Adam Kaczmarek, our Senior ML Engineer, gave an online talk for DSS ML edition.
Wordle is an online game, inspired by Mastermind, in which players guess 5-letter words. It has recently become very popular. In this talk, Adam shows how to write a solver that is trained with Reinforcement Learning and how to expand it to other n-wordle variants like quordle (4-wordle), octordle (8-wordle) etc. Adam also shares his observations on the impact of incorporating conditional character-level language models. Additionally, he covers some of the MLOps tools used for the project.
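Adam's solver itself isn't reproduced in this post, but any such solver has to interact with the game's feedback rule. As a minimal sketch, assuming the standard Wordle rules, the feedback for a guess could be computed as below. The function names `score_guess` and `score_multi` are hypothetical and chosen only for illustration, and the Reinforcement Learning agent is not shown.

```python
from collections import Counter
from typing import List


def score_guess(guess: str, target: str) -> str:
    """Return Wordle feedback for one guess.

    G = right letter in the right place,
    Y = right letter in the wrong place,
    - = letter not in the word (or already used up).
    """
    if len(guess) != len(target):
        raise ValueError("guess and target must have the same length")
    feedback = ["-"] * len(guess)
    # Target letters that were not matched exactly can still earn a Y.
    remaining = Counter(t for g, t in zip(guess, target) if g != t)
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            feedback[i] = "G"
    for i, g in enumerate(guess):
        if feedback[i] == "-" and remaining[g] > 0:
            feedback[i] = "Y"
            remaining[g] -= 1
    return "".join(feedback)


def score_multi(guess: str, targets: List[str]) -> List[str]:
    """n-wordle: the same guess is scored against every hidden word at once."""
    return [score_guess(guess, target) for target in targets]


print(score_guess("allee", "apple"))             # GY--G
print(score_multi("crane", ["crane", "brain"]))  # ['GGGGG', '-GGY-']
```

In a Reinforcement Learning setup, a feedback string like this would typically form part of the observation the agent receives after each guess, and variants like quordle or octordle only change how many hidden words the same guess is scored against.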
You can watch the VoD talk on the Data Science Summit ML edition platform.
It was amazing to join a tech crowd as ReasonField Lab for the very first time. We were proud to represent our company, happy to chat about our projects, and excited to see a full room of people at Kamil’s talk. This may have been a first - but it’s not the last of our outings. And we hope to see you again soon!
Want to meet us while we’re out & about? Stay up to date with our social media channels to see where we’re going next 😉