Yes, you can feed embedchain with data that comes from a Kafka consumer. This ensures that the chatbot's vector database receives only the information we want, in near real time, which improves efficiency and reduces hallucinations. I have made a demo using Quix and embedchain for rapid prototyping. Quix provides access to…
How to create a custom PDF langchain agent with FAISS.txt
Straight to the point: this is the code to upload one or more PDFs to an LLM agent so I can ask questions about their content. I am saving the data in an instance that runs locally, so the limit is the computer's RAM. I try to calculate the…
How to create a langchain agent able to talk to a Spark cluster.txt
Straight to the point: the code is intuitive enough that I think it needs little explanation. Personally, I am very excited about this possibility: on the one hand, the ability to tell the Spark cluster what to do, plus the possibility of having an agent with all the knowledge and…
How to create an agent designed to write and execute Python code to answer a question.txt
Recently I have been playing with langchain and OpenAI's ability to generate code that can run on your machine, in your virtual environment. For now, this is something no AI wants to, can, or is allowed to do, which surprised me. I have taken the most basic code to learn about this capability that langchain…
How to build a documentation-helper with langchain and pinecone

I am still following a langchain course on Udemy; this is one of the exercises, and it is really cool. I modified it a bit because of some dependency problems. Basically, it is a chat that lets you talk with your documentation. Ideally, you would have verified processes that are responsible for ingesting your documentation into the…
Top Network Security Cheatsheet.txt
I am going to follow Alex's wonderful outline to talk about the different problems we can encounter at the different TCP/IP layers and how I might try to solve them from the point of view of a blue team member. It is a work in progress, and this is going to be a long post. Note that, as…
Exploitation tests.txt
A proof-of-concept (PoC) exploit is a technique used in cybersecurity to demonstrate that a system is vulnerable by creating an example (generally in the form of malicious code) that exploits the vulnerability in question.
Clean Code and spring-boot in Spanish
This article is based on the work of Gozde Saygili Yalcin. The original work is great, and I would like to add a few things that I think are relevant to the examples, such as custom exception handling using aspects, along with the Spanish translation. Thank you very much, Gozde. I am going to talk about the SOLID principles with an emphasis…
How to run spark-3.x with Delta Lake, Apache Hudi and Apache Iceberg.
Can I run a spark-shell with Delta Lake, Apache Hudi and Apache Iceberg?
A proposal for a lambda architecture for modern real-time telecommunications
Obtaining real consistency with Delta Lake, Hudi and Iceberg, leaving behind Apache Impala and classic Spark.
A proposal for a lambda architecture for modern real-time telecommunications (Spanish version)
A proposal to try to leave Impala behind when you need data consistency and try to achieve it through software
Solid principles with spring-boot.
Applying SOLID Principles to Spring Boot Applications: https://medium.com/@saygiligozde/applying-solid-principles-to-spring-boot-applications-191d7e50e1b3 This sample is so perfect that I just want to post it here to remember it forever. Thanks to the original author; the credit is theirs, not mine.
About Apache Druid and Apache Kafka.
How to get started with Apache Druid and Apache Kafka.
About ordered exactly-once message delivery and processing with Apache Kafka, part 2.
Ordered exactly-once delivery and processing with Apache Kafka.
About exactly-once message delivery and processing with Apache Kafka
Tips on how to achieve exactly-once delivery and processing with Apache Kafka.
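One common ingredient of exactly-once processing is an idempotent handler: even if the broker redelivers a message, applying it twice has no extra effect. The sketch below is only an illustration under stated assumptions, not the code from the post: `IdempotentProcessor`, `message_id` and `amount` are hypothetical names, and the in-memory set stands in for durable storage that a real system would update atomically together with the state. Kafka's own idempotent producer and transactions are not shown.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentProcessor:
    """Applies each message at most once by remembering processed ids."""

    # Hypothetical in-memory stand-ins for durable, atomically updated storage.
    processed_ids: set = field(default_factory=set)
    total: int = 0

    def handle(self, message_id: str, amount: int) -> bool:
        """Apply the message; return False if it was a redelivery."""
        if message_id in self.processed_ids:
            return False  # duplicate delivery: skip the side effect
        self.total += amount
        self.processed_ids.add(message_id)
        return True


processor = IdempotentProcessor()
processor.handle("m1", 10)
processor.handle("m1", 10)  # redelivered message, ignored
processor.handle("m2", 5)
```

With at-least-once delivery underneath, this kind of deduplication is what makes the observable result look like exactly-once.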
About the TreeMap data structure, Java and Python
A data structure you can use as a cache thanks to its optimal complexity and multithreading capability. I include usage examples along with recommendations and several benchmarks.
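Python has no built-in TreeMap, so the following is a minimal sketch of a sorted map built on the standard `bisect` module, assuming keys are mutually comparable. `SortedMap`, `floor_key` and `ceiling_key` are illustrative names modeled on Java's `TreeMap` methods, not code from the post. Unlike a balanced tree, insertion here costs O(n) because the list shifts elements, and the class is not thread-safe.

```python
import bisect


class SortedMap:
    """A small sorted map: sorted key list plus a dict for values."""

    def __init__(self):
        self._keys = []    # always kept in ascending order
        self._values = {}  # key -> value lookup

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)  # O(n) insert, keeps order
        self._values[key] = value

    def get(self, key, default=None):
        return self._values.get(key, default)

    def floor_key(self, key):
        """Greatest stored key <= key, or None if there is none."""
        idx = bisect.bisect_right(self._keys, key)
        return self._keys[idx - 1] if idx else None

    def ceiling_key(self, key):
        """Smallest stored key >= key, or None if there is none."""
        idx = bisect.bisect_left(self._keys, key)
        return self._keys[idx] if idx < len(self._keys) else None

    def keys(self):
        return list(self._keys)


cache = SortedMap()
cache.put(10, "a")
cache.put(5, "b")
cache.put(20, "c")
```

The floor and ceiling lookups are what distinguish a sorted map from a plain dictionary when it is used as a cache keyed by ranges or timestamps.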
About Bloom filter data structures.
About the Bloom filter data structure.
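As a rough illustration of the idea, here is a minimal Bloom filter sketch in Python. The sizes (`size_bits=1024`, `num_hashes=3`) and the salted SHA-256 hashing scheme are assumptions chosen for readability, not recommendations from the post. A Bloom filter can return false positives but never false negatives.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: a bit array plus k derived hash positions."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


seen = BloomFilter()
seen.add("kafka")
```

After this, `seen.might_contain("kafka")` is always true, while a lookup for an item that was never added is usually, but not always, false.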
Some differences between Apache Pulsar and Apache Kafka
Apache Pulsar and Apache Kafka are both distributed, scalable technologies with different focus areas: Pulsar primarily focuses on ordered message delivery, while Kafka focuses on data streaming. Pulsar guarantees message delivery order by default, making it suitable for applications that require it. Kafka, on the other hand, is excellent for delivering and consuming streaming data when global order isn't crucial (it preserves order only within a partition). Both technologies can be used in production, depending on specific needs. Notably, Pulsar's security mechanism is similar to Kafka's: both use the TLS protocol for encryption and authentication.
Initial simplified version of the calculation of the Lagrangian that tries to include all the interactions of the Standard Model of particle physics.
import sympy as sp
# Define measured values
m_W = 80.379  # W boson mass in GeV/c^2
m_Z = 91.1876  # Z boson mass in GeV/c^2
m_h = 125.1  # Higgs boson mass in GeV/c^2
m_e = 0.511e-3  # Electron mass in GeV/c^2
m_mu = 105.66e-3  # Muon mass in GeV/c^2
m_tau = 1776.86e-3  #…
First steps with Apache Spark 3.5.0 Delta Lake using Scala.
https://docs.delta.io/latest/quick-start.html#create-a-table&language-scala First, install Apache Spark. I am an OSX user, and I do not recommend using Homebrew because it will not install third-party libraries. I recommend downloading it from https://spark.apache.org (the latest version is 3.5.0 as of 28 Nov 2023). Then run spark-shell with Delta Lake support. ATTENTION: be sure about the Delta Lake version; you must…
About my Python and deep learning studies. Reflections.
I am currently learning Python and machine learning while I wait for a job offer that lets me telecommute. For family and health reasons, working outside my city, my autonomous community, or my country is a luxury I can't afford, so I have to accept my reality and…
Training and building a neural network using PyTorch and the MNIST dataset
The Fashion MNIST dataset contains 28x28 grayscale images of clothes. Our goal is to build a neural network using PyTorch and then train it to classify clothing items. Maximum accuracy: 84%. The first Python version is not refactored; the third one is. This is the refactored code in a gist file. I currently have an accuracy of 81%,…
Spark Parallelism
More tips on Spark and parallelism.
How to protect against infected SSH public keys
A proposal with a script to defend ourselves against modified SSH public keys.
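One way to detect a modified key is to compare fingerprints against a trusted baseline. The sketch below is not the author's script; it is a minimal Python illustration that assumes each line has the plain `<type> <base64 blob> [comment]` layout with no leading options, and the names `ssh_fingerprint`, `find_unexpected_keys` and `trusted_fingerprints` are hypothetical. The fingerprint follows the OpenSSH SHA256 style: the unpadded Base64 of the SHA-256 digest of the decoded key blob.

```python
import base64
import hashlib


def ssh_fingerprint(public_key_line: str) -> str:
    """Return the OpenSSH-style SHA256 fingerprint of one public key line."""
    # Assumed layout: "<type> <base64 blob> [comment]" (no options prefix).
    blob = base64.b64decode(public_key_line.split()[1])
    digest = hashlib.sha256(blob).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def find_unexpected_keys(authorized_keys_text: str, trusted_fingerprints: set) -> list:
    """List fingerprints present in the text but missing from the baseline."""
    unexpected = []
    for line in authorized_keys_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fingerprint = ssh_fingerprint(line)
        if fingerprint not in trusted_fingerprints:
            unexpected.append(fingerprint)
    return unexpected
```

In practice the baseline would be recorded while the keys are known to be good and stored somewhere an attacker cannot rewrite along with the keys themselves.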
Questions and answers about Spark in ChatGPT

Tips on Apache Spark.
Some Linux commands

100+ Linux commands that Linux sysadmins regularly use, with explanations. Thank you @linuxopsys. cut: allows you to cut out sections of a specified file or piped data and print the result to standard output. sort: used to sort files. uniq: used to…
How to use GH Actions to create a CI/CD pipeline with Maven, Docker, Discord and Snyk.
About how to use GH Actions to create a CI/CD pipeline with Maven, Docker, Discord and Snyk.
How to use GH Actions to create a CI/CD pipeline with Maven, Docker, Discord and Snyk (Spanish version).
CI/CD pipelines
Troubleshooting k8s, helm, etc
https://static.learnk8s.io/troubleshooting-kubernetes.aeb4a7d680aedf1c970e51ad6ad05548.pdf This is an internal post of mine, to remember things I have to finish, tabs open for a long time in Safari, things like that. https://kubernetes.io/es/docs/home/ https://learnk8s.io/templating-yaml-with-code Bonus track, some useful resources. https://java.by-comparison.com https://devresourc.es (Create cool posts) https://carbon.now.sh/tH8akpzZiWj8OrjC2Pzv (Daniel's Scala course. Basic.) https://rockthejvm.com/p/scala-at-light-speed (Files from the Bret k8s course. Forked) https://github.com/alonsoir/udemy-docker-mastery/tree/main/k8s-yaml (how to install new…
A quick run of the Trivy scanner
About using code scanners to ensure a minimum level of quality when deploying software to production.
Working with k8s services
Working with k8s from a terminal by running commands: creating services, deployments,...
About how to create a CI/CD environment using open source tools.
How to create a CI/CD pipeline using tools available to everyone: Ansible, Jenkins, Docker.
Some notes on microservices architecture
My notes on the Atomikos microservices course, along with an analysis. I am analyzing the two protocols that exist today for building a microservices architecture: two-phase commit and Saga. It is a work in progress; the more I learn and understand about this concept, the more notes I will add. Introduction If it is already…
First steps with Kubernetes
I'm an OSX user, so these instructions are intended for that system, in case I have to restart it. Once Docker.app has started, go to Preferences, then Kubernetes, and click on initialize Kubernetes. After a minute or so, you will get the indicator that Kubernetes is up and running. By default the Dashboard does not start,…
Turbo-Charge DDoS Detection: Retraining Random Forests with High-Fidelity Synthetic Traffic
This work was generated using this link; I am backing it up here so I won't lose the research. Retraining Random Forest (RF) models with high-quality synthetic data is a powerful and necessary strategy for creating robust Distributed Denial of Service (DDoS) detection systems. This approach directly confronts the core challenges of real-world network security data: extreme…
Development of Network Anomaly Detection Models Using Random Forest

🛡️ Development of Network Anomaly Detection Models Using Random Forest. In our upgraded-happines project, we built a robust pipeline for training models to detect network anomalies and cyberattacks using Random Forest. Key techniques included: integration and normalization of public datasets (aggregated multiple sources: CIC-IDS2017, MAWI, Stratosphere, USTC-TFC2016); data cleaning (removal of duplicates, null values, and irrelevant columns); feature normalization…
About the IDS I am developing
I have spent the whole summer researching: home automation with Python, zero trust architecture, distributed software and big data with Scala/Spark/Java, and a cybersecurity project in which I discovered what is, in my opinion, a very serious flaw in YouTube and in video providers, since it literally allows the exfiltration of confidential information from…
Zero Trust Architecture. A starting point.
Executive Summary. This guide presents a skeleton for a blog article aimed at investigating in depth the following areas of modern security and architecture: Zero Trust Architecture, Content Security Policy (CSP) and Subresource Integrity (SRI), dynamic JWT revocation, multifactor authentication (MFA) with FIDO2/WebAuthn, and decoupling the UX from the backend. For…
Elevating Software Quality and Security in Production: Practical Steps for Responsible Development
A Starting Point: The Ideal Workflow. This infographic shared by @midudev illustrates how companies should deploy code to production: a structured process that includes planning (Jira), development (Git, Jenkins), testing (QA environments), and deployment (Docker, Kubernetes). However, in reality, this workflow is often riddled with gaps. I’ve worked in multiple organizations where quality and security issues in…
Elevating Software Quality and Security in Production: Practical Steps for Responsible Development (Spanish version)
A starting point: the ideal workflow. This infographic shared by @midudev illustrates how companies should deploy code to production: a structured process that includes planning (Jira), development (Git, Jenkins), testing (QA, environments) and deployment (Docker, Kubernetes). However, in reality, this workflow is full of holes. I have worked in multiple organizations where the…
The blackout in Spain on my birthday.
Well, yes, the power went out on my birthday; what can you do. This is going to be the only post I write on this blog that has nothing to do with information technology, although there is a bit of that in it. The CNI has ruled out that it was some kind of attack, so, what…
JVM architecture, Saber y Ganar edition.
Here is an exhaustive analysis of the JVM architecture and how it works. Understanding the JVM Architecture. Despite the enormous amount of training material on Java, it is surprisingly hard to find quality information about its internal architecture. However, for those seeking to understand the language in depth, it is essential to know…
Risks of Insecure Deserialization in Java and Mitigation Measures
Recently, I encountered the issue of insecure deserialization in Java, a risk that has been present since the early days of ObjectInputStream. This vulnerability allows an attacker to intercept a serialized object (Serializable), modify it using tools like Burp Suite, decode it from Base64, and reinject it with malicious code using utilities like ysoserial. The…